unitedstates / BillMap

Utilities and applications for the FlatGov project by Demand Progress

Add drop down filters for Calendar on Home page #78

Closed smplater closed 3 years ago

smplater commented 3 years ago

[Image: 2020-12-15 at 10:35:22 AM]

There should be a filter for "Chamber" "Committee" and "Event Type"

leedavidr commented 3 years ago

@smplater Could I get examples of Chamber, Committee, and Event Type?

Guessing event type could be:

- House of Representatives
- Senate
- OPM

These are the only datasets I was able to download so far. Do we have more datasets? There are a few event attributes we may be able to use, such as measure and business, but they seem to be optional and are not always present.

Committee and Chamber: I'm not sure how I can classify events by committee with the current dataset, so we may not be able to support this one.

Chamber: is this House of Representatives vs. Senate? I wasn't sure what this was.

I'm trying to normalize 3 different event datasets into our DB. There's not much info we can use besides pure event data. Additional processing may need to be done, or we may need more datasets.
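A minimal sketch of what normalizing the three datasets into one record shape could look like, so that the Chamber, Committee, and Event Type drop-downs all filter the same fields. The `CalendarEvent` class, its field names, and the raw dictionary keys are hypothetical and not taken from the FlatGov code; optional attributes simply stay `None` when a dataset does not provide them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CalendarEvent:
    """One normalized calendar row; all field names here are hypothetical."""
    source: str                       # e.g. "house", "senate", "opm"
    title: str
    start: str                        # kept as a plain string for this sketch
    chamber: Optional[str] = None
    committee: Optional[str] = None
    event_type: Optional[str] = None


def normalize(source: str, raw: dict) -> CalendarEvent:
    """Map a raw record to CalendarEvent, tolerating missing optional keys."""
    return CalendarEvent(
        source=source,
        title=raw.get("title", "").strip(),
        start=raw.get("start", ""),
        chamber=raw.get("chamber"),
        committee=raw.get("committee"),
        event_type=raw.get("event_type"),
    )


def filter_events(events, chamber=None, committee=None, event_type=None):
    """Apply drop-down selections; None means 'no filter' for that field."""
    wanted = {"chamber": chamber, "committee": committee, "event_type": event_type}
    return [
        e for e in events
        if all(v is None or getattr(e, k) == v for k, v in wanted.items())
    ]
```

With this shape, an event that lacks a committee is still stored and shown when no committee is selected; it only drops out once a specific committee is picked.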

DanielSchuman commented 3 years ago

Ari,

I may be able to help.

Senate committee proceedings are all available at:

Because they come from this source, you can automatically categorize all the data as Senate committee proceedings.

House committee proceedings are all available at

<a class="downloadXML" href="Download.aspx?file=/billsthisweek/20210517/20210517.xml" xmlns:dt="http://xsltsl.org/date-time"> XML

See specifically:

Bills This Week

An XML file for each week is available for the “Bills to be considered on the House Floor” section of docs.house.gov. This XML is well-formed. The elements and attributes are self-describing.

Committee Repository

An XML file for each meeting is available in the “committee repository” section of docs.house.gov. The XML is well-formed. The elements and attributes are self-describing.

The element describes the status of the meeting and it can contain one of the following values:

Data that comes from this source can be automatically categorized as House proceedings, either committees or floor actions, depending on where you get it from

IF, HOWEVER, the data is being pulled from the Congress.gov committee schedule

The website that people can read is https://www.congress.gov/committee-schedule/weekly/2021/05/17?searchResultViewType=expanded. There is no obvious way to download that content as structured data, as far as I know, but I have not played with it.
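The source-based categorization described above could be sketched as a simple lookup. The source keys and the label strings are hypothetical names chosen for illustration; only the idea that the data source determines the chamber and the kind of proceeding comes from this thread.

```python
# Hypothetical source keys mapped to (chamber, event type) labels.
SOURCE_CATEGORIES = {
    "senate_committee_schedule": ("Senate", "Committee proceeding"),
    "house_committee_repository": ("House", "Committee proceeding"),
    "house_bills_this_week": ("House", "Floor action"),
}


def categorize(source_key: str) -> dict:
    """Return chamber and event type for a source; unknown sources get None."""
    chamber, event_type = SOURCE_CATEGORIES.get(source_key, (None, None))
    return {"chamber": chamber, "event_type": event_type}
```

Data scraped from the Congress.gov schedule page would not fit this lookup, since one page mixes both chambers; it would need a per-event field instead.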



leedavidr commented 3 years ago

Thank you Daniel. I was looking at this.

1. For the Senate hearings, I think we are good because I could get the data from the mentioned URL: https://www.senate.gov/general/committee_schedules/hearings.xml

2. I think they're all hearings, though. I'm not sure yet where to find Senate committee markups.

3. Regarding the House data, there seems to be no easy resource for consuming calendar data. The data is on ASP/HTML pages, but the HTML is not necessarily well-defined, so we could have a lot of parsing issues. Fortunately, it looks like calendar data is not available for June, so maybe we only have to scrape a little bit of data at a time.

4. As a workaround, I thought about consuming the latest (past) committee meetings (XML), and parse the future committee meetings from the website. Today, the latest (past) committee meetings are not showing up, which is odd because it was listing previous meetings and not just today's. I'm thinking of scraping the monthly view, but this gets a bit hairy so I'll work on some of the other issues first
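For the XML side, a rough sketch of turning a schedule file into plain dictionaries with the standard library. It assumes only that the root element's children are meeting records whose child elements hold text; the tag names in `SAMPLE` are made up for illustration and are not claimed to match the real Senate or House files.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def parse_records(xml_text: str) -> List[Dict[str, str]]:
    """Turn each child of the root into a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return [
        {field.tag: (field.text or "").strip() for field in record}
        for record in root
    ]


# Hypothetical sample; the element names are assumptions.
SAMPLE = """<meetings>
  <meeting>
    <committee>Judiciary</committee>
    <date>26-MAY-2021 10:00 AM</date>
    <room>SD-G50</room>
  </meeting>
  <meeting>
    <committee>Banking, Housing, and Urban Affairs</committee>
    <date>25-MAY-2021 10:00 AM</date>
    <room/>
  </meeting>
</meetings>"""

records = parse_records(SAMPLE)
```

Because nothing is keyed on specific tag names, the same function can be pointed at a downloaded file's contents first to see which fields are actually present before deciding what to store.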

DanielSchuman commented 3 years ago

Thank you, David. I might be able to clear up a few more things.

  1. Hearings and markups will be published in the same location. If you see language like "hearing" or "markup" or "business meeting," they are all committee proceedings. This is true in the House and in the Senate.

There is nothing in the XML to distinguish between hearings and business meetings, so just treat them all as committee proceedings. So, looking at the Senate XML page that you linked to above...

This is a hearing

SSBK00 Banking, Housing, and Urban Affairs 25-MAY-2021 10:00 AM Tuesday WEBEX *Hearings to examine the semiannual testimony on the Federal Reserve's supervision and regulation of the financial system.*

this is a markup, aka a business meeting

SSHR00 Health, Education, Labor, and Pensions 25-MAY-2021 10:00 AM Tuesday SH-216 *Business meeting to consider S.1675, to improve maternal health, S.1491, to amend the Public Health Service Act to improve obstetric care in rural areas, S.1662, to increase funding for the Reagan-Udall Foundation for the Food and Drug Administration and for the Foundation for the National Institutes of Health, S.1301, to provide for the publication by the Secretary of Health and Human Services of physical activity recommendations for Americans, S.610, to address behavioral health and well-being among health care professionals, S.1658, to amend the Fair Labor Standards Act of 1938 to expand access to breastfeeding accommodations in the workplace, and other pending calendar business.*

This is another business meeting, but instead of looking at legislation, it looks at a nomination.

SSJU00 Judiciary 26-MAY-2021 10:00 AM Wednesday SD-G50 *Hearings to examine pending nominations*.

  2. On the House calendar data

It's important to keep in mind that this needs to be checked constantly, at least once a day (maybe twice). Notice of committee hearings is supposed to be given 7 days in advance and notice of committee markups is supposed to be given 3 days in advance, but this is not always followed. For example, you probably will find very little calendar data for June.

There is XML for the pages, but the info is at the page level.

Here is a particular calendar item, a markup of H.R. 1629

This is the landing page: https://docs.house.gov/Committee/Calendar/ByEvent.aspx?EventID=112659

Notice the URL increments in the event ID, which suggests one way to get each item is to increment the ID.

Regardless, at the landing page, there is a button for downloading the meeting data as XML.

It creates a local temp file. Here's mine: file:///C:/Users/DANIEL~1/AppData/Local/Temp/HMTG-117-RU00-20210517.xml

The temp file appears to be well-formed XML. It contains, among other things:

Note that meeting information can be updated and meetings can be postponed.

I will ask whether there's XML for each calendar week, which should make discovering this information easier. We know that it's parsable and usable because this is how, we think, Congress.gov gets the House meeting information.

Also, Josh Tauberer had built a tool that pulls down all the House and Senate proceedings and put them into an agenda. We can ask him for his code, which should be on github somewhere.

  3. I don't understand this: "As a workaround, I thought about consuming the latest (past) committee meetings (XML), and parse the future committee meetings from the website. Today, the latest (past) committee meetings are not showing up, which is odd because it was listing previous meetings and not just today's. I'm thinking of scraping the monthly view, but this gets a bit hairy so I'll work on some of the other issues first"

Can you explain more about what you are having trouble finding?
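For the Senate examples quoted earlier in this comment, here is a small sketch of parsing the date string and guessing a label from the description wording. It assumes the date and time arrive as one string shaped like `25-MAY-2021 10:00 AM` and that month names are in English; both function names are hypothetical. The wording-based label is only a rough guess, since the data itself does not distinguish hearings from business meetings, so falling back to a generic "committee proceeding" is the safe default.

```python
from datetime import datetime


def parse_senate_datetime(text: str) -> datetime:
    """Parse strings shaped like '25-MAY-2021 10:00 AM'."""
    return datetime.strptime(text.strip(), "%d-%b-%Y %I:%M %p")


def guess_label(description: str) -> str:
    """Guess a display label from the description text alone."""
    lowered = description.lower()
    if "business meeting" in lowered or "markup" in lowered:
        return "business meeting"
    if "hearing" in lowered:
        return "hearing"
    return "committee proceeding"
```

Note that a description such as "Hearings to examine pending nominations" is labeled "hearing" by this heuristic because it looks only at the text, which is one reason to treat the label as a display hint rather than a reliable category.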



DanielSchuman commented 3 years ago

I don't know how to pull @joshdata (Josh Tauberer) into this GitHub thread, but if he has a publicly available parsing tool for the House calendar, hopefully he can point us to it, because he had gotten this working.

JoshData commented 3 years ago

Hi all. Yes of course there is already a scraper.

Documentation: https://github.com/unitedstates/congress/wiki/Committee-Meetings

Script: https://github.com/unitedstates/congress/blob/master/tasks/committee_meetings.py

leedavidr commented 3 years ago

Thank you Daniel, Josh! I was able to use the existing parser to get the committee schedules. It looks like there was an XML API for each committee.

The existing parser gets a lot more details (documents, witnesses, etc.), so I removed some of that logic from ours to reduce complexity. There's still some bad data in the XML API, so I further reduced the scope to only look at data from the past 10 days instead of 60 days.
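The reduced look-back window could be sketched as a simple date check. This is an illustration under assumptions, not the code that was committed: the function name is hypothetical, and it assumes that upcoming meetings are always kept and only meetings older than the cutoff are dropped.

```python
from datetime import date, timedelta


def within_window(meeting_date: date, today: date, days_back: int = 10) -> bool:
    """True if the meeting is no older than `days_back` days before `today`."""
    return meeting_date >= today - timedelta(days=days_back)
```

Passing `days_back=60` reproduces the wider window for comparison.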

aih commented 3 years ago

Implemented. [Image]