usgpo / api

services to access govinfo content and metadata
https://api.govinfo.gov

Jamie Raskin not appearing in CREC impeachment data #80

Closed: asebold closed this issue 3 years ago

asebold commented 3 years ago

Mr. Raskin is the lead manager for the current impeachment trial, but I noticed he isn't listed as a speaker (or anywhere else) in the CREC metadata for the trial. I'm wondering whether this was an oversight or intentional. I use this metadata to keep track of who spoke in Congress each day, so it would be really helpful to have him listed as a speaker in the granule summary or MODS.

Examples are below. Everyone else is listed as a speaker, but Mr. Raskin is nowhere to be found:

https://api.govinfo.gov/packages/CREC-2021-02-10/granules/CREC-2021-02-10-pt1-PgS615-4/summary?api_key=DEMO_KEY
https://api.govinfo.gov/packages/CREC-2021-02-10/granules/CREC-2021-02-10-pt1-PgS615-4/mods?api_key=DEMO_KEY
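For anyone reproducing this check, both URLs follow one pattern. Below is a minimal Python sketch that builds them and lists the speakers in a summary. It assumes the granule summary keeps people in a top-level `members` array (that key name is an assumption; the excerpts in this thread show only the entries themselves), and `granule_url`, `fetch_summary`, and `speaker_names` are hypothetical helper names, not part of any govinfo client.

```python
import json
import urllib.parse
import urllib.request
from typing import List

BASE_URL = "https://api.govinfo.gov"


def granule_url(package_id: str, granule_id: str, kind: str, api_key: str) -> str:
    """Build a govinfo API URL for a granule's 'summary' or 'mods' endpoint."""
    if kind not in ("summary", "mods"):
        raise ValueError("kind must be 'summary' or 'mods'")
    path = f"/packages/{package_id}/granules/{granule_id}/{kind}"
    query = urllib.parse.urlencode({"api_key": api_key})
    return f"{BASE_URL}{path}?{query}"


def fetch_summary(package_id: str, granule_id: str, api_key: str) -> dict:
    """Download and decode a granule summary (needs network access)."""
    url = granule_url(package_id, granule_id, "summary", api_key)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def speaker_names(summary: dict) -> List[str]:
    """Return the memberName of every SPEAKING entry.

    Assumption: entries live in a top-level 'members' list. Entries with
    no memberName are reported by bioGuideId so gaps stay visible.
    """
    names = []
    for member in summary.get("members", []):
        if member.get("role") != "SPEAKING":
            continue
        name = member.get("memberName")
        names.append(name if name else f"<no name: {member.get('bioGuideId')}>")
    return names
```

Calling `speaker_names(fetch_summary("CREC-2021-02-10", "CREC-2021-02-10-pt1-PgS615-4", "DEMO_KEY"))` would then show at a glance whether a given Member appears as a speaker.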

jonquandt commented 3 years ago

@asebold - thanks for bringing this to our attention. I believe it is due to how the text file is constructed. If I recall correctly, our parser looks for the honorific and last name of a Member at the beginning of a speaking section/paragraph. In this instance, the text has:

Mr. Manager RASKIN

whereas normally we would see something like:

Mr. RASKIN
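To illustrate the parsing issue described above, here is a hypothetical regular-expression sketch in Python. It is not GPO's actual parser; it only shows why a pattern anchored on an honorific followed directly by a capitalized last name misses `Mr. Manager RASKIN`, and how allowing an optional `Manager ` token would catch it. `PLAIN`, `WITH_MANAGER`, and `speaker_last_name` are made-up names for this example.

```python
import re
from typing import Optional

# Hypothetical pattern: honorific, then the Member's last name in capitals.
PLAIN = re.compile(r"^(Mr|Ms|Mrs)\. ([A-Z][A-Z'\-]+)\b")

# Same pattern with an optional "Manager " between honorific and name,
# which also matches impeachment-trial openings like "Mr. Manager RASKIN".
WITH_MANAGER = re.compile(r"^(Mr|Ms|Mrs)\. (?:Manager )?([A-Z][A-Z'\-]+)\b")


def speaker_last_name(line: str, pattern: re.Pattern = WITH_MANAGER) -> Optional[str]:
    """Return the capitalized last name that opens a speaking paragraph, or None."""
    match = pattern.match(line)
    return match.group(2) if match else None
```

With `PLAIN`, `speaker_last_name("Mr. Manager RASKIN.", PLAIN)` returns `None` because "Manager" is not all capitals; the default `WITH_MANAGER` pattern returns `"RASKIN"` for both `Mr. RASKIN.` and `Mr. Manager RASKIN.`.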

I will go ahead and make some metadata updates to handle this for now, and will look to see what we can do on the parsing side.

Thanks

asebold commented 3 years ago

Awesome. Thank you!

jonquandt commented 3 years ago

@asebold - I have made the necessary updates for CREC-2021-02-09, CREC-2021-02-10, CREC-2021-02-11, and CREC-2021-02-13. Please review and let me know if you see any missing members.

We will be making a parsing update to handle this in the future, which will allow us to reprocess and incorporate Members for the 2020 Trump impeachment as well as the Clinton impeachment.

asebold commented 3 years ago

Looks good. Appreciate it!

asebold commented 3 years ago

Just noticed something else. Rep. Madeleine Dean's name is missing from the speaker list (in the granule summary linked above). Her entry has her bioguide ID and all the other info, though. Excerpt below; hers is the first entry.

  {
    "bioGuideId": "D000631",
    "gpoId": [],
    "chamber": "H",
    "party": "D",
    "role": "SPEAKING",
    "state": "PA",
    "congress": "117",
    "authorityId": "2432",
    "houseRefId": [],
    "chamberIdCode": []
  },
  {
    "memberName": "Lieu, Ted",
    "bioGuideId": "L000582",
    "gpoId": [],
    "chamber": "H",
    "party": "D",
    "role": "SPEAKING",
    "state": "CA",
    "congress": "117",
    "authorityId": "2230",
    "houseRefId": [],
    "chamberIdCode": []
  },
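A quick way to spot entries like this across many granules is to flag any member object that has a `bioGuideId` but no `memberName`. A minimal Python sketch, using the two entries from the excerpt above (trimmed of the empty array fields and wrapped in a JSON array so it parses); `members_missing_name` is a hypothetical helper name.

```python
import json

# The two entries from the excerpt above, trimmed and wrapped in an array.
EXCERPT = """
[
  {"bioGuideId": "D000631", "chamber": "H", "party": "D", "role": "SPEAKING",
   "state": "PA", "congress": "117", "authorityId": "2432"},
  {"memberName": "Lieu, Ted", "bioGuideId": "L000582", "chamber": "H", "party": "D",
   "role": "SPEAKING", "state": "CA", "congress": "117", "authorityId": "2230"}
]
"""


def members_missing_name(members):
    """Return the bioGuideId of every entry with a missing or empty memberName."""
    return [m.get("bioGuideId", "<unknown>") for m in members if not m.get("memberName")]


missing = members_missing_name(json.loads(EXCERPT))
print(missing)  # ['D000631']
```

Since the bioguide ID is still present, a flagged entry can be matched back to a Member even before the name is filled in.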
jonquandt commented 3 years ago

@asebold -- this is fixed:

  {
    "memberName": "Dean, Madeleine",
    "bioGuideId": "D000631",
    "gpoId": [],
    "chamber": "H",
    "party": "D",
    "role": "SPEAKING",
    "state": "PA",
    "congress": "117",
    "authorityId": "2432",
    "houseRefId": [],
    "chamberIdCode": []
  },

The MODS files for the package and granule also include this information, along with some additional name formats:

<congMember authorityId="2432" bioGuideId="D000631" chamber="H" chamberIdCode="" congress="117" gpoId="" houseRefId="" state="PA" role="SPEAKING" party="D">
<name type="parsed">Ms. Manager DEAN</name>
<name type="authority-fnf">Madeleine Dean</name>
<name type="authority-lnf">Dean, Madeleine</name>
</congMember>
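For consumers reading the MODS instead of the JSON summary, the `congMember` element above can be read with Python's standard `xml.etree.ElementTree`. This is a sketch: it assumes the full MODS file may declare a default namespace, so it matches on local tag names rather than hard-coding one, and `local` and `congmember_names` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# The congMember excerpt above, verbatim.
SNIPPET = """<congMember authorityId="2432" bioGuideId="D000631" chamber="H" chamberIdCode="" congress="117" gpoId="" houseRefId="" state="PA" role="SPEAKING" party="D">
<name type="parsed">Ms. Manager DEAN</name>
<name type="authority-fnf">Madeleine Dean</name>
<name type="authority-lnf">Dean, Madeleine</name>
</congMember>"""


def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def congmember_names(root):
    """Map bioGuideId -> {name type: text} for every congMember at or under root."""
    result = {}
    for elem in root.iter():  # iter() includes root itself
        if local(elem.tag) != "congMember":
            continue
        names = {n.get("type"): n.text for n in elem if local(n.tag) == "name"}
        result[elem.get("bioGuideId")] = names
    return result


names = congmember_names(ET.fromstring(SNIPPET))
print(names["D000631"]["authority-lnf"])  # Dean, Madeleine
```

The `parsed` value keeps the raw "Ms. Manager DEAN" text, while the `authority-*` values give normalized name forms that are easier to match across days.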