simonw / congress-legislators-datasette

Datasette instance with data from unitedstates/congress-legislators
https://congress-legislators.datasettes.com/
4 stars 0 forks source link

Include middle initial and suffix in `name` column for executives #3

Closed simonw closed 2 years ago

simonw commented 2 years ago

This example query to show individuals who were vice president AND president is returning incorrect results: https://congress-legislators.datasettes.com/legislators?sql=select%0D%0A++distinct+executives.name%0D%0Afrom%0D%0A++executive_terms%0D%0A++join+executives+on+executive_terms.executive_id+%3D+executives.id%0D%0Awhere%0D%0A++type+%3D+%27prez%27%0D%0A++and+name+in+%28%0D%0A++++select%0D%0A++++++executives.name%0D%0A++++from%0D%0A++++++executive_terms%0D%0A++++++join+executives+on+executive_terms.executive_id+%3D+executives.id%0D%0A++++where%0D%0A++++++type+%3D+%27viceprez%27%0D%0A++%29%0D%0A

Because George Bush and George Bush are not the same person!

simonw commented 2 years ago

Annoyingly we don't have a reliable ID for executives, because some of them never served in congress (including George W Bush) which means they don't have bioguide IDs: https://congress-legislators.datasettes.com/legislators/executives

simonw commented 2 years ago

Ouch, that's not going to work either... because Joe Biden's middlename is inconsistently listed as Robinette and R:

CleanShot 2022-02-26 at 21 22 01@2x

For the moment I'm going to fix this by normalizing middle names down to initials.

simonw commented 2 years ago

Middle initials won't work due to "William Rufus de Vane King".

I'm going to special case the logic for George W. Bush and George H. W. Bush.

simonw commented 2 years ago

Actually no, I won't fix this. I'll use it as a lesson in why you shouldn't do this kind of thing with names instead!

simonw commented 2 years ago

It gets even worse... there are two records for Joseph Biden in that table: https://congress-legislators.datasettes.com/legislators/executives?_sort=id&name__exact=Joseph+Biden

Even though that's the same person. The two George Bush records are here: https://congress-legislators.datasettes.com/legislators/executives?_sort=id&name__exact=George+Bush

I'm going to try and get this fixed in the source data.

simonw commented 2 years ago

Filed a PR for that here: