estaub closed this issue 7 years ago
Null legislator IDs indicate that we haven't been able to reconcile a vote's text name with legislators yet. As you can see, some of these indicate bugs (and can be filed as such, see ND, MD, LA) and some are simply names that we haven't yet been able to match due to formatting differences (see SC, ME).
I appreciate you trying to bring these to the forefront. Could you give me a list of the other checkers you intend to write, so I can let you know if they duplicate our existing data quality tools?
@jamesturk Sorry, I don't have a list. I'm not planning this ahead, just writing validators in the process of writing other code that uses the API. I wrote a sandbox version a few months ago, using MongoDB, and discovered so many data integrity problems too late that I restarted, using Postgres, with one schema for raw API input and another for cleaned data. So I need to write these checkers regardless, to know what I can trust where and to patch things up locally where appropriate. I'll be sharing this stuff (TypeScript on Node) on GitHub when it's a bit more polished.
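For illustration, a validator of the kind described above might look roughly like the following. This is only a sketch, not the actual checker: the `Vote` and `VoteEntry` shapes (with `yes_votes`, `no_votes`, `other_votes`, `leg_id`, and `name`) are assumed from the API's vote output, and `findUnmatchedNames` and `sampleVotes` are hypothetical names made up for this example.

```typescript
// Assumed shape of one roll-call entry (an assumption, not a confirmed schema).
interface VoteEntry {
  leg_id: string | null; // null when the name could not be reconciled
  name: string;          // legislator name as it appears in the vote text
}

// Assumed shape of a vote record (an assumption, not a confirmed schema).
interface Vote {
  vote_id: string;
  yes_votes: VoteEntry[];
  no_votes: VoteEntry[];
  other_votes: VoteEntry[];
}

// Hypothetical helper: collect every name that appears with a null leg_id,
// together with the number of roll-call entries in which that happens.
function findUnmatchedNames(votes: Vote[]): Map<string, number> {
  const unmatched = new Map<string, number>();
  for (const vote of votes) {
    const entries = [...vote.yes_votes, ...vote.no_votes, ...vote.other_votes];
    for (const entry of entries) {
      if (entry.leg_id === null) {
        unmatched.set(entry.name, (unmatched.get(entry.name) ?? 0) + 1);
      }
    }
  }
  return unmatched;
}

// Made-up sample data, for demonstration only.
const sampleVotes: Vote[] = [
  {
    vote_id: "V1",
    yes_votes: [{ leg_id: "L1", name: "Jones" }],
    no_votes: [{ leg_id: null, name: "Smith" }],
    other_votes: [],
  },
  {
    vote_id: "V2",
    yes_votes: [{ leg_id: null, name: "Smith" }],
    no_votes: [],
    other_votes: [{ leg_id: "L1", name: "Jones" }],
  },
];

const report = findUnmatchedNames(sampleVotes);
report.forEach((count, name) => console.log(`${name}: ${count}`));
```

Counting entries per name, rather than just listing the names, makes it easier to tell a one-off formatting mismatch from a name that fails to match across the whole term.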
The referencing pull request tackles a few basic data quality issues with vote name scraping. There are some others that can likely be tackled with one- or two-line fixes to the state scrapers:
This list skips any state where the vote name might require a manual update to match, or that looks like it might require a change to the billy NameMatcher.
Added issues for all the one-off fixes.
The following named legislators have no leg_id in at least some votes in the current term:
This is a superset of https://github.com/openstates/openstates/issues/1360 .
In case you're wondering... I intend to keep filing first copies of reports on various data problems as issues here. Going forward, I hope to set up an online monitor for problems that continue to show up.