Closed: lmullen closed this issue 1 year ago
Additional whitelist, taking care of another half million cites: reporters_citation_to_cap_whitelist-6.csv
Also of value: Sean has caught a few mistakes in the whitelist (almost all affecting UK reporters and statutes, so nothing that would have derailed us, but still...): Old Whitelist Errors.docx
Newest whitelist additions:
I'm on it.
How much more whitelisting do you anticipate doing?
Sean said he only had about 100 entries left on the sheet we sent him. We are definitely at a point of diminishing returns, so I told him to finish those off and then we'll see what other uses we can put him to in the spring.
That sounds like a good plan.
Final whitelist update from Sean. He also has some more corrections to the existing whitelist, but instead of giving you the tedious list, how about you send me the complete merged whitelist CSV? I'll have Sean enter the corrections directly and also make sure the entries are correct for the volume-shifting problem.
reporters_citation_to_cap_whitelist-9.csv
I have added all the whitelists. The current version is below.
Thanks to Sean for a lot of hard work. I'm closing this issue but you can reopen it if something comes up.
Here is Sean's final corrected whitelist. He has made sure our standardization is consistent.
Where the volume number is the same for official and nominative reporters, Sean made sure our standardization (Kelly) is correctly associated with CAP’s (Ga.). Where CAP recognizes the parallel cite in its API, he’s made sure our standardizations simply match (Bail. and Bail., for instance). Where CAP does have the case but does not recognize parallel cites in its API and the volume number differs (N.J. Eq. cases, for instance), Sean has put DIFFERENT in the cap_reporter column. It looks like @lmullen already has a process for linking those cases, so the marker can probably be ignored, but it is there if needed.
I think I understand the reasoning behind including DIFFERENT. But that definitely cannot go in that column. It's a different type of information. I will move it to a separate column, and we can perhaps use it for error checking.
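For illustration, moving the marker into its own column might look roughly like the sketch below. This is not the actual code used for the project. Only the `cap_reporter` column name and the `DIFFERENT` marker come from this thread; the `reporter_standard` and `volume_differs` column names and the `Example Rep.` row are hypothetical placeholders.

```python
import csv
import io

SENTINEL = "DIFFERENT"  # marker Sean placed in the cap_reporter column


def split_sentinel(rows, column="cap_reporter", flag_column="volume_differs"):
    """Move the DIFFERENT marker out of `column` into a separate flag column.

    `flag_column` is a hypothetical name chosen for this sketch.
    """
    cleaned = []
    for row in rows:
        new_row = dict(row)
        is_flagged = (new_row.get(column) or "").strip() == SENTINEL
        if is_flagged:
            # The reporter column should hold only reporter abbreviations.
            new_row[column] = ""
        new_row[flag_column] = "TRUE" if is_flagged else "FALSE"
        cleaned.append(new_row)
    return cleaned


# Tiny in-memory sample; the third row is an invented placeholder.
sample = io.StringIO(
    "reporter_standard,cap_reporter\n"
    "Kelly,Ga.\n"
    "Bail.,Bail.\n"
    "Example Rep.,DIFFERENT\n"
)
rows = split_sentinel(csv.DictReader(sample))

for row in rows:
    print(row)
```

Keeping the flag in its own column means `cap_reporter` can be validated as a reporter abbreviation, while the flag stays available for error checking.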
Uploaded the new whitelist. Thanks again to Sean. Double checking that everything is fine.
The current whitelist should be entirely swapped out with this revised one.
Corrected Whitelist July 2023.csv
(Don't freak out. I can give you all the corrections step by step if you want, but I've been careful and checked for duplicates, etc. Basically, a lot of federal entries needed to be corrected to link up with CAP's parallel cites properly. And all the English Reports cites needed to be standardized to match what's in the new database. And RAs have extended the list by about a thousand entries especially to account for English Reports.)
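If an automated double check is wanted before swapping the file in, a duplicate check could be sketched as below. This is only an assumption about how one might do it, not a description of what was actually run. The `reporter_found` column name and the sample rows are hypothetical; `cap_reporter` is the only column name taken from this thread.

```python
import csv
import io
from collections import defaultdict


def drop_exact_duplicates(rows):
    """Return rows with exact repeats removed, preserving first-seen order."""
    unique, seen = [], set()
    for row in rows:
        fingerprint = tuple(sorted(row.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def find_conflicts(rows, key_column="reporter_found", value_column="cap_reporter"):
    """Return keys that map to more than one distinct value.

    `key_column` is a hypothetical name for the citation-side column.
    """
    targets = defaultdict(set)
    for row in rows:
        targets[row[key_column].strip()].add(row[value_column].strip())
    return {key: sorted(vals) for key, vals in targets.items() if len(vals) > 1}


# Invented sample: one exact duplicate and one conflicting mapping.
sample = io.StringIO(
    "reporter_found,cap_reporter\n"
    "Kelly,Ga.\n"
    "Kelly,Ga.\n"
    "Bail.,Bail.\n"
    "Kelly,Kelly\n"
)
rows = list(csv.DictReader(sample))
unique_rows = drop_exact_duplicates(rows)
conflicts = find_conflicts(unique_rows)

print(len(unique_rows), conflicts)
```

Exact duplicates are harmless and can be dropped silently. Conflicting mappings are the ones that need a human decision.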
Updated with the most recent whitelist.
The process for doing this is now well established. We just need to do it until we decide it's done.