lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Whitelist the top reporters #81

Closed lmullen closed 1 year ago

lmullen commented 2 years ago

The process for doing this is now well established. We just need to do it until we decide it's done.

kfunk074 commented 2 years ago

Additional whitelist, taking care of another half million cites: reporters_citation_to_cap_whitelist-6.csv

Also of value: Sean has caught a few mistakes in the whitelist. Almost all of them affect UK reporters and statutes, so nothing that would have derailed us, but still worth fixing: Old Whitelist Errors.docx

kfunk074 commented 2 years ago

Newest whitelist additions:

reporters_citation_to_cap_whitelist-7.csv

kfunk074 commented 1 year ago

New year, new whitelist:

reporters_citation_to_cap_whitelist-8.csv

lmullen commented 1 year ago

I'm on it.

How much more whitelisting do you anticipate doing?

kfunk074 commented 1 year ago

Sean said he only had about 100 entries left on the sheet we sent him. We are definitely at a point of diminishing returns, so I told him to finish those off and then we'll see what other uses we can put him to in the spring.

lmullen commented 1 year ago

That sounds like a good plan.

kfunk074 commented 1 year ago

Final whitelist update from Sean. He also has some more corrections to the existing whitelist. Instead of giving you the tedious list, how about you send me the complete merged whitelist CSV? I'll have Sean enter the corrections directly and also make sure the entries are correct for the volume-shifting problem.

reporters_citation_to_cap_whitelist-9.csv

lmullen commented 1 year ago

I have added all the whitelists. The current version is below.

reporters_citation_to_cap.csv

Thanks to Sean for a lot of hard work. I'm closing this issue but you can reopen it if something comes up.

kfunk074 commented 1 year ago

Here is Sean's final corrected whitelist. He has made sure our standardization is consistent.

Where the volume number is the same for official and nominate reporters, Sean made sure our standardization (Kelly) is correctly associated with CAP’s (Ga.). Where CAP recognizes the parallel cite in its API, he’s made sure our standardizations simply match (Bail. and Bail., for instance). Where CAP does have the case but does not recognize parallel cites in its API and the volume number differs (N.J. Eq. cases, for instance), Sean has put DIFFERENT in the cap_reporter column. It looks like @lmullen already has a process for linking those cases, so he can probably ignore it, but the marker is there if needed.

reporters_citation_to_cap.csv

lmullen commented 1 year ago

I think I understand the reasoning behind including DIFFERENT. But that definitely cannot go in that column. It's a different type of information. I will move it to a separate column, and we can perhaps use it for error checking.
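A minimal sketch of that move, assuming the CSV is read with Python's `csv` module. Only the `cap_reporter` column name and the DIFFERENT marker come from this thread; the `reporter_standard` and `volume_differs` column names, the `split_marker` helper, and the third sample row are hypothetical and only there for illustration.

```python
import csv
import io

MARKER = "DIFFERENT"  # the marker entered in the cap_reporter column


def split_marker(rows, reporter_col="cap_reporter", flag_col="volume_differs"):
    """Move the marker out of the reporter column into its own flag column.

    flag_col is a hypothetical column name, not one from the real whitelist.
    """
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's rows are not mutated
        is_marked = row.get(reporter_col, "").strip() == MARKER
        row[flag_col] = "true" if is_marked else "false"
        if is_marked:
            # The reporter column should only ever hold a reporter abbreviation.
            row[reporter_col] = ""
        cleaned.append(row)
    return cleaned


# Illustrative rows only; the last row is made up, not real whitelist data.
sample = io.StringIO(
    "reporter_standard,cap_reporter\n"
    "Kelly,Ga.\n"
    "Bail.,Bail.\n"
    "Hypothetical Rep.,DIFFERENT\n"
)
cleaned = split_marker(csv.DictReader(sample))
for row in cleaned:
    print(row)
```

Keeping the flag in its own column means `cap_reporter` always holds either a reporter abbreviation or nothing, so the flag can be used for error checking without special-casing a sentinel value.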

lmullen commented 1 year ago

Uploaded the new whitelist. Thanks again to Sean. Double checking that everything is fine.

kfunk074 commented 1 year ago

The current whitelist should be entirely swapped out with this revised one.

Corrected Whitelist July 2023.csv

(Don't freak out. I can give you all the corrections step by step if you want, but I've been careful and checked for duplicates, etc. Basically, a lot of federal entries needed to be corrected to link up with CAP's parallel cites properly. And all the English Reports cites needed to be standardized to match what's in the new database. And RAs have extended the list by about a thousand entries especially to account for English Reports.)
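For anyone who wants to repeat the duplicate check, here is a rough sketch of one way to do it in Python. It is not necessarily how the check above was done. The `reporter_found` key column and the `find_duplicates` helper are assumed names, and the sample rows are illustrative rather than taken from the real file.

```python
import csv
import io
from collections import Counter


def find_duplicates(rows, key_col="reporter_found"):
    """Return the key values that appear more than once, sorted.

    key_col is a hypothetical column name for the citation form being matched.
    """
    counts = Counter(row[key_col].strip() for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Illustrative rows only, with one deliberate duplicate.
sample = io.StringIO(
    "reporter_found,cap_reporter\n"
    "Kelly,Ga.\n"
    "Bail.,Bail.\n"
    "Kelly,Ga.\n"
)
duplicates = find_duplicates(list(csv.DictReader(sample)))
print(duplicates)
```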

lmullen commented 1 year ago

Updated with the most recent whitelist.