lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Run eyecite against specific treatises to get a comparison #72

Open lmullen opened 2 years ago

kfunk074 commented 1 year ago

My RA believes she has successfully run eyecite across the MOML corpus. Deduplicating cites to the same case by the same treatise, she found around 4 million edges, about half what we found through our whitelisting method (I think our number is 8.2 million). I'll upload her table below.
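For reference, the deduplication step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `dedupe_edges`, the treatise IDs, and the sample citations are hypothetical and are not taken from the RA's actual script or the attached tables.

```python
def dedupe_edges(citations):
    """Collapse repeated (treatise, case) pairs into a set of unique edges.

    `citations` is assumed to be an iterable of (treatise_id, case_id)
    tuples, one per citation found in the text.
    """
    return {(treatise_id, case_id) for treatise_id, case_id in citations}


# Hypothetical sample: treatise T1 cites the same case twice.
raw_citations = [
    ("T1", "1 Wheat. 304"),
    ("T1", "1 Wheat. 304"),
    ("T2", "1 Wheat. 304"),
]
edges = dedupe_edges(raw_citations)
print(len(edges))
```

Repeated cites from one treatise to one case count as a single edge, while the same case cited by a different treatise remains a separate edge.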

kfunk074 commented 1 year ago

Additionally, she calculated that eyecite found nearly a million cases we have not matched so far. The table of cites is attached here and I'll start going through it to figure out why we missed these. At a cursory glance, I would guess most of these simply aren't on our whitelist, like "Wash. C. C.", which appears in this table over 17,000 times.
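One way to check that guess is to tally the reporter abbreviations in the unmatched cites and list the ones that are not on the whitelist, most frequent first. The sketch below assumes each cite is a (volume, reporter, page) tuple; the function name `missing_reporters` and the sample data are hypothetical, not the layout of the attached table.

```python
from collections import Counter


def missing_reporters(cites, whitelist):
    """Return (reporter, count) pairs for reporters absent from the whitelist,
    ordered from most to least frequent."""
    counts = Counter(reporter for _volume, reporter, _page in cites)
    return [(rep, n) for rep, n in counts.most_common() if rep not in whitelist]


# Hypothetical sample data.
unmatched_cites = [
    (1, "Wash. C. C.", 10),
    (2, "Wash. C. C.", 55),
    (3, "Mass.", 1),
    (4, "Pet.", 9),
]
whitelist = {"Mass."}
missing = missing_reporters(unmatched_cites, whitelist)
print(missing)
```

The top of that list would show which whitelist additions recover the most cites.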

kfunk074 commented 1 year ago

Tables are too big. Can be found here: https://drive.google.com/drive/folders/19l8aVcdVPbZjUqqymVNDOm8fshel1gJf?usp=sharing

lmullen commented 1 year ago

I think we can easily add more entries to the whitelist if need be. I would be curious to know whether there are more systematic problems, however.

What about the inverse question? Are there cases that we found and eyecite did not, and if so, how many?

But really, what this points to is just creating a union of eyecite cases plus whitelist cases, which will be better than either method individually. We don't really care how we get there, as long as the cases are known to be good.
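As a rough sketch of that comparison, assuming each method's output has already been reduced to a set of deduplicated (treatise, case) edges, plain set operations give both directions of the question plus the union. The function name `compare_edges` and the sample edges are hypothetical.

```python
def compare_edges(whitelist_edges, eyecite_edges):
    """Split two edge sets into overlap, one-sided differences, and union."""
    return {
        "both": whitelist_edges & eyecite_edges,
        "whitelist_only": whitelist_edges - eyecite_edges,
        "eyecite_only": eyecite_edges - whitelist_edges,
        "union": whitelist_edges | eyecite_edges,
    }


# Hypothetical sample edges.
whitelist_edges = {("T1", "1 Wheat. 304"), ("T1", "2 Mass. 10")}
eyecite_edges = {("T1", "1 Wheat. 304"), ("T2", "3 Wash. C. C. 5")}

result = compare_edges(whitelist_edges, eyecite_edges)
for name, edges in result.items():
    print(name, len(edges))
```

The `whitelist_only` count answers the inverse question, and `union` is the combined set. This only works if both methods normalize case identifiers the same way; otherwise the same case can appear under two spellings and inflate the one-sided counts.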

kfunk074 commented 1 year ago

See my first comment above. She thinks we found 4 million cites that eyecite didn't, so far.