kfunk074 opened 3 years ago
Grajzl and Murrell's helpful guide to the English Reports, and how they constructed their database of pre-1765 case reports: http://www.econweb.umd.edu/~murrell/articles/AppendicesMachineCaselawJOIE.pdf
"The source of our data and the starting point for our corpus construction and processing was a digitized database of English Reports, obtained from Juta and Company (Pty) Ltd (English Reports (1260-1865), n.d.). The resultant database consists of 129,042 nominate reports of decisions rendered in the English courts of law between the early 13th century and the mid-19th century."
Here's a start on a database of UK law reports, adapted from the second English edition (1892) of Joseph Story's Commentaries on Equity. It's probably incomplete, but hopefully not by much.
Assuming perfect OCR, we can at least use this dataset to match a citation to a UK reporter. So far I haven't attempted to correct common OCR errors in the English reports. To do that, it would help to have the output of our general regex run on the Story volume mentioned above.
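With clean OCR, matching a citation to a UK reporter reduces to a table lookup on the reporter abbreviation. A minimal Go sketch, assuming a map loaded from the Story-derived list; the two entries and the helper names `normalizeAbbrev` and `lookupReporter` are illustrative placeholders, not part of our detector:

```go
package main

import (
	"fmt"
	"strings"
)

// reporters maps a normalized abbreviation to a reporter name. These two
// entries are illustrative placeholders; the real map would be loaded from
// the list of UK reporters adapted from Story's Commentaries.
var reporters = map[string]string{
	"atk.":      "Atkyns",
	"ves. sen.": "Vesey Senior",
}

// normalizeAbbrev (hypothetical helper) lowercases an abbreviation and
// collapses runs of whitespace, so "Ves.  Sen." and "ves. sen." compare equal.
func normalizeAbbrev(s string) string {
	return strings.ToLower(strings.Join(strings.Fields(s), " "))
}

// lookupReporter returns the reporter name and whether it was found.
// It does no OCR correction, so it assumes clean input.
func lookupReporter(abbrev string) (string, bool) {
	name, ok := reporters[normalizeAbbrev(abbrev)]
	return name, ok
}

func main() {
	// "Atk" (missing its period) shows how a single OCR error defeats an exact lookup.
	for _, a := range []string{"Atk.", "Ves.  Sen.", "Atk"} {
		name, ok := lookupReporter(a)
		fmt.Printf("%q -> %q (found: %v)\n", a, name, ok)
	}
}
```

The third input is why OCR correction matters: an exact-match table silently drops any abbreviation the OCR has mangled.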
[Image: a typical page in the English Reports.] The plain text is not expensive to acquire. This page makes clear that the English Reports pose two complications we won't usually encounter with American reports: 1) multiple cases can be reported on a single page, so citation "addresses" are not unique; 2) many private reporters had such limited runs that they produced only one volume, so there is no volume number in the standard citation form. Neither derails the main project. We will either miss citations to the obscure private reporters or write special-purpose regexes to find them.
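Both complications can be sketched in Go under stated assumptions. Because several cases can share a page, an address should index a slice of cases rather than a single case; a single-volume reporter needs its own pattern with no volume number. The reporter abbreviation "Hypo.", the case names, and the `Address`/`index` types are hypothetical illustrations, not our detector's actual code:

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Address is a citation "address". Volume is 0 for a single-volume
// reporter, whose standard citation form carries no volume number.
type Address struct {
	Volume   int
	Reporter string
	Page     int
}

// index maps each address to every case reported on that page, since an
// address alone does not identify a single case.
type index map[Address][]string

func (ix index) add(a Address, caseName string) {
	ix[a] = append(ix[a], caseName)
}

// hypoCite matches citations to "Hypo.", a hypothetical single-volume
// reporter cited as "Hypo. 45" (reporter, then page, no volume).
var hypoCite = regexp.MustCompile(`\bHypo\. (\d+)`)

// findHypoCites returns an Address for every "Hypo." citation in text.
func findHypoCites(text string) []Address {
	var out []Address
	for _, m := range hypoCite.FindAllStringSubmatch(text, -1) {
		page, err := strconv.Atoi(m[1])
		if err != nil {
			continue // only possible if the digit run overflows int
		}
		out = append(out, Address{Volume: 0, Reporter: "Hypo.", Page: page})
	}
	return out
}

func main() {
	ix := index{}
	ix.add(Address{0, "Hypo.", 45}, "Case A") // hypothetical case names
	ix.add(Address{0, "Hypo.", 45}, "Case B")

	for _, a := range findHypoCites("See Hypo. 45 and Hypo. 46.") {
		fmt.Println(a, ix[a])
	}
}
```

A detected citation then resolves to a list of candidate cases, and any later step that needs a single case has to disambiguate (by party names, for instance).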
As far as I can tell, there is no CAP (Caselaw Access Project) equivalent for UK case reports. There are things we could do to create more meaningful connections in the data, but all of them should be back-burnered in favor of the main project.
@kfunk074 Two questions about the status of this one.
Is there any more (much more?) to be done to make the list of English reporters as complete as is reasonable?
Any reason to think these citations won't be picked up by our general Go cite detector? In other words, is the problem analysis rather than detection?
I don’t know what I don’t know. I think it’s a pretty extensive list, and I don’t know where to look to find more, though there may well be more out there. Many are single-volume reporters, but that’s the only hang-up to finding them with a general regex search.
For future reference, this database might be helpful as a UK alternative to CAP. I have yet to suss out how comprehensive it is: https://swarb.co.uk/its-what-we-do/
We have essentially detected the British citations, unless some reporters fall outside the "1 Reporter 123" pattern. What we need is a process to reconcile them with useful metadata, parallel to what CAP provides.
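For reference, the "1 Reporter 123" pattern can be approximated in Go as volume, abbreviation, page. This is a rough sketch, not the regex our Go cite detector actually uses: the character class for abbreviations is an assumption, and the lazy match can over-match across ordinary words that sit between two numbers.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Citation is one detected "volume reporter page" citation.
type Citation struct {
	Volume   int
	Reporter string
	Page     int
}

// volReporterPage approximates "1 Reporter 123": a 1-3 digit volume, an
// abbreviation starting with a capital letter (letters, periods, apostrophes,
// spaces, matched lazily), then a 1-5 digit page.
var volReporterPage = regexp.MustCompile(`\b(\d{1,3}) ([A-Z][A-Za-z.' ]*?) (\d{1,5})\b`)

// findCitations returns every citation matching volReporterPage in text.
func findCitations(text string) []Citation {
	var out []Citation
	for _, m := range volReporterPage.FindAllStringSubmatch(text, -1) {
		vol, _ := strconv.Atoi(m[1])  // cannot fail: at most 3 digits
		page, _ := strconv.Atoi(m[3]) // cannot fail: at most 5 digits
		out = append(out, Citation{vol, m[2], page})
	}
	return out
}

func main() {
	for _, c := range findCitations("See 1 Ves. Sen. 123, and 2 Atk. 45.") {
		fmt.Printf("%d | %s | %d\n", c.Volume, c.Reporter, c.Page)
	}
}
```

Single-volume reporters never match this pattern because they lack the leading volume number, which is why they need the separate handling discussed above.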
Not sure how I missed this before. A complete database of the English Reports appears to be here: http://www.commonlii.org/uk/cases/EngR/
It appears there are hand-keyed parallel citations that could link to our detected cases and allow us to extract at least the dates of the decisions. I'll see if they can share their data files.
Behold, the English Reports. It turns out each case has one and only one parallel cite, so no extra table is needed for that. The second table here maps each volume number to its court jurisdiction. We have the full text too, just not in table form yet; getting the full text into tables is low priority, I would think.
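The linking step described here can be sketched in Go: index the English Reports rows by their single nominate parallel cite, then look up the court from the ER volume via the second table. The field names, citation strings, and the volume-to-court entry below are hypothetical placeholders, not the actual columns or contents of the shared tables:

```go
package main

import "fmt"

// ERCase is one row of the English Reports table (illustrative fields).
type ERCase struct {
	Title        string
	Year         int
	NominateCite string // the case's one parallel cite to the original report
	ERVolume     int    // volume of the English Reports reprint
	ERPage       int
}

// Linker joins detected nominate citations to English Reports metadata.
type Linker struct {
	// Several cases can share one nominate page, so a citation maps to a
	// slice; each case still appears exactly once, under its one parallel cite.
	byNominate map[string][]ERCase
	// courtByVolume mirrors the second table: ER volume number -> court.
	courtByVolume map[int]string
}

// NewLinker indexes the cases by their nominate citation.
func NewLinker(cases []ERCase, courts map[int]string) *Linker {
	l := &Linker{byNominate: make(map[string][]ERCase), courtByVolume: courts}
	for _, c := range cases {
		l.byNominate[c.NominateCite] = append(l.byNominate[c.NominateCite], c)
	}
	return l
}

// Link returns every English Reports case printed at the cited nominate address.
func (l *Linker) Link(cite string) []ERCase {
	return l.byNominate[cite]
}

// Court returns the court for a case's ER volume, or "" if the volume is unknown.
func (l *Linker) Court(c ERCase) string {
	return l.courtByVolume[c.ERVolume]
}

func main() {
	// Hypothetical rows: two cases reported on the same nominate page.
	cases := []ERCase{
		{"A v. B", 1750, "1 Hypo. 10", 99, 100},
		{"C v. D", 1750, "1 Hypo. 10", 99, 101},
	}
	l := NewLinker(cases, map[int]string{99: "Chancery"}) // hypothetical entry
	for _, c := range l.Link("1 Hypo. 10") {
		fmt.Printf("%s (%d), %s, %d ER %d\n", c.Title, c.Year, l.Court(c), c.ERVolume, c.ERPage)
	}
}
```

Once a detected citation links to a row, the decision year and court come along with it, which is the CAP-like metadata the thread is after.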
Edit: The file was too big to attach. Download the CSV here.
A few pointers, as I review Phil's data:
The complete, clean, final, and godly English Reports are here: https://drive.google.com/drive/folders/1QpwUQHIxzAJdeUG15CdNPioT5HBilyKY?usp=sharing
The CSV file contains everything described above, as well as the clean years, titles, and word counts from Peter Murrell's data. It is ready to integrate whenever you're ready to tackle the integration.
CAP has only U.S. cases and does not detect citations to English (or any other foreign) reporters, nor would it help much if it did, since the cited cases' text and metadata are not in CAP.