lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Create a test suite to check accuracy of citation detection #34

Closed lmullen closed 2 years ago

lmullen commented 2 years ago

This would be a spreadsheet with three (or four) columns:

  1. The text of the citation as you expect to find it in the wild. This may or may not include surrounding text, depending on the test.
  2. The cleaned, normalized text of the citation.
  3. Optionally, the citation as it would be found in the Caselaw Access Project (CAP).
  4. A brief description of what is being tested in that case (e.g., citation with pincite, citation to antique reporter).
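
A minimal sketch of how a spreadsheet with those columns could drive an automated check, assuming it is exported to CSV. The column names (`raw`, `normalized`, `cap`, `description`), the sample rows, and `toy_detect` are all hypothetical; `toy_detect` is a deliberately naive stand-in for the real detector, not how this project or eyecite finds citations.

```python
import csv
import io
import re
from typing import Callable, Iterable, List, NamedTuple


class Case(NamedTuple):
    raw: str          # citation as found in the wild, possibly with context
    normalized: str   # cleaned, normalized citation we expect to detect
    cap: str          # citation as found in CAP (optional, may be empty)
    description: str  # what this row is testing


def load_cases(source: Iterable[str]) -> List[Case]:
    """Read test cases from CSV lines with a header row."""
    reader = csv.DictReader(source)
    return [
        Case(
            row["raw"].strip(),
            row["normalized"].strip(),
            (row.get("cap") or "").strip(),
            row["description"].strip(),
        )
        for row in reader
    ]


def run_suite(cases: List[Case], detect: Callable[[str], List[str]]) -> List[Case]:
    """Return the cases whose expected citation was NOT detected."""
    return [c for c in cases if c.normalized not in detect(c.raw)]


def toy_detect(text: str) -> List[str]:
    """Naive stand-in detector: volume, reporter ending in a period, page."""
    pattern = re.compile(r"(\d+)\s+([A-Z][a-z]+\.)\s+(\d+)")
    return [f"{vol} {rep} {page}" for vol, rep, page in pattern.findall(text)]


# Made-up rows for illustration only; they are not taken from the real spreadsheet.
SAMPLE = """raw,normalized,cap,description
"See 12 Mass. 345, 350.",12 Mass. 345,12 Mass. 345,Citation with pincite
as held in 3 Kelly 27,3 Kelly 27,,Citation to antique reporter
"""

cases = load_cases(io.StringIO(SAMPLE))
failures = run_suite(cases, toy_detect)
for case in failures:
    print(f"MISSED: {case.description}: {case.raw!r}")
```

Passing the detector in as a function keeps the table independent of any one tool, so the same rows could be rerun after the detector changes.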

lmullen commented 2 years ago

We might also eventually add a column for the URL of the case in CAP.

lmullen commented 2 years ago

From @kfunk074, the start of a test suite.

citation detection tests.xlsx

lmullen commented 2 years ago

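Just a few of the most straightforward examples that should be added to this:

  • [ ] Kelly or other antique reporters
  • [ ] Oreg. vs Or. and other variations in abbreviation
  • [ ] British citations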

kfunk074 commented 2 years ago

> Just a few of the most straightforward examples that should be added to this:
>
>   • [ ] Kelly or other antique reporters
>   • [ ] Oreg. vs Or. and other variations in abbreviation
>   • [ ] British citations

We already know that eyecite, as currently constituted, misses these. The test suite I've written checks which kinds of OCR errors eyecite struggles with, if any.
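
One way to probe OCR errors systematically is to generate damaged variants of a known-good citation and feed each one to the detector. The sketch below is only an illustration: the confusion table is an assumed handful of common OCR substitutions, `ocr_variants` is a hypothetical helper, and the citation is made up.

```python
from typing import Dict, List

# Assumed, illustrative OCR confusions (correct character -> misread character).
OCR_CONFUSIONS: Dict[str, str] = {"1": "l", "0": "O", "5": "S", ".": ","}


def ocr_variants(citation: str, confusions: Dict[str, str] = OCR_CONFUSIONS) -> List[str]:
    """Return copies of the citation with exactly one character misread."""
    variants = []
    for i, ch in enumerate(citation):
        if ch in confusions:
            variants.append(citation[:i] + confusions[ch] + citation[i + 1:])
    return variants


# Hypothetical citation used only to show the output shape.
variants = ocr_variants("10 Or. 5")
print(variants)  # ['l0 Or. 5', '1O Or. 5', '10 Or, 5', '10 Or. S']
```

Each variant could then become a row in the test spreadsheet, with the undamaged citation as the expected normalized value.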

lmullen commented 2 years ago

Right. But we will eventually want to test every kind of citation to see whether we are finding them.

lmullen commented 2 years ago

This is basically done, in the sense that we can add citations that have caused us problems and confirm that they are fixed. We are also checking this manually, so this is as done as it is going to be.