lmullen / legal-modernism

Law and legal practice modernized in the nineteenth-century United States. We are studying and visualizing the history of the modernization of American law.
https://legalmodernism.org
MIT License

Reconcile citations to statutes #73

Open lmullen opened 2 years ago

lmullen commented 2 years ago

We have likely found statutory citations in the format 1 Reporter 123. (Are there other forms of statutory citation?) We need a source of data and a way to reconcile the statutory citations, parallel to reconciling CAP.

kfunk074 commented 2 years ago

There is no open-source library for statutes that I’m aware of. The proprietary HeinOnline might be open to collaboration once we can prove the concept with CAP.

For now if we just want to compile a database of citations, a couple notes:

lmullen commented 2 years ago

I think this is going to be sufficiently complicated that I don't want to get bogged down here until we've done the CAP cases.

The approach, I think, would be to write a series of more targeted detectors, one for each pattern. That way I don't introduce more noise into the generic detector by modifying it.

I am guessing that, OCR-wise, 1848 N.Y. Laws § 1 is unlikely to appear that often because of the §. But 1848 N.Y. Laws 497 Sec. 1 is just number, words, number, words, number, which will catch a lot of stuff.
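
To make the idea of a targeted detector concrete, here is a minimal sketch in Python for the two session-law forms mentioned above. The regular expression, the names SESSION_LAW and find_session_laws, and the restriction to years 1700-1899 are all assumptions for illustration, not the project's actual detector.

```python
import re

# Hypothetical targeted detector for session-law citations. The pattern and
# all names here are illustrative assumptions, not the project's code.
SESSION_LAW = re.compile(
    r"""
    (?P<year>1[78]\d{2})\s+                               # year, 1700-1899
    (?P<jurisdiction>[A-Z][a-z]*\.(?:\s?[A-Z][a-z]*\.)*)  # e.g. "N.Y." or "Mass."
    \s+Laws\s+
    (?:(?P<page>\d+)\s+)?                                 # optional page number
    (?:§|Sec\.)\s*(?P<section>\d+)                        # section marker and number
    """,
    re.VERBOSE,
)


def find_session_laws(text):
    """Return one dict of named groups per session-law citation in text."""
    return [m.groupdict() for m in SESSION_LAW.finditer(text)]
```

Because the pattern requires the literal word Laws and a section marker, a plain reporter citation such as 12 Mass. 345 does not match, which keeps this detector from adding noise to the generic one.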

kfunk074 commented 2 years ago

As the world's premier expert on OCR detection of section symbols, I'd say it does better than you might expect. The trick is usually making sure the regex finder isn't excluding weird characters or treating them as a stop term.
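
One way a regex finder can exclude the section symbol without anyone noticing is by tokenizing on word characters, since § is not a word character. The following sketch in Python shows the difference; the function names are hypothetical and the example says nothing about how the project's detector actually tokenizes.

```python
import re


def word_tokens(text):
    """Tokenize on word characters; punctuation such as § is dropped."""
    return re.findall(r"\w+", text)


def space_tokens(text):
    """Tokenize on whitespace; § survives as its own token."""
    return re.findall(r"\S+", text)


citation = "1848 N.Y. Laws § 1"
dropped = "§" not in word_tokens(citation)  # the symbol is silently lost
kept = "§" in space_tokens(citation)        # the symbol is still available
```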

I agree on tabling statutes for now but keeping the issue live to target down the line.