lmullen opened 2 years ago
There is no open-source library for statutes that I’m aware of. The proprietary HeinOnline might be open to collaboration once we can prove the concept with CAP.
For now, if we just want to compile a database of citations, a couple of notes:
I think this is going to be sufficiently complicated that I don't want to get bogged down here until we've done the CAP cases.
The approach, I think, would be to write a series of more targeted detectors, one for each pattern. I don't want to add more noise to the generic detector by modifying it.
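A minimal sketch of what "one targeted detector per pattern" could look like in Python, kept entirely separate from the generic detector. The names `DETECTORS`, `Match`, and `detect_statutes` are hypothetical, and the two regexes only cover the two N.Y. session-law shapes discussed in this issue; they are illustrations, not tested detectors.

```python
import re
from typing import NamedTuple


class Match(NamedTuple):
    detector: str  # which targeted detector fired
    text: str      # the matched citation text
    start: int     # character offset in the source text


# Hypothetical targeted detectors: one compiled regex per citation pattern.
# Keeping them separate leaves the generic case detector untouched.
DETECTORS = {
    # e.g. "1848 N.Y. Laws § 1"
    "ny_session_law_section": re.compile(r"\b(1[6-9]\d\d) N\.Y\. Laws § ?(\d+)"),
    # e.g. "1848 N.Y. Laws 497 Sec. 1"
    "ny_session_law_page_sec": re.compile(
        r"\b(1[6-9]\d\d) N\.Y\. Laws (\d+),? Sec\. ?(\d+)"
    ),
}


def detect_statutes(text: str) -> list[Match]:
    """Run every targeted detector over text and return hits in text order."""
    hits = []
    for name, pattern in DETECTORS.items():
        for m in pattern.finditer(text):
            hits.append(Match(name, m.group(0), m.start()))
    return sorted(hits, key=lambda h: h.start)
```

Adding a new pattern means adding one dictionary entry, so each detector can be tuned or dropped without touching the others.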
I am guessing, OCR-wise, that 1848 N.Y. Laws § 1 is just unlikely to appear that often because of the §. But 1848 N.Y. Laws 497 Sec. 1 is just number words number words number, which will catch a lot of stuff.
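To make the noise concern concrete, here is a rough Python comparison, assuming a generic detector that literally encodes "number words number words number". Both regexes and both sample strings are made up for illustration.

```python
import re

# A deliberately generic shape: number, words, number, words, number.
GENERIC = re.compile(r"\b\d+(?: [A-Za-z.]+)+ \d+(?: [A-Za-z.]+)+ \d+\b")

# A targeted shape anchored on the literal "N.Y. Laws" and "Sec." tokens.
TARGETED = re.compile(r"\b\d{4} N\.Y\. Laws \d+,? Sec\. ?\d+\b")

samples = [
    "1848 N.Y. Laws 497 Sec. 1",                     # a session-law citation
    "In 1848 the mayor paid 497 dollars to 12 men",  # ordinary prose
]

for s in samples:
    print(f"generic={bool(GENERIC.search(s))} targeted={bool(TARGETED.search(s))}  {s!r}")
```

The generic shape fires on the ordinary sentence as well as on the citation; anchoring on the literal abbreviations is what keeps the targeted version quiet.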
As the world's premier expert on OCR detection of section symbols, I'd say OCR handles § better than you might expect. The trick is usually making sure the regex finder isn't excluding weird characters or treating them as stop terms.
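A small Python illustration of the stop-character trap: a finder that tokenizes on `\w` silently throws away §, so a section-symbol pattern never gets a chance to match. The look-alike set in `SECTION` ($ and S) is an assumption about OCR substitutions, not a measured error list, and `SECTION_CITE` is a hypothetical name.

```python
import re

text = "1848 N.Y. Laws § 1"

# A word-token finder drops the section symbol, because § is punctuation
# (Unicode category Po), not a \w character.
print(re.findall(r"\w+", text))  # ['1848', 'N', 'Y', 'Laws', '1']

# Hypothetical OCR-tolerant alternation: the real symbol (single or doubled)
# plus assumed look-alikes "$" and "S". Not a verified list of OCR errors.
SECTION = r"(?:§§?|\$\$?|SS?)"
SECTION_CITE = re.compile(r"\b(\d{4}) N\.Y\. Laws " + SECTION + r" ?(\d+)")

for s in ["1848 N.Y. Laws § 1", "1848 N.Y. Laws $ 1", "1848 N.Y. Laws S 1"]:
    m = SECTION_CITE.search(s)
    print(s, "->", m.groups() if m else None)
```

Widening the character set trades precision for recall, so each look-alike is worth checking against real OCR output before keeping it.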
I agree on tabling statutes for now, but let's keep the issue open so we can target it down the line.
We have likely found statutory citations in the format 1 Reporter 123. (Are there other forms of statutory citation?) We need a source of data and a way to reconcile the statutory citations, parallel to reconciling CAP.
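One possible shape for reconciliation, sketched in Python under loud assumptions: citations in the 1 Reporter 123 format are normalized to a (volume, reporter, page) key and looked up in a reference table. The `REFERENCE` table and its identifier are placeholders for the statute data source we don't have yet; `citation_key` and `reconcile` are hypothetical names.

```python
import re
from typing import Optional

# Volume, reporter abbreviation (one or more capitalized words), page.
VOL_REPORTER_PAGE = re.compile(r"\b(\d+) ([A-Z][A-Za-z.]*(?: [A-Z][A-Za-z.]*)*) (\d+)\b")


def citation_key(cite: str) -> Optional[tuple[int, str, int]]:
    """Normalize a citation string to a (volume, reporter, page) key."""
    m = VOL_REPORTER_PAGE.search(cite)
    if m is None:
        return None
    volume, reporter, page = m.groups()
    # Lowercase the reporter so lookups don't depend on capitalization.
    return int(volume), reporter.lower(), int(page)


# Placeholder standing in for the missing statute data source (hypothetical).
REFERENCE = {
    (1, "reporter", 123): "statute-0001",
}


def reconcile(cite: str) -> Optional[str]:
    """Return the reference ID for a citation, or None if it can't be matched."""
    key = citation_key(cite)
    return REFERENCE.get(key) if key is not None else None
```

The same key also falls out of session-law citations such as 1848 N.Y. Laws 497, which may help if those end up reconciled the same way.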