Closed by lmullen 2 years ago
This should probably be done only minimally in the citation detector itself. It is really a problem for OCR correction on one end or citation reconciliation on the other, where the correction data can be used from both Go and Python and where there is more room for human intervention to fix obvious mistakes that are hard to spell out in a regex.
This is done now. The remainder will happen at the cleanup stage.
A common problem is that the generic regex for reporters matches reporter names that still need to be cleaned up.
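To make the kind of cleanup concrete, here is a minimal sketch in Go. It is not the code in this repository: `cleanReporter` and its rules (collapse whitespace, trim stray surrounding punctuation) are hypothetical and only illustrate what minimal normalization in the detector might look like. Anything beyond this, such as fixing OCR misreadings, would be left to the later cleanup stage.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// multiSpace matches any run of whitespace, including newlines from OCR line breaks.
var multiSpace = regexp.MustCompile(`\s+`)

// cleanReporter is a hypothetical helper (not taken from the repository) that
// applies only minimal normalization to a reporter name captured by a generic regex.
func cleanReporter(raw string) string {
	// Collapse internal whitespace runs to a single space and trim the ends.
	s := multiSpace.ReplaceAllString(strings.TrimSpace(raw), " ")
	// Drop stray punctuation that the regex may have captured at either end.
	// A trailing period is kept because reporter abbreviations end with one.
	s = strings.TrimLeft(s, ",;:([ ")
	s = strings.TrimRight(s, ",;:)] ")
	return s
}

func main() {
	// Sample inputs are made up for illustration.
	for _, raw := range []string{"  N.  Y.  ", "Mass.,", "Johns.\nCh."} {
		fmt.Printf("%q -> %q\n", raw, cleanReporter(raw))
	}
}
```

Keeping the rules this small means the detector never guesses at what a garbled reporter was meant to be; it only removes noise that is unambiguous.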
Example tests for checking this cleaning are here: https://github.com/lmullen/legal-modernism/blob/issue44-Refactor-the-code-base-to-allow-for-tests/modularity/go/citations/citation_test.go#L18