sc3 / python-iucr

Python package for working with Illinois Uniform Crime Reporting (IUCR) data.
Provide resource for parsing ILCS strings into their component parts #8

Open bepetersn opened 10 years ago

bepetersn commented 10 years ago

We've talked about switching to only allowing lookup of IUCR codes by the component parts of an ILCS reference. If we made this change, clients would be responsible for parsing ILCS reference strings themselves, and we could at least try to help with that parsing.

The simplest thing would be to provide a regex for parsing ILCS reference strings as a constant, like ILCS_FORMAT. Another possibility would be to expose a method that does some of the parsing for the client.
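
As a rough sketch of both options, something like the following might work. It assumes references shaped like `720 ILCS 5/12-3.05(a)(1)` (chapter, act, section, optional subsections). The function name `parse_ilcs` and the group names are hypothetical, not part of the package today.

```python
import re

# Hypothetical constant. The pattern assumes citations shaped like
# "720 ILCS 5/12-3.05(a)(1)"; other shapes will not match.
ILCS_FORMAT = re.compile(
    r"""
    ^\s*
    (?P<chapter>\d+)\s+ILCS\s+          # chapter number, then the literal "ILCS"
    (?P<act>\d+)\s*/\s*                 # act number, then a slash
    (?P<section>[\w.\-]+)               # section, e.g. "12-3.05"
    (?P<subsection>(?:\([\w.\-]+\))*)   # zero or more "(a)", "(1)", "(a-5)" parts
    \s*$
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_ilcs(reference):
    """Return a dict of component parts, or None if the string does not match."""
    match = ILCS_FORMAT.match(reference)
    if match is None:
        return None
    return match.groupdict()
```

A client could then call `parse_ilcs("720 ILCS 5/12-3.05(a)(1)")` and pass the resulting parts to the lookup. Returning `None` instead of raising keeps the decision about unparseable strings with the client.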

The problem is that ILCS data might appear in multiple formats, which suggests a more involved approach than providing a single regex.
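
One way to handle several formats is to keep a list of patterns and try each in turn. In the sketch below, both formats are assumptions for illustration only (the second, `720-5/12-3.05(a)(1)`, is a made-up shorthand), and `ILCS_FORMATS` and `parse_ilcs_any` are hypothetical names.

```python
import re

# Shared tail: a slash, the section, then zero or more parenthesized subsections.
_TAIL = r"\s*/\s*(?P<section>[\w.\-]+)(?P<subsection>(?:\([\w.\-]+\))*)\s*$"

# Hypothetical formats; real data may need different or additional patterns.
ILCS_FORMATS = [
    # Long form, e.g. "720 ILCS 5/12-3.05(a)(1)"
    re.compile(r"^\s*(?P<chapter>\d+)\s+ILCS\s+(?P<act>\d+)" + _TAIL, re.IGNORECASE),
    # Assumed shorthand, e.g. "720-5/12-3.05(a)(1)"
    re.compile(r"^\s*(?P<chapter>\d+)-(?P<act>\d+)" + _TAIL),
]


def parse_ilcs_any(reference):
    """Try each known format in order; return the first match's parts, or None."""
    for pattern in ILCS_FORMATS:
        match = pattern.match(reference)
        if match:
            return match.groupdict()
    return None
```

Since every pattern uses the same group names, callers get the same keys regardless of which format matched, and supporting a new format only means appending a pattern to the list.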

ghing commented 10 years ago

@bepetersn Good summary of all the caveats connected with this issue.

ghing commented 7 years ago

@hancush Can you check if parserator is appropriate for parsing ILCS statutes? Do you have to train it? How much training data does it need?

ghing commented 7 years ago

Also, I think https://github.com/sc3/python-ilcs might be a better place for this.