sc3 / cook-convictions-data

Django project for loading, cleaning and querying data about criminal convictions in Cook County, Illinois

What should be done if, given an ILCS, there are multiple matching IUCR codes? #4

Closed bepetersn closed 10 years ago

bepetersn commented 10 years ago

It seems that our iucr package currently gives up if, when trying to associate an IUCR code with an ILCS statute reference, it finds more than one matching code. More specifically, the iucr package raises an exception, which statute.py in this data project handles by setting the disposition's iucr_code field to the empty string. We are currently losing about 30% of our IUCR data to this alone, in absolute terms.

However, it's really a bit worse than just 30%, because some statutes are affected disproportionately. I am planning on posting a JSON document with all of the statutes for which this happens, along with counts for each. Consider 720-5/19-1(a), though: burglary. There are around 15,000 dispositions for which there is no IUCR code because of this issue. This translates into about half as many convictions with no IUCR code.
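A per-statute tally like the one described could be produced roughly as follows. This is only a sketch: the input rows, the `count_ambiguous` helper, and the `STATUTE-B` / `STATUTE-C` references are made up for illustration, and the counts are not real data. Only 720-5/19-1(a) comes from the project.

```python
import json
from collections import Counter

# Hypothetical sample rows: (ILCS statute reference, number of IUCR matches found).
# Only "720-5/19-1(a)" is a real reference from this issue; the rest are placeholders.
dispositions = [
    ("720-5/19-1(a)", 3),
    ("720-5/19-1(a)", 3),
    ("STATUTE-B", 2),
    ("STATUTE-C", 1),
]


def count_ambiguous(rows):
    """Count dispositions per statute where more than one IUCR code matched."""
    counts = Counter(statute for statute, matches in rows if matches > 1)
    # most_common() orders statutes from most to least affected.
    return dict(counts.most_common())


# Serialize the tally as the JSON document to post.
report = json.dumps(count_ambiguous(dispositions), indent=2)
print(report)
```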

Here are some of the other statutes disproportionately affected by this issue:

In my opinion, there isn't an obvious solution to this problem. The shape of the data varies among statutes, but typically there is at least SOME relationship between the multiple IUCR codes associated with a single statute. So from one perspective, it might not matter that much. The simplest thing I can think to do is to return the first IUCR code associated with a statute. It might be possible to make this slightly more dynamic in the cases where there might be value in doing so. For instance, choosing the most "severe" IUCR code.
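To make the two options concrete, here is a rough sketch of "take the first code" versus "take the most severe code". The `Offense` class below is a stand-in, not the iucr package's own class, and the `severity` field, the helper names, and the code values are all assumptions made for illustration. How severity would actually be ranked is an open question.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Offense:
    """Stand-in for an IUCR offense record (not the iucr package's class)."""
    code: str
    severity: int  # hypothetical ranking; higher means more severe


def pick_first(offenses: Sequence[Offense]) -> Optional[Offense]:
    """Simplest strategy: return the first match, or None if there are none."""
    return offenses[0] if offenses else None


def pick_most_severe(offenses: Sequence[Offense]) -> Optional[Offense]:
    """Alternative strategy: return the match with the highest severity."""
    return max(offenses, key=lambda o: o.severity, default=None)


def iucr_code_for(
    offenses: Sequence[Offense],
    pick: Callable[[Sequence[Offense]], Optional[Offense]] = pick_first,
) -> str:
    """Return the chosen code, or "" when nothing matched (the current fallback)."""
    chosen = pick(offenses)
    return chosen.code if chosen is not None else ""


# Placeholder codes and severities, for illustration only.
candidates = [Offense("0001", severity=1), Offense("0002", severity=2)]
print(iucr_code_for(candidates))
print(iucr_code_for(candidates, pick_most_severe))
```

Passing the strategy as a function keeps the choice in one place, so it could differ per statute where that adds value.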

Thoughts, @ghing?

bepetersn commented 10 years ago

This is related to #3.

ghing commented 10 years ago

@bepetersn Good catch.

> More specifically, the iucr package raises an exception, which statute.py of this data project responds to by setting a disposition's iucr_code field to the empty string.

Is this precisely what happens? iucr.lookup_by_ilcs() should return a list of Offense objects when an ILCS code maps to more than one offense. See https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L108 through https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L110.

In any case, the important observation is that we currently don't try to disambiguate between the multiple IUCR codes and just set the field to an empty string.
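Whichever behavior turns out to be the real one, the calling code could normalize the result before deciding what to store. The sketch below does not call the iucr package: `fake_lookup` and its table are stand-ins, and both the return shapes (a single object or a list) and the exception type (`LookupError`) are assumptions, not confirmed behavior of iucr.lookup_by_ilcs().

```python
from types import SimpleNamespace

# Stand-in lookup table; codes are placeholders, not real IUCR codes.
_FAKE_TABLE = {
    "720-5/19-1(a)": [SimpleNamespace(code="0001"), SimpleNamespace(code="0002")],
    "STATUTE-B": SimpleNamespace(code="0003"),
}


def fake_lookup(ilcs_reference):
    """Hypothetical stand-in for a lookup call; raises KeyError when missing."""
    return _FAKE_TABLE[ilcs_reference]


def candidate_codes(lookup, ilcs_reference):
    """Return every matching code as a list, whatever shape the lookup returns."""
    try:
        result = lookup(ilcs_reference)
    except LookupError:  # assumed exception type; adjust to what is actually raised
        return []
    if result is None:
        return []
    if isinstance(result, (list, tuple)):
        return [offense.code for offense in result]
    return [result.code]


print(candidate_codes(fake_lookup, "720-5/19-1(a)"))
```

With the candidates always in a list, an empty list, a single code, and several codes become three distinct cases instead of all collapsing into an empty string.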

Here are a few thoughts off the top of my head:

bepetersn commented 10 years ago

4 is part of an answer to this. The rest of it is that we ultimately don't care about getting IUCRs for every single statute, especially if it's not due to our incompetence, but because of the way statutes and IUCR codes get assigned.

Between using charge descriptions, and @ghing's work to roll up IUCR codes to our categories of interest, we will handle multiple IUCR codes for a statute.