bepetersn closed this issue 10 years ago
This is related to #3.
@bepetersn Good catch.
More specifically, the iucr package raises an exception, which statute.py of this data project responds to by setting a disposition's iucr_code field to the empty string.
Is this precisely what happens? iucr.lookup_by_ilcs() should return a list of Offense objects when an ILCS code maps to more than one offense. See https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L108 through https://github.com/sc3/python-iucr/blob/master/iucr/__init__.py#L110.
In any case, the important observation is that we currently don't try to disambiguate between the multiple IUCR codes; we just set the iucr_code field to an empty string.
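To make that behavior concrete, here is a minimal sketch. Everything in it is a stand-in invented for illustration: the Offense tuple, the FAKE_TABLE data (including the placeholder codes and the EXAMPLE-SINGLE key), and both function names. It is not the actual python-iucr or statute.py code; it only mimics the behavior described in this thread.

```python
from collections import namedtuple

# Hypothetical stand-in for the package's Offense objects.
Offense = namedtuple("Offense", ["code", "description"])

# Made-up lookup table; the codes are placeholders, not real IUCR data.
FAKE_TABLE = {
    "720-5/19-1(a)": [
        Offense("X001", "Burglary"),
        Offense("X002", "Burglary or Theft From Motor Vehicle"),
    ],
    "EXAMPLE-SINGLE": [Offense("X003", "Example Offense")],
}


def lookup_by_ilcs_stub(ilcs_reference):
    """Return one Offense, or a list of them when the statute is ambiguous."""
    matches = FAKE_TABLE[ilcs_reference]
    return matches[0] if len(matches) == 1 else list(matches)


def current_iucr_code(ilcs_reference):
    """Mimic the behavior discussed here: ambiguous statutes get ''."""
    result = lookup_by_ilcs_stub(ilcs_reference)
    if isinstance(result, list):
        return ""  # no attempt to disambiguate
    return result.code
```

With this stub, current_iucr_code("720-5/19-1(a)") comes back empty even though two candidate codes were found, which is the data loss in question.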
Here are a few thoughts off the top of my head:
720-5/19-1(a) spans Burglary and Burglary or Theft From Motor Vehicle, but I don't think that's a huge deal. If the category is at least the same, we can leave the iucr_code field empty and instead just set the iucr_category field. For most of the questions we've seen so far, I think this gives us enough info to construct our queries.
Between using charge descriptions and @ghing's work to roll up IUCR codes to our categories of interest, we will handle multiple IUCR codes for a statute.
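The "same category" idea could be sketched roughly like this. The Offense shape, the resolve_iucr_fields name, the placeholder codes, and the assumption that both burglary entries share a "Burglary" category are all hypothetical; the real objects and rollup may look different.

```python
from collections import namedtuple

# Hypothetical record shape; the real Offense objects may differ.
Offense = namedtuple("Offense", ["code", "description", "category"])


def resolve_iucr_fields(offenses):
    """Return (iucr_code, iucr_category) for a list of candidate offenses."""
    if not offenses:
        return "", ""
    codes = {o.code for o in offenses}
    categories = {o.category for o in offenses}
    # Only fill a field when every candidate agrees on its value.
    iucr_code = codes.pop() if len(codes) == 1 else ""
    iucr_category = categories.pop() if len(categories) == 1 else ""
    return iucr_code, iucr_category


# Placeholder codes and an assumed shared category, for illustration only.
candidates = [
    Offense("X001", "Burglary", "Burglary"),
    Offense("X002", "Burglary or Theft From Motor Vehicle", "Burglary"),
]
print(resolve_iucr_fields(candidates))  # ('', 'Burglary')
```

The code field stays empty when the candidates disagree, but the category still gets recorded, so category-level queries keep working.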
It seems as though our iucr package's functionality currently does nothing if, when trying to associate an IUCR code with an ILCS statute reference, it finds more than one code. More specifically, the iucr package raises an exception, which statute.py of this data project responds to by setting a disposition's iucr_code field to the empty string. We are currently losing about 30% of our IUCR data just to this, in absolute terms.
However, it's really a little bit worse than just 30%. Some statutes are affected disproportionately by this. I am planning on posting a JSON document with all of the statutes for which this happens, along with counts for each. Consider 720-5/19-1(a), though. Burglary. There are around 15000 dispositions for which there is no IUCR code because of this issue. This translates into about half as many convictions with no IUCR code.
Here are some of the other statutes disproportionately affected by this issue:
In my opinion, there isn't an obvious solution to this problem. The shape of the data varies among statutes, but typically there is at least SOME relationship between the multiple IUCR codes associated with a single statute. So from one perspective, it might not matter that much. The simplest thing I can think to do is to return the first IUCR code associated with a statute. It might be possible to make this slightly more dynamic in the cases where there might be value in doing so. For instance, choosing the most "severe" IUCR code.
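For discussion, the two strategies could look something like the sketch below. The Offense tuple, the function names, the placeholder codes, and especially the SEVERITY ranking are assumptions made up for illustration; nothing in the thread defines how severity would actually be ranked.

```python
from collections import namedtuple

# Hypothetical record shape; the real Offense objects may differ.
Offense = namedtuple("Offense", ["code", "description"])

# Assumed ranking, higher means more severe. Placeholder codes only.
SEVERITY = {"X001": 2, "X002": 1}


def pick_first(offenses):
    """Simplest option: take the first IUCR code found for the statute."""
    return offenses[0].code if offenses else ""


def pick_most_severe(offenses, severity=SEVERITY):
    """Take the code with the highest rank in the assumed severity table."""
    if not offenses:
        return ""
    # max() returns the first maximal item, so ties fall back to list order;
    # codes missing from the ranking are treated as least severe.
    return max(offenses, key=lambda o: severity.get(o.code, 0)).code
```

pick_first depends entirely on the order the lookup returns, while pick_most_severe is stable under reordering but needs someone to maintain the ranking.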
Thoughts, @ghing?