sc3 / cook-convictions-data

Django project for loading, cleaning and querying data about criminal convictions in Cook County, Illinois

Some IUCR codes and categories in the latest upload of the DB are out of sync with the crosswalk #14

Closed by bepetersn 10 years ago

bepetersn commented 10 years ago

While going very carefully over the data in our database, I found that there is some mysterious data whose origin I can't understand. Not too much, but a little. For example, searching by the charge description HARASSMENT BY TELEPHONE...

from convictions_data.models import Conviction
cs = Conviction.objects.filter(final_chrgdesc='HARASSMENT BY TELEPHONE')
for c in cs:
    print(c.case_number, c.final_statute, c.iucr_code, c.iucr_category)

The output is:

2005CR2695101 720 135/1-(2)  
2007C22021901 720-135/1-1  
2007C66174901 720-135/1-1 3800 Interference With Public Officers  
2009CR0650001 720-135/1-1 3960 Intimidation  

Fine, right? Except that doing an equivalent search over the IUCR crosswalk yields totally different IUCR codes and categories.

from convictions_data.models import Conviction
from convictions_data.statute import get_iucr
cs = Conviction.objects.filter(final_chrgdesc='HARASSMENT BY TELEPHONE')
for c in cs:
    try:
        offenses = get_iucr(c.final_statute)
    except Exception:
        continue
    for o in offenses:
        print(o.code, o.offense_category)

The output is:

2820 Disorderly Conduct  
2825 Disorderly Conduct  
2820 Disorderly Conduct  
2825 Disorderly Conduct  
2820 Disorderly Conduct  
2825 Disorderly Conduct  

This doesn't seem possible, at least to my current understanding of how we generated our IUCR categories and codes. My understanding is that the statute2iucr management command was originally run to generate codes and categories, which itself relies on just the same method I used, get_iucr(), and the crosswalk behind the scenes. However, as you can see, the crosswalk doesn't have these values.

The reason this is worth pointing out is that, at least by the crosswalk, it would seem we can map from the charge description HARASSMENT BY TELEPHONE to the IUCR category Disorderly Conduct reliably, which is part of the task of #6.

Perhaps the statute2iucr command simply needs to be run again. In the meantime, it's easy to work around this by double-checking, as I did above, that get_iucr() also returns the record's IUCR category for its statute.
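To make that double check concrete, here is a minimal sketch that flags records whose stored code and category do not appear among the crosswalk results for their statute. The `Record` and `Offense` tuples, `find_mismatches()` and `fake_lookup()` are hypothetical stand-ins so the snippet runs outside the project; the toy crosswalk only mirrors the output shown above and is not the real one.

```python
from collections import namedtuple

# Hypothetical stand-ins for the Conviction model and the crosswalk's offense objects.
Record = namedtuple("Record", "case_number final_statute iucr_code iucr_category")
Offense = namedtuple("Offense", "code offense_category")


def find_mismatches(records, lookup):
    """Return (record, offenses) pairs whose stored IUCR values are not in the lookup result."""
    mismatches = []
    for r in records:
        try:
            offenses = list(lookup(r.final_statute))
        except Exception:
            # The statute could not be resolved; skip it, as in the loop above.
            continue
        expected = {(o.code, o.offense_category) for o in offenses}
        if (r.iucr_code, r.iucr_category) not in expected:
            mismatches.append((r, offenses))
    return mismatches


def fake_lookup(statute):
    # Toy crosswalk for illustration only; raises KeyError for unknown statutes.
    crosswalk = {
        "720-135/1-1": [
            Offense("2820", "Disorderly Conduct"),
            Offense("2825", "Disorderly Conduct"),
        ],
    }
    return crosswalk[statute]


records = [
    Record("2007C66174901", "720-135/1-1", "3800", "Interference With Public Officers"),
    Record("2009CR0650001", "720-135/1-1", "3960", "Intimidation"),
]
for record, offenses in find_mismatches(records, fake_lookup):
    print(record.case_number, record.iucr_code, [o.code for o in offenses])
```

Inside the project, the same function could presumably be called with the real queryset and `get_iucr` in place of the stand-ins, assuming `get_iucr` raises on statutes it cannot resolve, as the try/except above suggests.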

ghing commented 10 years ago

@bepetersn, I took a quick look at your example and I suspect that your intuition is right: these records were probably created with an earlier, weaker version of our ILCS parsing code. Rerunning statute2iucr should fix this. We would also have to recreate the convictions from the disposition records, but that's no big deal. Thanks for looking into this.