sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.
http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

inmate in_jail field is continuously set incorrectly by scraper #308

Open bepetersn opened 10 years ago

bepetersn commented 10 years ago

Going over a more interesting subset of housing locations, 15-EMAW, for people AWOL from Electronic Monitoring, I found a good proportion of inmates have their in_jail field set apparently incorrectly compared to their housing history. Here's an example of that:

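[image: screenshot from 2014-03-28 06 43 09] https://cloud.githubusercontent.com/assets/1389463/2560184/ece42e5a-b7ae-11e3-816f-321678bd017c.png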

nwinklareth commented 10 years ago

Are you arguing that either one of those should be true? If so which one and why?

Regards,
Norbert Winklareth

bepetersn commented 10 years ago

In the above example, in_jail on the inmate should be set to false if in_jail on the person's housing location is false. But it is in fact set to true.

That's what I was referring to.

bepetersn commented 10 years ago

The inmate's in_jail field has to match their latest housing_location.
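To make that rule concrete, here is a minimal sketch in plain Python, not the project's actual Django models. `HousingRecord`, `location_in_jail`, and `expected_in_jail` are hypothetical names, and the sketch assumes each housing record carries a flag saying whether that location counts as being in jail.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class HousingRecord:
    # Hypothetical stand-in for one row of an inmate's housing history.
    housing_date: date
    location: str
    location_in_jail: bool  # assumed flag: does this location count as "in jail"?


def expected_in_jail(history: List[HousingRecord]) -> Optional[bool]:
    """Return the in_jail value implied by the most recent housing record."""
    if not history:
        return None  # no housing history, so there is nothing to compare against
    latest = max(history, key=lambda rec: rec.housing_date)
    return latest.location_in_jail


def is_consistent(inmate_in_jail: bool, history: List[HousingRecord]) -> bool:
    """True when the inmate's flag agrees with their latest housing record."""
    expected = expected_in_jail(history)
    return expected is None or inmate_in_jail == expected
```

With a history that ends in a location flagged as not in jail, `is_consistent(True, history)` returns `False`, which is the kind of mismatch shown in the screenshot.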

nwinklareth commented 10 years ago

This mismatch was caused by the database restore that happened after data migration 0024. That data migration, which sets the inmate in_jail field to the correct state, needed to be run again.

nwinklareth commented 10 years ago

Running the audit found the following:

(cookcountyjail)ubuntu@ip-10-190-110-43:~/apps/cookcountyjail$ ./manage.py audit_db
Starting database audit: 2014-03-30 14:45:26.197862
Number inmates checked: 87512
Number inmates with incorrect 'in_jail' values: 3352
Audit took 0:01:08.005206.

After the data migration to correct the in_jail values, running the audit gave these results:

(cookcountyjail)ubuntu@ip-10-190-110-43:~/apps/cookcountyjail$ ./manage.py audit_db
Starting database audit: 2014-03-30 15:18:23.279196
Number inmates checked: 87512
Audit took 0:01:08.788787.

The audit program currently checks only in_jail values; as we find other fields that need checking, they should be added to the program.
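For illustration only, an audit that is easy to extend with further checks could be organised roughly as below. This is a sketch in plain Python over dictionaries, not the code behind `audit_db`; the names `CHECKS`, `audit`, and `in_jail_matches_latest_location` are hypothetical, as are the record keys. It assumes the housing history is ordered oldest first.

```python
from typing import Callable, Dict, List

# Each check takes one inmate record (a plain dict in this sketch) and
# returns True when the record is consistent.
Check = Callable[[dict], bool]


def in_jail_matches_latest_location(inmate: dict) -> bool:
    """Assumed rule: in_jail must equal the flag on the latest housing record."""
    history = inmate["housing_history"]  # assumed to be ordered oldest first
    if not history:
        return True  # nothing to compare against
    return inmate["in_jail"] == history[-1]["in_jail"]


# New checks can be registered here as they are identified.
CHECKS: Dict[str, Check] = {
    "in_jail": in_jail_matches_latest_location,
}


def audit(inmates: List[dict], checks: Dict[str, Check] = CHECKS) -> Dict[str, int]:
    """Count, per check, how many inmate records fail it."""
    failures = {name: 0 for name in checks}
    for inmate in inmates:
        for name, check in checks.items():
            if not check(inmate):
                failures[name] += 1
    return failures
```

Adding a check for another field would then only mean writing one more function and adding it to `CHECKS`.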

bepetersn commented 10 years ago

That's excellent!! Nice work, Norbert.

bepetersn commented 10 years ago

I just ran this command again tonight, and found:

Starting database audit: 2014-04-02 04:49:41.173672
Number inmates checked: 87878
Number inmates with incorrect 'in_jail' values: 52
Audit took 0:01:09.488053.

The in_jail field seems to be continually diverging! Or am I misinterpreting this?

bepetersn commented 10 years ago

Last night, I got the database back to a state with 0 incorrect in_jail values, by re-running the migration you made, @nwinklareth. I ran audit_db again today (one scrape later), and found:

Starting database audit: 2014-04-02 16:55:44.290510
Number inmates checked: 88034
Number inmates with incorrect 'in_jail' values: 24
Audit took 0:01:10.870714.

Assuming the auditing program is correct (I was looking over it, and I only had a few lines where I was scratching my head), it seems like the scraper isn't setting the in_jail values correctly.
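One way the scraper could avoid this drift, sketched here under assumptions and not taken from the existing scraper code, is to update in_jail in the same step that records a housing location, so the two can never be written separately. `Inmate`, `jail_id`, and `record_housing` are hypothetical names, and the in-jail flags passed in the usage example are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Inmate:
    # Hypothetical in-memory stand-in for the inmate record.
    jail_id: str
    in_jail: bool = False
    # Each entry is (location name, whether that location counts as in jail).
    housing_history: List[Tuple[str, bool]] = field(default_factory=list)


def record_housing(inmate: Inmate, location: str, location_in_jail: bool) -> None:
    """Record a scraped housing location and update in_jail in the same step."""
    latest = inmate.housing_history[-1] if inmate.housing_history else None
    if latest is None or latest[0] != location:
        # Only append when the location actually changed since the last scrape.
        inmate.housing_history.append((location, location_in_jail))
    # Always derive the flag from the latest record, even when nothing changed,
    # so a stale value left by an earlier run gets corrected.
    inmate.in_jail = inmate.housing_history[-1][1]
```

For example, calling `record_housing(inmate, "15-EMAW", False)` on an inmate previously housed in an in-jail location leaves `inmate.in_jail` as `False`, matching the rule that the flag follows the latest housing location.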