sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.
http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

If a newly scraped charge has the same but less information, ignore it? #325

Open bepetersn opened 10 years ago

bepetersn commented 10 years ago

For instance, consider this charges history. It's almost silly that multiple charges are listed: the earliest one is the same as the third and last one. The only reason it gets listed twice is that the second charge dropped the "charges" field, which holds the optional commentary. For all three of them, the "charges_citation" field is exactly the same.

[Screenshot from 2014-04-07, 13:24:50]

When the Sheriff's site yields a charge with the same information as before, only less of it, we could simply ignore that charge. Or we could even carry the missing information forward into the second charge and keep a record that it was altered? I don't know.
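Both options can be sketched in a few lines, assuming each scraped charge can be compared as a plain dict keyed by the field names mentioned above (`charges_citation`, `charges`). The helper names `has_less_information` and `merge_forward` are hypothetical, not part of the cookcountyjail codebase, and the citation values used when trying them out are placeholders.

```python
from typing import Dict, List, Tuple

# Hypothetical shape of one scraped charge: field name -> value.
# Only `charges_citation` and `charges` come from the issue; a blank
# string and a missing key are both treated as "no information".
Charge = Dict[str, str]


def has_less_information(previous: Charge, new: Charge) -> bool:
    """Return True if `new` agrees with `previous` on every field it fills in
    but leaves at least one of `previous`'s non-empty fields blank."""
    dropped_something = False
    for field in set(previous) | set(new):
        old_value = previous.get(field) or ""
        new_value = new.get(field) or ""
        if new_value and new_value != old_value:
            return False  # new or changed information: keep this charge
        if old_value and not new_value:
            dropped_something = True
    return dropped_something


def merge_forward(previous: Charge, new: Charge) -> Tuple[Charge, List[str]]:
    """Alternative: keep `new`, but fill its blank fields from `previous`.

    Also returns the names of the fields that were filled in, so the
    stored record can note that the scraped data had been altered.
    """
    merged = dict(new)
    filled: List[str] = []
    for field, old_value in previous.items():
        if old_value and not new.get(field):
            merged[field] = old_value
            filled.append(field)
    return merged, filled
```

With `previous = {"charges_citation": "CIT-1", "charges": "note"}`, a new scrape of `{"charges_citation": "CIT-1", "charges": ""}` is flagged as having less information, while a different citation or different commentary is not, so genuinely new charges would still be recorded.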

@nwinklareth? @fgregg?

bepetersn commented 10 years ago

I think this happens a lot. Here's another example, with the same charge being added, emptied, and re-added over and over:
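The add / empty / re-add cycle could also be handled when building a history, by collapsing consecutive scrapes that share a `charges_citation`. This is a minimal in-memory sketch, assuming the scrapes arrive in chronological order as dicts; `collapse_history` is a hypothetical helper, and a real fix would live wherever the app saves charge records.

```python
from typing import Dict, List

# Hypothetical shape of one scraped charge (field names from the issue).
Charge = Dict[str, str]


def collapse_history(snapshots: List[Charge]) -> List[Charge]:
    """Collapse consecutive scrapes of the same charge into one entry.

    A snapshot is skipped when it repeats the last entry's citation with the
    same or blank commentary; a blank commentary on the last entry is filled
    in from a later snapshot. A different citation, or a different non-empty
    commentary, starts a new entry.
    """
    history: List[Charge] = []
    for snap in snapshots:
        citation = snap.get("charges_citation") or ""
        commentary = snap.get("charges") or ""
        last = history[-1] if history else None
        if last is not None and last["charges_citation"] == citation:
            if not commentary or commentary == last["charges"]:
                continue  # same or less information: ignore it
            if not last["charges"]:
                last["charges"] = commentary  # fill in the missing commentary
                continue
        # New citation, or genuinely different commentary: record it.
        # A fresh dict is built so the caller's snapshots are not mutated.
        history.append({"charges_citation": citation, "charges": commentary})
    return history
```

Under this rule, a run of full, emptied, and re-added scrapes of one charge collapses to a single entry, while a changed commentary still shows up as a separate record.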

[Screenshot from 2014-04-07, 15:31:36]

bepetersn commented 10 years ago

Or @eads or @wilbertom... I'll need feedback on this eventually!