sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.
http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

Look into cross-correlating with other datasets #26

Open eads opened 11 years ago

eads commented 11 years ago

@bepetersn:

Datasets to look into: fugitive, prison, crime incidents, judges, geodata

fgregg commented 11 years ago

This http://www4.cookcountysheriff.org/default1.asp?

bepetersn commented 11 years ago

This is really interesting. I wish we had more people to collaborate on this, to help us get this or similar correlation between different data sets.

nwinklareth commented 11 years ago

Hi Forest,

I agree with Brian, this is very cool, and with the hash we can correlate this with their booking.

It would be pretty straightforward to scrape. I fetched a page using this command:

curl --data "zipcode=60006&zipbutton=Begin%20Search" http://www4.cookcountysheriff.org/locatesearchresults.asp

There is only one table on that page, so extracting the contents would be easy, and following the link to the next page is also easy. Scraping the details page is a little more challenging, but not too onerous.
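
A minimal sketch of that scrape in Python, using only the standard library. It posts the same two form fields as the curl command above and collects the text of each table row. The class and function names (TableRows, extract_rows, fetch_results) are made up for illustration, and the page encoding and table layout are assumptions, since nothing in this thread describes them.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# URL taken from the curl command above.
SEARCH_URL = "http://www4.cookcountysheriff.org/locatesearchresults.asp"


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    return parser.rows


def fetch_results(zipcode):
    """POST the same form fields as the curl command and return the HTML."""
    fields = {"zipcode": zipcode, "zipbutton": "Begin Search"}
    data = urlencode(fields, quote_via=quote).encode("ascii")
    with urlopen(SEARCH_URL, data=data, timeout=30) as response:
        # Assumption: the encoding is a guess; undecodable bytes are replaced.
        return response.read().decode("latin-1", errors="replace")
```

Calling extract_rows(fetch_results("60006")) would return the rows of the results table, assuming the page still responds the way it did for the curl command. Following the next-page link and scraping the details page are left out because their markup is not described here.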

Certainly something to add to the list of things to pursue, in the future.

For the moment, getting the 2.0 version of the website out is our main focus. Help understanding the inmate location codes (the 6429 that Brian identified this morning, assuming I remember the number correctly) would greatly benefit the project, although I suspect that the numeric codes at the beginning are the most important part for our general needs.

Once the bulk of the 2.0 system is up and running, integration with other data sources is the next direction that makes the most sense. In meetings that I have attended with other groups, inmate interactions with the Courts is the topic that gets the most requests, so perhaps that is where we should start.

Thoughts?

Norbert

fgregg commented 11 years ago

Hi guys,

This may not be the best place to discuss it, but the majority of location codes are very simple.

Let's take Division I. It says on this page http://www.cookcountysheriff.org/doc/doc_DivisionsOfJail.html, that Division I has eight 'blocks' and four floors.

Once you know that, and you look at location numbers, it seems pretty clear that location "01-A-1-18-2" is in Division I, in the "A" Block, on the 1st floor. I don't know what 18-2 means.

You can also use this page http://www.cookcountysheriff.com/doc/doc_division1.html to help get a sense of what the subdivisions are.

It would be really worthwhile to try to do this parsing for all the locations we can, because then we could identify the large residual category and tag them in the API. There are about a dozen or so locations in this "other" category, including ones like 02- C-TRANSFER, KAKEE
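
A rough sketch of that parsing in Python. The pattern is inferred only from the one example above ("01-A-1-18-2"), so the field widths are assumptions, and the names Location and parse_location are hypothetical. Anything that does not match returns None, which is how the "other" category would be identified.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, based on the single example "01-A-1-18-2":
# two-digit division, one-letter block, numeric floor, then a remainder
# whose meaning is unknown.
LOCATION_RE = re.compile(r"^(\d{2})-([A-Z])-(\d+)-(.+)$")


class Location(NamedTuple):
    division: int
    block: str
    floor: int
    rest: str  # e.g. "18-2"; meaning unknown, kept verbatim


def parse_location(code: str) -> Optional[Location]:
    """Parse a location code, or return None for the residual category."""
    match = LOCATION_RE.match(code.strip().upper())
    if match is None:
        return None
    division, block, floor, rest = match.groups()
    return Location(int(division), block, int(floor), rest)
```

With this pattern, "01-A-1-18-2" parses to division 1, block "A", floor 1, with "18-2" kept as the unexplained remainder, while "02- C-TRANSFER" and "KAKEE" return None and could be tagged as "other" in the API.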

bepetersn commented 11 years ago

Hey Forest,

I just wanted to find your above explanation for reference and it took me forever to find it! So I'm going to copy it over to a more related place, at issue #209.

bepetersn commented 10 years ago

Besides this fugitive list, there's also the IDOC Prison Inmate Search, which could be scrapable:

http://www2.illinois.gov/idoc/Offender/Pages/InmateSearch.aspx

bepetersn commented 10 years ago

There's also the Crime Incident data, if we ever want to try to correlate with that:

https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2

nwinklareth commented 10 years ago

This is one that a lot of people ask for.

bepetersn commented 10 years ago

Yeah, crime data is pretty far from being helpful, unfortunately, if someone like you or me looks at it with an eye towards incorporating it into one of our projects. It's stripped of most information that we could directly tie to any case, so we can only look at aggregates.

One thing they give you is the block of a crime, for instance, but unfortunately our jail data is decidedly difficult to tie to locations. The best we might be able to do to tie that to our data is infer the region of a charge we see in our data from the court location the person gets sent to, and correlate based on that.

Another thing that I'm just thinking about doing as I type is figuring out stuff about our "charges" based on the "primary_type" or "IUCR" field associated with the crime data. Primary type, I believe, is a well-defined classification system that the police have for crimes. IUCR is another well-defined classification system, one that says something about the seriousness of a crime. "0110" IUCR is primary type "HOMICIDE", for instance. Actually, we would have to do these kinds of conversions with our charges before we could even try to compare charges between these two datasets, because primary type is the most specific piece of data they even give about the charge in the Crime Data.
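
One way the IUCR-to-primary-type conversion could be sketched in Python: build the lookup from the crime rows themselves instead of hard-coding it. The row keys "iucr" and "primary_type" are assumed to match the field names mentioned above, and the function names are made up for illustration. Mapping our own charge descriptions to IUCR codes is a separate problem that this does not attempt.

```python
from collections import Counter


def normalize_iucr(code):
    """Return an IUCR code as a zero-padded, upper-case string.

    Assumption: codes are four characters, and leading zeros may have been
    lost if a column was read as a number (110 instead of "0110").
    """
    return str(code).strip().upper().zfill(4)


def build_iucr_lookup(crime_rows):
    """Map each IUCR code to the primary type it most often appears with."""
    counts = {}
    for row in crime_rows:
        iucr = normalize_iucr(row["iucr"])
        counts.setdefault(iucr, Counter())[row["primary_type"]] += 1
    return {iucr: types.most_common(1)[0][0] for iucr, types in counts.items()}


def primary_type_for(iucr, lookup):
    """Return the primary type for an IUCR code, or None if it is unknown."""
    return lookup.get(normalize_iucr(iucr))
```

Given rows that pair "0110" with "HOMICIDE", as in the example above, primary_type_for("0110", lookup) would return "HOMICIDE". Once a charge has been assigned an IUCR code, the two datasets could at least be compared at the primary type level.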

bepetersn commented 10 years ago

I tried to scope out how easy it would be to create a scraper for the Illinois Department of Corrections (IDOC) website and its inmate locator. Seems doable. They present a single-page application for their search functionality on a .aspx page (http://www2.illinois.gov/idoc/Offender/Pages/InmateSearch.aspx), which had me concerned about whether or not we could scrape it. But I looked through the network requests that the page makes, and there is a flow composed of several .asp pages that, taken together, look identical to the .aspx page, and these can be used by a scraper. The starting page for the prison inmate locator is here: http://www.idoc.state.il.us/subsections/search/ISdefault2.asp