sc3 / cookcountyjail

A Django app that tracks the population of Cook County Jail over time and summarizes trends.
http://cookcountyjail.recoveredfactory.net/api/1.0/?format=json

Database and process tweaks #387

Closed wilbertom closed 10 years ago

wilbertom commented 10 years ago

This PR does a lot of work on the database and starts to play with the process route. I was able to import 52078 records from version 1 with the following script:

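import json

import requests

# Version 1 API endpoint that lists inmates.
INMATES = "http://cookcountyjail.recoveredfactory.net/api/1.0/countyinmate/"


def get_json(url, limit=100, offset=0):
    """Fetch one response from the version 1 API and return the parsed JSON."""
    r = requests.get(url, params={'limit': limit, 'format': 'json', 'offset': offset})
    print(r, r.url)
    return r.json()


if __name__ == '__main__':
    # Note: the 0 here is the `limit` argument, not `offset`
    # (presumably to request all records in one response).
    data = get_json(INMATES, 0)
    data = data['objects']

    for row in data:
        phash = row['person_id']

        # Skip records that have no person hash.
        if phash is not None:
            gender = row['gender']
            race = row['race']

            # The process route expects a form field named 'data' holding JSON.
            payload = json.dumps({'hash': phash, 'gender': gender, 'race': race})
            r = requests.post('http://localhost:5000/process', data={'data': payload})

            print(r)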
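
The script above pulls everything in a single request. Below is a minimal sketch of a batched alternative, assuming the version 1 API honors `limit` and `offset` the way `get_json` implies. The names `iter_objects` and `fetch_page` are hypothetical and not part of this PR.

```python
def iter_objects(fetch_page, limit=100):
    """Yield every record by walking the API in batches of `limit`.

    `fetch_page(limit=..., offset=...)` is a hypothetical callable that returns
    a dict with an 'objects' list, like the response used in the script.
    """
    offset = 0
    while True:
        objects = fetch_page(limit=limit, offset=offset)["objects"]
        yield from objects
        # A short (or empty) batch means there is nothing left to fetch.
        if len(objects) < limit:
            break
        offset += limit
```

With the script's helper this could be driven as `iter_objects(lambda limit, offset: get_json(INMATES, limit, offset))`, which keeps each response small at the cost of more requests.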

API preview:

{
  "num_results": 52078, 
  "objects": [
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "7a6df4ff717b35baaea6cfbf06182017a570ee56685d9263fc86d576f604b765", 
      "id": 1, 
      "race": "LW"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "913d20b5b377fe35ad0364bbf684afac21cd01e0bc224e5ae1a57578090db048", 
      "id": 2, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "94ff70cdc84742a5f8b2273d1715b81c1389fa35e84ab4f71ea24996b2de9d44", 
      "id": 3, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "448634cd5a8c962b9e1a1d530d99e642aa74b62b2799e434bd1928e9480676cd", 
      "id": 4, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "7a139f553df7ced81aeb25f6871591c61256524cd5cd81aae11feeaec69f4bdb", 
      "id": 5, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "64c4beeb1e050a4fba643d1ed11d41e3bc9581e7b7048756f9cec8cb8223e5d1", 
      "id": 6, 
      "race": "LW"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "30ca3ab4b3c04c027bfee0949b4c90ca041728d69ea658e53b42ba2c866870e7", 
      "id": 7, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "bba91e20d637696e18e405d1a94e0404215de2ed421e22042987f29adc265a48", 
      "id": 8, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "ebe7dd954718d44cc5f611fe6860e88a6a52cc5acc010044b98963d3cec68887", 
      "id": 9, 
      "race": "BK"
    }, 
    {
      "date_created": "2014-05-13", 
      "gender": "M", 
      "hash": "98692e9f4f702fbd0042da849a9fcfaafdc7e17e5770586941ee243277c3de34", 
      "id": 10, 
      "race": "BK"
    }
  ], 
  "page": 1, 
  "total_pages": 5208
}
bepetersn commented 10 years ago

What is the process route? I noticed it while browsing your code earlier.

On May 13, 2014 11:07 PM, "Wilberto Morales" notifications@github.com wrote:

You can merge this Pull Request by running

git pull https://github.com/wilbertom/cookcountyjail v2.0-dev

Or view, comment on, or merge it at:

https://github.com/sc3/cookcountyjail/pull/387

Commit Summary

  • Default to postgres closes #365
  • Add name to enum, postgres compatible
  • Tweaks for the new information model
  • Add the model changes to the api
  • Fix migration for postgres, #381
  • Ok really Add name to enum, postgres compatiblei, closes #381
  • update docs
  • Some tweaks to the models which close #348
  • Play with the Process route

File Changes

  • M README.md https://github.com/sc3/cookcountyjail/pull/387/files#diff-0 (28)
  • M ccj/app.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-1 (57)
  • M ccj/config.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-2 (24)
  • M ccj/models/migrations/versions/117bd134663c_.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-3 (2)
  • A ccj/models/migrations/versions/45e4c0464b06_.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-4 (48)
  • A ccj/models/migrations/versions/c52485120f6_.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-5 (47)
  • M ccj/models/models.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-6 (29)
  • M ccj/rest_api.py https://github.com/sc3/cookcountyjail/pull/387/files#diff-7 (15)


wilbertom commented 10 years ago

The process route is the one responsible for creating and updating database records from the data pushed by the scraper.
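
To illustrate the create-or-update behavior described here, this is a minimal sketch and not the code in `ccj/app.py`. It assumes the route receives a form field `data` holding JSON with `hash`, `gender`, and `race`, as the import script sends. The table layout, the function names `make_db` and `process`, and the return values are hypothetical.

```python
import json
import sqlite3


def make_db():
    """Create an in-memory table loosely shaped like the API preview (assumed schema)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE person ("
        "id INTEGER PRIMARY KEY, hash TEXT UNIQUE, gender TEXT, race TEXT)"
    )
    return db


def process(db, form):
    """Create or update one person from a form like {'data': '<json>'}.

    Returns 'created' or 'updated'; the real route's response is not shown in the PR.
    """
    payload = json.loads(form["data"])
    row = db.execute(
        "SELECT id FROM person WHERE hash = ?", (payload["hash"],)
    ).fetchone()
    if row is None:
        # Unknown hash: insert a new record.
        db.execute(
            "INSERT INTO person (hash, gender, race) VALUES (?, ?, ?)",
            (payload["hash"], payload["gender"], payload["race"]),
        )
        return "created"
    # Known hash: overwrite the stored fields with the pushed values.
    db.execute(
        "UPDATE person SET gender = ?, race = ? WHERE id = ?",
        (payload["gender"], payload["race"], row[0]),
    )
    return "updated"
```

Keying on the hash means pushing the same person twice leaves a single row, which is the property a scraper that re-sends data would rely on.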

bepetersn commented 10 years ago

Hey Wil, this looks great. One thing that can be corrected at your leisure is the name of one of the models. You called it "statue", when it's supposed to be "statute".

wilbertom commented 10 years ago

Lol sorry.