lahoffm / aclu-bail-reform

Webscraping, ETL and visualization of Georgia county jail statistics for ACLU bail reform project
MIT License

Dekalb county webscraper #6

Closed lahoffm closed 6 years ago

lahoffm commented 7 years ago

Make webscraper that spits out a CSV.

Please make a subfolder under src/webscraper that is the county name and contains all your code (Python 3.6) so no merge conflicts.

https://ody.dekalbcountyga.gov/app/JailSearch/#/search Inmate name search only.

rimjieun commented 7 years ago

Will try this one.

lahoffm commented 7 years ago

Please also add a README.md explaining how to install/run it and basic output to expect.

lahoffm commented 7 years ago

CSV output format? https://github.com/lahoffm/aclu-bail-reform/blob/master/CONTRIBUTING.md
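
For the CSV step, a minimal sketch using only the standard library might look like the following. The column names and the sample row are placeholders made up for illustration; the real columns are whatever CONTRIBUTING.md specifies.

```python
import csv
import io

# Placeholder column names, NOT the project's real schema (see CONTRIBUTING.md).
FIELDNAMES = ["county_name", "inmate_name", "booking_date"]


def write_rows(rows, out_file):
    """Write a header plus one CSV line per scraped record (a dict)."""
    # extrasaction="ignore" drops keys that are not in FIELDNAMES;
    # keys missing from a record are written as empty strings.
    writer = csv.DictWriter(out_file, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)


# Made-up sample record, written to an in-memory buffer instead of a file.
buffer = io.StringIO()
write_rows(
    [{"county_name": "dekalb", "inmate_name": "DOE, JOHN", "booking_date": "2018-01-01"}],
    buffer,
)
csv_text = buffer.getvalue()
```

Passing an open file (`open(path, "w", newline="")`) instead of the buffer would write the same content to disk; the `csv` module takes care of quoting values that contain commas.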

lahoffm commented 7 years ago

I got this email from Stacey Spears (SSpears@dekalbcountyga.gov): "In response to your email about a jail roster, the DeKalb County Inmate look up on our web page is the only public resource that is available."

tonyfast commented 7 years ago

Use the query a b c d e f g h i j k l m n o p q r s t u v w x y z to get everything. Then construct the POST request.
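
Reading that suggestion as one search per letter, a rough sketch of building the requests could look like this. The endpoint URL and the JSON field name are hypothetical stand-ins; the real ones would have to be copied from the request the search page sends, as shown in the browser's network tab. Nothing is sent here, the requests are only constructed.

```python
import json
import string
from urllib.request import Request

# Hypothetical endpoint and field name, for illustration only.
SEARCH_URL = "https://example.invalid/jail-search"
QUERY_FIELD = "searchText"


def build_search_requests(terms):
    """Return one unsent POST request per search term."""
    prepared = []
    for term in terms:
        body = json.dumps({QUERY_FIELD: term}).encode("utf-8")
        prepared.append(
            Request(
                SEARCH_URL,
                data=body,
                headers={"Content-Type": "application/json"},
                method="POST",
            )
        )
    return prepared


# One request for each letter a..z.
letter_requests = build_search_requests(string.ascii_lowercase)
```

Each prepared request could then be passed to `urllib.request.urlopen`, ideally with a pause between calls so the county's server is not hammered.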

rimjieun commented 7 years ago

Unfortunately their search engine doesn't allow one-letter queries, so for now I'm just querying 'dkso' to reproduce the POST request. I'm not sure whether that returns all the data, but I'll play around with the query string some more once I finish formatting the data I have.
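
One possible workaround, sketched below, is to query every two-letter combination and merge the results. This assumes the search accepts two-character terms and matches them inside names, neither of which is confirmed here; the `id` key used for de-duplication is also a hypothetical field name.

```python
from itertools import product
import string


def two_letter_terms():
    """Return all 676 lowercase two-letter search terms: 'aa', 'ab', ..., 'zz'."""
    return ["".join(pair) for pair in product(string.ascii_lowercase, repeat=2)]


def merge_unique(result_batches, key="id"):
    """Combine result lists from many queries, keeping one record per key value."""
    seen = {}
    for batch in result_batches:
        for record in batch:
            # Keep the first copy of each record; later duplicates are skipped.
            seen.setdefault(record[key], record)
    return list(seen.values())
```

The same inmate will match many different terms, so the de-duplication step matters more than the number of queries.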

rimjieun commented 7 years ago

I am having trouble scraping the Jail View page using Beautiful Soup. For example, when I try to scrape https://ody.dekalbcountyga.gov/app/ViewJailing/#/jailing/1105374, it gives me a different preview page from what I see in the browser (I believe they're using Angular, dunno if that affects anything). I was thinking about giving PyQt4 a shot but haven't gotten to it yet. If you know anything about this, please let me know!
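
A likely cause, offered as a guess rather than a diagnosis: everything after the `#` in that URL is a fragment, which HTTP clients do not send to the server. The server would then return the same page shell for every jailing ID, and the record is filled in afterward by JavaScript in the browser, which Beautiful Soup never runs. The split can be seen with the standard library:

```python
from urllib.parse import urldefrag

page_url = "https://ody.dekalbcountyga.gov/app/ViewJailing/#/jailing/1105374"

# urldefrag separates the part that is actually requested from the fragment.
requested_url, fragment = urldefrag(page_url)

# The jailing ID lives only in the fragment, so the server never receives it.
jailing_id = fragment.rsplit("/", 1)[-1]

print(requested_url)  # https://ody.dekalbcountyga.gov/app/ViewJailing/
print(fragment)       # /jailing/1105374
print(jailing_id)     # 1105374
```

If that is what is happening, the data would have to come either from whatever background request the page makes (visible in the browser's network tab) or from a tool that executes the JavaScript, such as a headless browser.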

lahoffm commented 7 years ago

Don't know about this, we can talk about it tonight if you're coming.

rimjieun commented 7 years ago

Dekalb is ready for review. Pull request submitted.