lahoffm / aclu-bail-reform

Webscraping, ETL and visualization of Georgia county jail statistics for ACLU bail reform project
MIT License

Muscogee county webscraper #14

Closed: lahoffm closed this issue 6 years ago

lahoffm commented 6 years ago

Make a webscraper that outputs a CSV.

Please make a subfolder under src/webscraper, named after the county, that contains all your code (Python 3.6), so there are no merge conflicts.

https://www.columbusga.org/sheriff/InmateSearch.htm

They have a 14-day intake docket and a 14-day release docket. For the current roster, they only have an inmate name search.

lahoffm commented 6 years ago

Please also add a README.md explaining how to install and run it, and what basic output to expect.

lahoffm commented 6 years ago

I got this email from Dena Bearden DBearden@columbusga.org: "Our docket shows the day of arrest. If you needed more info you would need to do an open request."

jttew commented 6 years ago

I'll work on this one

jttew commented 6 years ago

If you open the "next page" button in a new tab, you can reach the individual pages of each docket directly by URL, without having to simulate clicks:
https://ccgapps1.columbusga.org/appl/MCSOJailInmateInformation.nsf/Web14DayIntake?OpenView&Start=1&Count=10
https://ccgapps1.columbusga.org/appl/MCSOJailInmateInformation.nsf/Web14DayRelease?OpenView&Start=1&Count=10
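A minimal sketch of generating those page URLs directly, assuming (from the example links only, not from any documentation) that `Start` is a 1-based row offset and `Count` is the page size. The function name `docket_page_urls` and the `total_rows` parameter are hypothetical; the total row count would have to be read from the first page.

```python
from urllib.parse import urlencode

# Base URLs taken from the example links in this thread.
INTAKE_URL = "https://ccgapps1.columbusga.org/appl/MCSOJailInmateInformation.nsf/Web14DayIntake"
RELEASE_URL = "https://ccgapps1.columbusga.org/appl/MCSOJailInmateInformation.nsf/Web14DayRelease"


def docket_page_urls(base_url, total_rows, page_size=10):
    """Return one URL per docket page (hypothetical helper).

    Assumes Start is 1-based and advances by page_size on each page.
    """
    urls = []
    for start in range(1, total_rows + 1, page_size):
        # "OpenView" is a bare flag with no value, so it is added by hand
        query = urlencode({"Start": start, "Count": page_size})
        urls.append(f"{base_url}?OpenView&{query}")
    return urls
```

For example, `docket_page_urls(INTAKE_URL, 25)` yields three URLs starting at rows 1, 11 and 21, the first matching the intake link above.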

jttew commented 6 years ago

The Muscogee County scraper is going pretty well. I should have a functional version in the next few days when I have some free time.

jttew commented 6 years ago

I just finished creating a functional version. It might need a few tweaks.

lahoffm commented 6 years ago

What kind of tweaks? What stuff is not yet in CONTRIBUTING.md format? It would help if you gave us a list of things that remain to be done.

I or @rimjieun will review your code in detail at some point.

jttew commented 6 years ago

I tweaked it a bit after posting that comment and fixed most of what I thought needed tweaking. There are still a couple of things that might become issues in the future:
1.) The page it scrapes has changed URL on me before, so it should probably navigate from the menu page instead of using the current hardcoded URLs.
2.) The notes field is not implemented yet. My functions typically return empty strings when they encounter something unexpected; whenever that happens, the scraper could put a message in the notes field.
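One way item 2 could work, sketched under assumptions: each extraction helper still returns an empty string on unexpected input, but also appends a message to a shared list that ends up in the notes column. The names `extract_field` and `build_record`, and the column names `inmate_name` and `booking_date`, are hypothetical and not taken from the repo or CONTRIBUTING.md.

```python
def extract_field(row, key, notes):
    """Return row[key] stripped, or "" if missing/blank, recording a note.

    `row` is assumed to be a dict of raw scraped cell text; `notes` is a
    list that collects messages for the (hypothetical) notes column.
    """
    value = row.get(key)
    if value is None or not str(value).strip():
        notes.append(f"missing or unexpected value for {key}")
        return ""
    return str(value).strip()


def build_record(row):
    notes = []
    record = {
        "inmate_name": extract_field(row, "name", notes),
        "booking_date": extract_field(row, "booking_date", notes),
    }
    # Join all messages so a single CSV cell summarizes what went wrong
    record["notes"] = "; ".join(notes)
    return record
```

Keeping the empty-string return means existing callers do not change; only the notes list is new.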

rimjieun commented 6 years ago

Finally got to reviewing Muscogee. Everything looks good for the most part. Just a few minor things I noticed:

Also, @jttew, can you provide an example in the README.md of how to specify the chromedriver path? I had a bit of trouble running the scraper at first because I tried passing a path, although I ended up not using any path at all (maybe because I already had chromedriver set up in my environment variables).
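A possible sketch of path resolution that covers both setups described here, assuming this lookup order: an explicit path in an environment variable (the name `CHROMEDRIVER_PATH` is hypothetical), then a chromedriver binary bundled next to the scraper, then whatever is on `PATH`. This is not the repo's actual behavior.

```python
import os
import shutil


def find_chromedriver(script_dir, environ=None, env_var="CHROMEDRIVER_PATH"):
    """Return a chromedriver path, or None if none is found (hypothetical helper).

    Lookup order is an assumption: env var, bundled binary, then PATH.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get(env_var)
    if explicit and os.path.isfile(explicit):
        return explicit
    # Windows builds of chromedriver carry an .exe suffix
    exe = "chromedriver.exe" if os.name == "nt" else "chromedriver"
    bundled = os.path.join(script_dir, exe)
    if os.path.isfile(bundled):
        return bundled
    # Falls back to the PATH lookup, which is why "no path" can also work
    return shutil.which("chromedriver")
```

With Selenium 3.x the result would be passed as `webdriver.Chrome(executable_path=path)`; Selenium 4 instead takes `webdriver.Chrome(service=Service(path))`.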

lahoffm commented 6 years ago

If there is no unique URL, I just put the same URL for everything: the main county URL. If severity is never provided for any charge, it's OK to leave it blank.
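A small sketch of applying those two defaults to each scraped record. The field names `url` and `severity` are assumptions (the CONTRIBUTING.md schema is not shown here), and using the inmate search page from the top of the issue as the "main county URL" is also an assumption.

```python
# Assumed fallback URL: the inmate search page linked at the top of the issue.
COUNTY_URL = "https://www.columbusga.org/sheriff/InmateSearch.htm"


def apply_defaults(record):
    """Fill fields the county never provides (hypothetical field names)."""
    filled = dict(record)  # copy so the caller's dict is not mutated
    if not filled.get("url"):
        filled["url"] = COUNTY_URL
    # Severity stays blank when the county never reports it
    filled.setdefault("severity", "")
    return filled
```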

jttew commented 6 years ago

I changed the CSV output to go to the data folder, fixed the URL typo, and decided to add chromedriver to the project files in the Muscogee folder.
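A minimal sketch of writing the output into a data folder with a timestamped filename, so that several runs per day do not overwrite each other. The `<county>_<YYYYmmdd_HHMMSS>.csv` naming and the function names are assumptions, not the repo's convention.

```python
import csv
import os
from datetime import datetime


def output_path(data_dir, county, now=None):
    """Build a timestamped CSV path (the naming pattern is an assumption)."""
    now = now or datetime.now()
    return os.path.join(data_dir, f"{county}_{now:%Y%m%d_%H%M%S}.csv")


def write_csv(records, path, fieldnames):
    """Write a list of dicts to `path`, creating the folder if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # newline="" is what the csv module expects for files it writes
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)
```

Usage might look like `write_csv(rows, output_path("data", "muscogee"), fieldnames)`.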

rimjieun commented 6 years ago

Accidentally used my other account to comment earlier...

Submitted a PR for the current timestamp fix.

And everything looks good!

lahoffm commented 6 years ago

Starting data collection, multiple times per day.

As for the URL changing, I'll monitor the output daily and update the URL in the code myself. If it changes often enough to be annoying, I might ask you to add the extra navigation code.
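If the navigation code does become necessary, one approach is to fetch the menu page and pick the docket link by its visible text instead of hardcoding the URL. This sketch uses only the standard library's `html.parser`. The keyword to match (for example "intake docket") is a guess, since the menu's actual link labels are not shown in this thread, and `find_docket_url` is a hypothetical name.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only gather text while inside a link that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def find_docket_url(menu_html, menu_url, keyword):
    """Return the absolute URL of the first link whose text contains keyword."""
    parser = LinkCollector()
    parser.feed(menu_html)
    for href, text in parser.links:
        if keyword.lower() in text.lower():
            # Resolve relative links against the menu page's own URL
            return urljoin(menu_url, href)
    return None
```

Matching on link text rather than on the URL itself means the scraper keeps working as long as the menu label stays the same, even if the target URL moves.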

Not closing the issue until it passes #18, but I consider it done barring any bugs.

Thank you for your help @jttew!