socrata / opendatanetwork.com

The Open Data Network
https://www.opendatanetwork.com/

crime reports pipeline, take 2: all cities #313

Closed zang0 closed 8 years ago

zang0 commented 8 years ago

ticket to cover take 2 of the crime reports pipeline and summary stats. reqs:

_1 all cities
_2 36 months of history
_3 automated
_4 monitored

zang0 commented 8 years ago

re: below - some feedback after wiring things in and poking around:

_1 looks like we lost a bunch of cities. See the attached screenshots for the DC area, before and after.

[screenshots: 2016-04-20 4:25:53 PM and 2016-04-20 4:25:35 PM]

_2 seeing some weird, very old anomalies. See this shot of Seattle: the data doesn't look semi-legit until ~2010, but there are near-zero entries going way back to 1961.

[screenshot: 2016-04-20 4:54:07 PM]
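One way to keep those near-empty early years out, in line with the 36-month requirement, would be to trim the data to a trailing window before computing summary stats. A minimal sketch, assuming each record carries a `date` field; the field name and the helper names are hypothetical, not taken from the pipeline:

```python
from datetime import date


def months_between(earlier, later):
    """Calendar months from `earlier`'s month to `later`'s month."""
    return (later.year - earlier.year) * 12 + (later.month - earlier.month)


def trim_to_window(records, as_of, months=36):
    """Keep records whose month is within the trailing `months` calendar months of `as_of`.

    The month containing `as_of` counts as the first month of the window.
    """
    return [
        r for r in records
        if 0 <= months_between(r["date"], as_of) < months
    ]


# Hypothetical records: one stray 1961 entry, one from 2010, two recent ones.
records = [
    {"date": date(1961, 1, 1), "count": 1},
    {"date": date(2010, 6, 1), "count": 40},
    {"date": date(2014, 1, 15), "count": 55},
    {"date": date(2016, 4, 1), "count": 60},
]
recent = trim_to_window(records, as_of=date(2016, 4, 20))
```

This only hides the old entries; it doesn't explain why they exist in the source data.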

https://odn.data.socrata.com/dataset/National-Crime-Rates-By-City-Draft-/h88a-ihpp

After some discussion with Hai and Kyle, I simplified the build process and now we've got a nationwide dataset for you to play with. The big difference is that this one links cities with data from their local agencies rather than trying to match crimes up by geo. Basically, your original suggestion. With a geo filter added to a fuzzy search based on the name, the matches were pretty good.
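A rough sketch of what a name-plus-geo match along these lines could look like. This is not the actual pipeline code: the record fields (`name`, `lat`, `lon`), the suffix list, and both thresholds are assumptions for illustration.

```python
import difflib
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def name_similarity(city_name, agency_name):
    """Similarity in [0, 1] between a city name and an agency name,
    after stripping a common agency suffix (assumed list)."""
    cleaned = agency_name.lower()
    for suffix in (" police department", " police dept", " pd"):
        if cleaned.endswith(suffix):
            cleaned = cleaned[: -len(suffix)]
            break
    return difflib.SequenceMatcher(None, city_name.lower(), cleaned).ratio()


def match_agency(city, agencies, max_miles=10.0, min_similarity=0.8):
    """Return the best-matching nearby agency for a city, or None."""
    best, best_score = None, min_similarity
    for agency in agencies:
        dist = haversine_miles(city["lat"], city["lon"], agency["lat"], agency["lon"])
        if dist > max_miles:
            continue  # geo filter: ignore agencies too far from the city
        score = name_similarity(city["name"], agency["name"])
        if score >= best_score:
            best, best_score = agency, score
    return best


# Illustrative records with approximate coordinates.
agencies = [
    {"name": "Rockville Police Department", "lat": 39.08, "lon": -77.15},
    {"name": "Montgomery County Police Department", "lat": 39.08, "lon": -77.15},
]
rockville = {"name": "Rockville", "lat": 39.08, "lon": -77.15}
silver_spring = {"name": "Silver Spring", "lat": 38.99, "lon": -77.03}
```

With these sample records, `match_agency(rockville, agencies)` finds the Rockville agency, while `match_agency(silver_spring, agencies)` returns `None` because no agency name resembles the city name.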

Let's see how this one looks. Give me a call or ping me on Slack if you have questions tomorrow morning.

Thanks, Chris

chrismetcalf commented 8 years ago

Hmm. I'll take a look. Now we're only using cities that have police departments in them, which could explain some of the drop-off, but there should be more in that area. It could be an ID mismatch.

chrismetcalf commented 8 years ago

Alright, looking at some of the places missing in the DC area, it looks like a lot of the "cities" that are missing there don't actually operate their own police departments. Silver Spring, for example, doesn't have its own police department, and instead the Montgomery County Police Department seems to have jurisdiction over their area. This seems to be pretty common, at least in the DC area.

For the new version, my algorithm was:

In Silver Spring's case, the police department that covers them is based in Rockville, so things didn't match up and they're missing. Rockville is also 12 miles away from Silver Spring, so just using a simple geo match won't work either.

I guess it was a bad assumption to make that police departments are actually located in the cities they police. It seems to work better in the wide open spaces we have out west, which is where I was testing.

Unless we want to just aggregate data by agency jurisdiction, I'm going to have to go back to something similar to what I had before, where we aggregate on crimes occurring within the bounds of a city.
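For reference, aggregating on crimes that occur within a city's bounds comes down to a point-in-polygon test per crime. A minimal sketch using ray casting, assuming each crime has `lon`/`lat` fields and each city boundary is a simple list of `(lon, lat)` vertices; the field names and data shapes are hypothetical, and real boundaries with holes or multiple parts would need a proper geo library:

```python
from collections import Counter


def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) falls inside the polygon.

    `polygon` is a list of (lon, lat) vertices; the ring is closed implicitly.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going east from the point cross this edge?
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_crimes_by_city(crimes, city_bounds):
    """Count crimes per city by testing each crime against each city's boundary."""
    counts = Counter()
    for crime in crimes:
        for city_id, polygon in city_bounds.items():
            if point_in_polygon(crime["lon"], crime["lat"], polygon):
                counts[city_id] += 1
                break  # assumes city boundaries do not overlap
    return counts


# Toy data: one unit-square "city" and three crimes, two of them inside it.
city_bounds = {"city-a": [(0, 0), (0, 1), (1, 1), (1, 0)]}
crimes = [
    {"lon": 0.5, "lat": 0.5},
    {"lon": 0.2, "lat": 0.9},
    {"lon": 2.0, "lat": 0.5},
]
counts = count_crimes_by_city(crimes, city_bounds)
```

Checking every crime against every city is quadratic, so a nationwide run would likely need a spatial index or a bounding-box prefilter first.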

zang0 commented 8 years ago

branch w/ last data drop is iss378

zang0 commented 8 years ago

Noticed some really low numbers, so here's Chicago:

https://odn.data.socrata.com/dataset/Nationwide-Crime-Stats-V3/yq6p-fzyt

search in data set on: 1600000US1714000

note: these are tiny #s of crimes. Should be well into the thousands. Now when I go to the CR.com site by clicking on the "Dig In" link, I see way more crimes. Somehow they're not making it into the analysis.
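For anyone repeating the search: the ID above looks like a Census place GEOID, where `160` is the summary level for places, `17` is the Illinois state FIPS code, and `14000` is the place code. A small sketch for splitting such an ID, assuming that convention holds for every row in the dataset; the function name is made up for illustration:

```python
def parse_place_geoid(geoid):
    """Split a Census place GEOID such as '1600000US1714000' into its parts."""
    prefix, sep, code = geoid.partition("US")
    if not sep or len(prefix) != 7 or len(code) != 7 or not (prefix + code).isdigit():
        raise ValueError(f"not a place GEOID: {geoid!r}")
    if prefix[:3] != "160":
        raise ValueError(f"summary level is not 160 (place): {geoid!r}")
    return {
        "summary_level": prefix[:3],
        "state_fips": code[:2],
        "place_fips": code[2:],
    }


chicago = parse_place_geoid("1600000US1714000")
```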

zang0 commented 8 years ago

[screenshots: 2016-05-03 10:23:29 AM and 2016-05-03 10:22:04 AM]

zang0 commented 8 years ago

moving to v.data.6. there's too much left to do on nationwide crime before the release goes out. I put everything on the nationwidecrime branch and reverted staging. let's get it perfected there first.

zang0 commented 8 years ago

closing since we're backing out of the crimereports approach.