Open fileunderjeff opened 8 years ago
This is a very interesting idea! I think I'm going to look into it!
I've started looking through the data, and decided to focus my initial analysis on which auto manufacturers tend to get the most tickets. I began by just looking at the raw ticket numbers, but I have also used market share to estimate the rate of ticketing.
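For anyone following along, the market-share adjustment can be sketched roughly as below. This is only an illustration of the idea, not the code from the actual analysis: the function name, the input shapes, and the sample numbers are all assumptions made up for the example.

```python
from collections import Counter


def ticket_rate_by_make(ticket_makes, market_share):
    """Return each make's share of tickets divided by its market share.

    A ratio above 1.0 means the make is ticketed more often than its
    market share alone would predict; below 1.0 means less often.
    """
    counts = Counter(ticket_makes)
    total = sum(counts.values())
    rates = {}
    for make, share in market_share.items():
        if share <= 0:
            continue  # skip makes with no usable market-share figure
        ticket_share = counts.get(make, 0) / total if total else 0.0
        rates[make] = ticket_share / share
    return rates


# Made-up illustrative numbers, NOT taken from the Houston dataset.
example_makes = ["FORD"] * 6 + ["TOYOTA"] * 3 + ["BMW"] * 1
example_share = {"FORD": 0.5, "TOYOTA": 0.3, "BMW": 0.2}
rates = ticket_rate_by_make(example_makes, example_share)
```

The ratio is only as good as the market-share figures, which would ideally describe cars registered in the Houston area rather than national sales.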
Unfortunately, data.houstontx.gov does not have the proper links to the most recent ticketing data, so the tickets being examined here are only those given prior to 6/30/2012. I'm not sure who to reach out to in order to get the proper data.
What a cool look at the data! How interesting.
Are the datasets here incorrect? http://data.houstontx.gov/dataset/city-of-houston-parking-citations
If they are correct, we should have tickets through May 2015. I will see if we can refresh the portal with more recent ticket activity prior to the hackathon!
Also, I wonder if you control for scofflaws (i.e. people with 5+ tickets a year), how that adjusted ticket frequency might change?
Thanks Jeff! If you try to download the two datasets listed there, you'll find that they link you to the same file (the 2012 data).
Also if anyone else is interested in working on this project with me, drop a line in the comments! Unfortunately, I will not be in town for the hackathon :(
@jpoles1 I sent an inquiry to the open data people at the city to refresh this data. Hopefully we'll get that done quickly!
@jpoles1 : we'll work on getting something refreshed for you next week. I apologize for the inconvenience. We're also going to try to toss in lat and long (no promises though) in time for the Hackathon in case there is any interest in mapping.
On scofflaws, since we don't release any identifying information on offenders that may be a little difficult to do, but you could try the Entity_ID field. That said, the system isn't the smartest in the world so it isn't unusual for a person or business to have multiple entity IDs.
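A rough sketch of how the Entity_ID approach might look, assuming each citation has been loaded as a dict with an `Entity_ID` key and a `year` key (the `year` key and the function name are assumptions for illustration, and the 5-ticket threshold is the one suggested earlier in this thread). Because one person can have several entity IDs, this would likely undercount scofflaws.

```python
from collections import Counter

SCOFFLAW_THRESHOLD = 5  # "5+ tickets a year", as suggested in this thread


def split_scofflaws(citations, threshold=SCOFFLAW_THRESHOLD):
    """Split citations into (scofflaw_rows, other_rows).

    An entity counts as a scofflaw for a given year if it received
    `threshold` or more citations in that year.
    """
    per_entity_year = Counter((c["Entity_ID"], c["year"]) for c in citations)
    scofflaw, other = [], []
    for c in citations:
        if per_entity_year[(c["Entity_ID"], c["year"])] >= threshold:
            scofflaw.append(c)
        else:
            other.append(c)
    return scofflaw, other


# Made-up sample rows for illustration only.
sample = (
    [{"Entity_ID": "A", "year": 2014}] * 5
    + [{"Entity_ID": "B", "year": 2014}] * 2
    + [{"Entity_ID": "A", "year": 2015}]
)
scofflaw_rows, other_rows = split_scofflaws(sample)
```

Running the per-make analysis once on `other_rows` and once on everything would show how much repeat offenders shift the ticket frequencies.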
@frank0051 lat/long is cool, but i think the data already has block and street (e.g. 1500 block, scott street). That might be enough for mapping purposes. Current data is more of a pressing need.
@frank0051 @jpoles1 generally, you can track scofflaws by license plate #. It looks like you guys scrub that info before releasing. Any reason why? You can publicly query citations by license plate # through T2's system.
@jpoles1 : I fixed the URL going to the wrong place for the data through 3/31/2015 (as of 5/1/2015) so you can at least get data through last year now.
@fileunderjeff : Our policy on Open Data allows for sensitive information to be removed prior to release. The Parking Management Division classifies license plates as identifiable information and was uncomfortable with releasing it. We checked with our Legal Department and they deem driver's license numbers, license plate numbers, and VINs as sensitive as well and suggested that the plate numbers shouldn't be proactively released due to Texas Government Code 552.130. That said, somehow insurance companies, law firms, traffic accident schools, and car companies trying to sell extended warranties all manage to get the information after a citation is written so I suspect it's available under the Texas Public Information Act but I don't know for sure.
The cities of Austin, Dallas and Fort Worth along with Bexar County don't release parking citation information on open data (or really any citation information). Fort Worth does release car accident info without license plates. San Antonio and El Paso don't seem to have an open data program from what I can tell. If you would like to explore further, please provide us feedback at https://cityforms.houstontx.gov/component/rsform/form/81-open-data-portal-ideas-and-feedback
@fileunderjeff @frank0051 Thanks so much for your help with this project!
Lat/Long data would actually be very helpful. I was considering this problem last night, and with my current knowledge of GIS analysis I think I would need that information for mapping. If coords are not provided in the dataset, I think the only other way to get them is by geocoding the block and street addresses, which would be slow (and potentially not free given the size of this dataset).
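One way to keep the cost of address lookups down would be to look up each distinct block/street pair only once rather than once per ticket. A minimal sketch, assuming each citation is a dict with hypothetical `block` and `street` keys (the real column names may differ) and leaving the actual lookup service out entirely:

```python
def unique_block_addresses(citations, city="Houston, TX"):
    """Map each distinct (block, street) pair to one address string.

    Only the returned addresses need to be sent to a lookup service;
    the coordinates can then be joined back to every citation by key.
    """
    addresses = {}
    for c in citations:
        key = (c["block"], c["street"].strip().upper())
        if key not in addresses:
            addresses[key] = f"{key[0]} {key[1]}, {city}"
    return addresses


# Made-up sample rows for illustration only.
sample = [
    {"block": 1500, "street": "Scott Street"},
    {"block": 1500, "street": " scott street"},
    {"block": 900, "street": "Main Street"},
]
to_lookup = unique_block_addresses(sample)
```

With this, the number of lookups scales with the number of distinct blocks rather than the number of tickets.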
@frank0051 Analysis of car accident data sounds like it could also yield some useful findings. Is such data available here in Houston (I could not find it with a quick search of the data portal)?
Thanks @frank0051. Here is what I wrote:
"I would like to know more about parking scofflaw data, however, the citation data set on the open data portal is incomplete and inconsistent. The best solution would be to release license plate information for each citation. If that is not possible, I would suggest getting a citation count by license plate in T2 and adding it to your citation report."
I am very familiar with the parking system in question, and the report addition should be very simple and very helpful.
In addition, I just glanced at it, but this parking data might also be interesting to examine for this project.
@frank0051 thank you for the explanation. I really appreciate it. For posterity, here are a few interesting links on the subject and why I believe the City and State positions are wrong:
http://jalopnik.com/why-do-we-always-blur-license-plates-on-the-internet-1691298199
http://motherboard.vice.com/blog/why-wont-cops-share-the-license-plate-data-they-collect
and so on.
@frank0051 Thanks for getting the updated dataset available so quickly, I'm going to take a look now!
Here's a look at the same analysis I performed earlier on the new data (post-2012):
I've put my code in this repo if anyone would like to take a look. In addition, if you're interested in working with me on this project, drop a line below, and we'll find a way to get you the SQLite database I have been using (it's too large to upload to github).
@jpoles1, @fileunderjeff updated the data and broke into smaller files. Added a lat and long where we had it in the system. Pretty sparse at this point. http://data.ohouston.org/dataset/city-of-houston-parking-citations
@frank0051 Why is it that only certain records have lat/long data associated with them?
It's in the metadata. The handhelds have only been transmitting since mid 2014 and with some frequency the handhelds fail to record the coordinates. Feel free to use that feedback form mentioned earlier to suggest we geocode all entries for more reliable results.
I've managed to incorporate the lat/long data into the report with some maps! They're not perfect, but they add some interesting insights to the report.
This project was also picked up by a Houston real-estate blog!
@jpoles1 , @fileunderjeff : we're a go for adding a column into the dataset that tells you how many citations have been issued for the plate number associated with the ticket. We cannot share out the plate number though in our current state.
Do you have suggestions on how you would like this count column to work? Would you like it to be a count of:
@frank0051 awesome news! All three would be amazing, and we could derive useful insights from each one. But the first two would be the most interesting to me, with "all citations issued to that plate" being the top priority.
@frank0051 is it possible to request citation report data as a separate dataset? to be able to reconcile citations issued with citations paid and confirm/seek to improve the city's collection rate?
@fileunderjeff : not sure I'm following the logic re: citation report data as a separate data set? Our current citations dataset includes all of the citations and the status as to whether they're paid. We don't include a last payment date as we don't extract all of the financial history into our reporting environment as it would be too large compared to the time it would take and the value it would add.
@frank0051 If we cannot get license plate data, is there a possibility that we can instead get a non-personally-identifying, unique ID # (based on the plate) for each row?
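To make the idea concrete, here is one common way such an ID could be derived: a keyed hash (HMAC-SHA256) of the normalized plate, with the key held privately by whoever publishes the data. This is only a sketch of a general technique and not a description of anything the City does; the function name, the normalization, and the 16-character truncation are all assumptions.

```python
import hashlib
import hmac


def anonymize_plate(plate, secret_key):
    """Return a stable pseudonymous ID for a plate.

    The same plate always maps to the same ID under one key, so
    citations can be grouped without publishing the plate itself.
    """
    normalized = "".join(plate.split()).upper()  # drop spaces, unify case
    digest = hmac.new(secret_key, normalized.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


# Hypothetical key and plates for illustration only.
key = b"keep-this-secret"
id_a = anonymize_plate("abc 1234", key)
id_b = anonymize_plate("ABC1234", key)
id_c = anonymize_plate("XYZ9876", key)
```

A plain unkeyed hash would be a weaker choice here, since the space of possible plates is small enough to enumerate and match against the published hashes.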
I agree with @fileunderjeff that "all citations issued to that plate" would be the most useful piece of information for identifying the "scofflaws"
@jpoles1 : http://performance.houstontx.gov/opendata/parking-scofflaw-data
That's where things stand on providing anything that would allow you to tie back to a plate per what we provided back to Jeff. I would be happy to assist you through the TPIA process to see if Legal is willing to offer additional clarification or reconsider. Unfortunately, as the City does not have an Enterprise Data Officer currently appointed, we currently do not have a policy-based conduit under which to facilitate a formal review on the validity of the security concern given existing resources without a formalized request. Given this, to better ensure your request is addressed, I need to refer you to the TPIA process.
@jpoles1 , @fileunderjeff : I designed the logic to count a) the outstanding tickets at the point the extract is run and b) all citations issued in our system to that license plate at the time the extract is run. When I started thinking through how it could be used I got rather frustrated at its limited potential for analysis purposes. Even though we do not have an Enterprise Data Officer at this time, I went ahead and brought the issue back to Legal for a possible work-around. As such, we will be releasing the data with an anonymized plate number. We will try to get it out there this afternoon. We will also be making a minor change as to how Officer is shared.
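With an anonymized plate number in the data, both counts described above could be recomputed by anyone on their own copy of the dataset. A minimal sketch, assuming each citation is a dict with hypothetical `plate_id` and `paid` keys (the released column names may well differ):

```python
from collections import defaultdict


def citation_counts(citations):
    """Count total and outstanding (unpaid) citations per anonymized plate."""
    counts = defaultdict(lambda: {"total": 0, "outstanding": 0})
    for c in citations:
        entry = counts[c["plate_id"]]
        entry["total"] += 1
        if not c["paid"]:
            entry["outstanding"] += 1
    return dict(counts)


# Made-up sample rows for illustration only.
sample = [
    {"plate_id": "p1", "paid": True},
    {"plate_id": "p1", "paid": False},
    {"plate_id": "p1", "paid": False},
    {"plate_id": "p2", "paid": True},
]
per_plate = citation_counts(sample)
```

Like the extract itself, these counts are a snapshot: they reflect payment status at the time the data was pulled.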
@frank0051 that is amazing. thank you!
@jpoles1 , @fileunderjeff : The parking citation dataset has been updated to include an anonymized plate number and citation dollar amounts. At an undetermined point in the future, the City will work to automate the release of this dataset to a monthly release schedule. Please consult the metadata file regularly as the way the anonymized license plate functions may change.
@jpoles1 any interest in refreshing your study with current data?
Sure, I could get behind that. Would also be great to come up with a list of any possible concrete goals or applications for this data/project.
@jpoles1 I have emailed some folks inside the city who might have some good ideas where we can go with this. I've invited them to chime in on this thread, so hopefully we'll get a reboot going soon!
Excellent! Will be interested to hear some perspectives on what direction we should take this analysis.
Looking at parking citation data, create a map of where tickets occur. You could use this map to optimize the City of Houston Parking Enforcement operation, or simply just to avoid certain areas at specific times.
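As a starting point for such a map, tickets could be binned into a coarse lat/long grid by hour of day, skipping the records that have no coordinates. This is a rough sketch only: the `lat`, `lon`, and `issued` keys are assumed names, and the 0.01-degree cell size is an arbitrary choice for illustration.

```python
import math
from collections import Counter
from datetime import datetime


def hotspot_counts(citations, cell_size=0.01):
    """Count citations per (grid cell, hour of day).

    Records without coordinates are skipped, since lat/long is only
    present on part of the dataset.
    """
    counts = Counter()
    for c in citations:
        if c.get("lat") is None or c.get("lon") is None:
            continue
        cell = (
            math.floor(c["lat"] / cell_size),
            math.floor(c["lon"] / cell_size),
        )
        counts[(cell, c["issued"].hour)] += 1
    return counts


# Made-up sample rows for illustration only.
sample = [
    {"lat": 29.7604, "lon": -95.3698, "issued": datetime(2015, 3, 2, 9, 15)},
    {"lat": 29.7651, "lon": -95.3621, "issued": datetime(2015, 3, 3, 9, 40)},
    {"lat": 29.7704, "lon": -95.3698, "issued": datetime(2015, 3, 3, 14, 5)},
    {"lat": None, "lon": None, "issued": datetime(2015, 3, 4, 9, 0)},
]
hotspots = hotspot_counts(sample)
```

The resulting counts could be fed to any mapping tool as a heat layer, one layer per hour or per time-of-day band.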