Closed safeandfree closed 5 years ago
Is the filename always "jims1058.txt"? If so, you could use python's urllib.urlretrieve or wget running as a cron job.
If the file name changes, what is the pattern? I presume the 1058 in "jims1058.txt" changes depending on the date? Once the pattern is known, that can be used to change what url the download script points to.
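For anyone following along, here is a minimal sketch of the urlretrieve option, not a finished tool. It assumes Python 3, where the function lives at `urllib.request.urlretrieve`, and it assumes the text report sits in the same directory as the .htm version. The dated file name and the `jims_archive` folder are made-up conventions so that each day's download does not overwrite the previous one.

```python
import datetime
import os
from urllib.request import urlretrieve

# Assumed location of the text report (same directory as the .htm version).
REPORT_URL = "http://www.jims.hctx.net/jimshome/jimsreports/jims1058.txt"


def dated_filename(day):
    """Return a per-day file name such as jims1058-2016-07-19.txt."""
    return "jims1058-{}.txt".format(day.isoformat())


def download_report(dest_dir, day=None, fetch=urlretrieve):
    """Save the current report into dest_dir and return the saved path.

    `fetch` defaults to urlretrieve; it is a parameter only so the function
    can be tried out without network access.
    """
    day = day or datetime.date.today()
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, dated_filename(day))
    fetch(REPORT_URL, path)
    return path
```

Calling `download_report("jims_archive")` once a day, from cron or Windows Task Scheduler, would build up one file per day.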
The filename is always jims1058.txt. I will look at these ideas. I’m not a programmer but I’m not afraid to muck around a little. ☺
OK I officially don’t understand. ☺ I have looked at both options, and in my complete ignorance I like the urllib.urlretrieve option…but now what?
FYI, I noticed this htm version that is easier to read with page breaks. http://www.jims.hctx.net/jimshome/jimsreports/jims1058.htm
But, I understand you just want the raw data to merge into your database.
Bookings and releases within the last 24 hours: http://www.jims.hctx.net/jimshome/jimsreports/jims503.htm (this one does not have a jims503.txt version).
These reports do not have context. I also notice that the arrest dates and booking dates may be off by a few days. I think that the booking date is what these reports are based on. I also noticed that some records show different arrest dates for the same person. So is it possible to get lost in the system if you get arrested but not booked?
-John
Kathy Mitchell | Texas Criminal Justice Coalition | Grassroots Sentencing Campaign Coordinator | 1714 Fortview Road, Suite 104, Austin, Texas 78704 | Office: (512) 441-8123 Ext. 116 | Fax: (512) 441-4884 | www.TexasCJC.org | www.facebook.com/TexasCJC | www.twitter.com/TexasCJC
TCJC works with peers, policy-makers, practitioners, and community members to identify and promote smart justice policies that safely reduce Texas’ costly over-reliance on incarceration – creating stronger families, less taxpayer waste, and safer communities. DONATE TODAY! https://co.clickandpledge.com/sp/d1/default.aspx?wid=45713
@werdnanoslen The URL is the same. I was looking at it last night and again this morning. Take a look at the http://www.jims.hctx.net/jimshome/jimsreports/ directory. It seems that jims1058.txt gets updated every day around 2:30am. I already have a Ruby class that pulls jims1058.txt, parses it (using the native CSV lib if you're wondering), and holds the information. That part isn't too difficult. The next step would be to dump it into a DB of some sort.
EDIT: @safeandfree, you might also want to look at http://www.jims.hctx.net/jimshome/jimsreports/. As @Woodley is pointing out, there might be some other data you might find interesting. Although, based on the timestamps, I think jims1058.txt is the only one that is getting updated daily.
In addition to putting the info in a DB, which will allow you to search and query against the info, you probably want to save the actual file somewhere for reference later (S3 maybe?). That just allows you to rebuild the DB from scratch if you want/need.
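To make the "dump it into a DB" step concrete, here is a rough sketch using Python's built-in `csv` and `sqlite3` modules. The semicolon delimiter, the header row, the `bookings` table, and the `source_file` column are all assumptions for illustration; the real layout of jims1058.txt would need to be checked first. Recording which file each row came from is what makes it possible to rebuild the DB from the saved files.

```python
import csv
import io
import sqlite3


def load_report(conn, text, source_file, delimiter=";"):
    """Insert every data row of one daily report into a `bookings` table.

    Assumes the first line of `text` is a header row. All columns are stored
    as TEXT, and `source_file` records which download a row came from.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = [name.strip() for name in next(reader)]
    columns = ", ".join('"{}" TEXT'.format(name) for name in header)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS bookings (source_file TEXT, {})".format(columns)
    )
    placeholders = ", ".join("?" for _ in range(len(header) + 1))
    # Skip blank or malformed lines whose field count does not match the header.
    rows = [[source_file] + row for row in reader if len(row) == len(header)]
    conn.executemany("INSERT INTO bookings VALUES ({})".format(placeholders), rows)
    conn.commit()
    return len(rows)
```

With a connection from `sqlite3.connect("bookings.db")`, each saved file could be loaded in turn and then queried with ordinary SQL.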
Also, just for the sake of having it recorded since it took a while for me to make the connection last night, JIMS stands for Justice Information and Management System. (It's not a guy named Jim 😉 ). I assumed the 1058 part in the file name is a reference to something. A law? A form? From the technical aspect, that doesn't change anything I guess. Just context.
Fascinating. I didn’t notice the gap between arrest and booking. Yes, people have gotten “lost” although it is supposedly rare. Unless the police officer is just driving around with the arrestee in his car for days, the person is probably stuck at some point in the booking process. These kinds of questions are among the many, many things this data will start to allow us to investigate.
Kathy Mitchell
@ Coby, Speaking of JIMs, that is making me hungry for Jims Food.
@safeandfree I think by law they can hold you for 48 hours before charging or releasing you.
@colbywhite I think @safeandfree said they can load the files as-is into Access, since it seems to be a database format delimited by semicolons and tabs.
@safeandfree have you used Ruby before? What is the OS on the computer that you'd like the files to be downloaded to?
Ah, I see. Didn't realize that was in an Access format. Sweet. That makes it even easier then. I'm not familiar with Access, but I assume it has a remote import feature? A cron job that just shoves the file into Access should suffice? Doesn't matter which language then. Whichever has the better library for importing into Access I guess.
Access does have a remote import feature. I tried to use it to go directly to that page and got all sorts of errors. ☺
I hope you are getting these replies. I should go into git…
Yes, I’m at a small nonprofit with the basic Microsoft tools on a Windows 7 operating system. I can work on something that is hosted remotely too.
@safeandfree, if you're able to get a hosted Access instance, then that would definitely make this even simpler. Load up the file, shoot it into your Access instance. I, unfortunately, have no experience with hosted Access instances. So I wouldn't know where to look. Somebody else have input on that?
Assuming you get that instance, would you want some kind of website on top of it in order to query it? Or do you plan on just querying the Access DB directly to get what you need out of it?
So if it's important for the data to be accessible from the desktop MS Access application on Kathy's machine, we might be able to run a Microsoft Azure SQL database and configure her Access app to pull data from the Azure cloud.
Linking Access Applications to SQL Server - Azure SQL DB
Office Support: Link to SQL Server data
We have credits, hard to decipher how much but I think like $130 worth of credits. Their basic plan is $5 a month for 2GB, then $15 for 250GB, so we would have credits to get us through 1-2 years depending on the size of these files and could probably ask for more from contacts at Microsoft that know Code for America. I was expecting to find an Access in the Cloud type of service but that is probably part of Office 365, not Azure.
This is what the Azure SQL web UI looks like
Alternatively, we could run a Microsoft SQL Server on AWS where we don't have any known credits caps. Microsoft SQL Server on Amazon RDS
Looping in @luqmaan & @gusIreland who manage our hosting resources.
This sounds awesome!
After Azure credits run out, I will see what I can do about providing access to bluemix, which I will likely be working on at ibm.
yeah, good point. An Azure SQL database should be simple to migrate to whatever hosting platform necessary. (If it isn't simple we shouldn't use it.)
I don't think this project needs a database.
A database is a lot of work to set up, is not open, and is not easy to access.
A simpler and more open solution is to do what we did with the construction-permits project:
The data is searchable: https://github.com/open-austin/construction-permits/search?utf8=%E2%9C%93&q=7east
The data is browsable: https://github.com/open-austin/construction-permits/tree/master/data
Because the data is in CSV format:
I totally want this data to be someplace where people can get it for lots of purposes. A primary purpose for me is to be able to surface patterns and ID people who are the victim of patterns of over policing. Yesterday, I took several days of the data, merged it into a single data file, and then ran cross tabs in order to ID people who were arrested on a single class C misdemeanor charge so I could write them a letter asking for more information about what happened and why they were actually jailed on an offense that does not have jail as an available punishment (Class C misdemeanors are "fine-only" offenses). A search function for everyone in the file with a class C misdemeanor would pull up a huge number of people who were also charged with other offenses in the same arrest. This file has a separate row for every charge, so the same person might be in there eight times if they were arrested on eight charges. Does that information change the analysis?
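Because the file has one row per charge, the "single Class C charge" question comes down to grouping rows by arrest and keeping the groups with exactly one row. A rough sketch of that cross-tab follows; the column names `booking_id` and `level` and the code `"MC"` are hypothetical stand-ins, since the real column names and offense-level codes are not given in this thread.

```python
from collections import defaultdict


def single_charge_people(rows, person_key, level_key, level_value):
    """Return the people whose only charge row has the given offense level.

    `rows` is an iterable of dicts, one per charge (for example from
    csv.DictReader). Anyone with more than one charge row is excluded,
    even if one of those charges matches `level_value`.
    """
    charges = defaultdict(list)
    for row in rows:
        charges[row[person_key]].append(row)
    return sorted(
        person
        for person, person_rows in charges.items()
        if len(person_rows) == 1 and person_rows[0][level_key] == level_value
    )


# Made-up rows: A1 has one Class C charge, A2 has a Class C plus another charge.
example_rows = [
    {"booking_id": "A1", "level": "MC"},
    {"booking_id": "A2", "level": "MC"},
    {"booking_id": "A2", "level": "F3"},
    {"booking_id": "A3", "level": "MA"},
]
print(single_charge_people(example_rows, "booking_id", "level", "MC"))
```

A plain text search for the offense level would also return A2 here, which is why a grouped query is needed rather than search alone.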
I know this data is already published online, but how do we feel about dumping a huge list of names of people who have been booked, along with their birthdays, into a publicly viewable GitHub repo as CSVs? Is there a concern for their privacy? This could be one argument for why we would want to store it on a DB and only give ppl access if they request it.
If we're ok with storing this data publicly, then it seems like writing data through the Github API is a good first step. And from there maybe we will discover functionality that requires a real SQL db.
Not only is this data public already, but there are some unsavory web companies that actually post it already. I'm not sure we're making people's privacy any worse by putting this on git for now. In the long run, this is likely to be a big issue at the Legislature this session because, yes, it's a problem from a privacy standpoint.
@luqmaan, I agree if we can get away with not setting up a database, we should. But do you think that kind of search functionality fulfills the type of searching @safeandfree is looking for? For instance, using your construction permit example, I can search for how many Sign Permits have been issued (5,866), but I can't figure out how many Sign Permits were issued for Spicewood Springs using the GitHub search alone. I also can't figure out how many Sign Permits have been issued since 2000 using the GitHub search alone. I can't compare the number of Sign Permits in 2000 to 1999.
To figure those out, I would have to download all the CSVs and load them into my own database. So my question goes to @safeandfree, is that a valid solution? Could we set something up to start downloading the jims1058.txt files and storing them here? And from there you can load them into whichever personal DB you decide is necessary. In many ways, that leaves you in a similar position to the one you are in now, except now you would have some historical data, as opposed to just one day's worth. You also wouldn't have to update your DB every day. Since the jims1058.txt file is being stored here every day, you can feel sure that you're not missing a day and just update your personal DB whenever you decide you want new data. (Maybe we can include a script or two to make that easier.)
You know, as I type that out, that solution is growing on me. It does put some of the burden on the person looking to make in-depth conclusions based on the data - i.e. @safeandfree.
As for the privacy question, I think you guys would be better equipped to answer that than me. But I would point out that, using that construction permits repo, I was able to surmise that two people named Larry Butler and Carol Ann Sayle remodeled a home on Lyons road in 1980. So I think you guys have already staked out a position somewhere on the privacy spectrum. This would just seem to follow that position.
Hey all, yes to just getting a script to start pulling the data every day without me having to remember. That would be AWESOME.
But also, is there a way to have that script add the new file every day to one spreadsheet instead of making separate files? Ideally I would like to have a year of data slowly accumulate. When it's a daily file, it gets a bit rough to manually merge them all together.
And finally, yes, if we can just get it into a big file that can be downloaded as .csv I can upload it to a personal database on my desktop. Because I do need to do complex things. Today for example I needed to know how many people who were arrested for evading arrest had no other related charge. So they were evading arrest for what exactly? I also want to be able to eventually map where people live. Are people who are arrested for evading arrest (with no other charge) disproportionately from certain neighborhoods?
This is very rich data and is going to reveal a great deal about the front end of policing that no one actually knows now. Or, well, some people know it very well, but not lawmakers or city council members.
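One possible way to get the single accumulating file is to rebuild it from the saved daily files whenever it is needed, rather than appending by hand. The sketch below assumes each day's download was saved under a dated name such as jims1058-2016-07-19.txt (a made-up naming scheme), that every file starts with the same header row, and that fields are separated by semicolons. None of that is confirmed here.

```python
import csv
import glob
import os


def merge_daily_files(archive_dir, combined_path, delimiter=";"):
    """Rebuild one combined CSV from every daily file in archive_dir.

    The header is written once, and a `source_file` column is added so each
    row can be traced back to the day it was downloaded. Returns the number
    of data rows written.
    """
    total = 0
    header_written = False
    with open(combined_path, "w", newline="") as out:
        writer = csv.writer(out)
        pattern = os.path.join(archive_dir, "jims1058-*.txt")
        for path in sorted(glob.glob(pattern)):
            with open(path, newline="") as daily:
                reader = csv.reader(daily, delimiter=delimiter)
                header = next(reader, None)
                if header is None:
                    continue  # empty download, nothing to merge
                if not header_written:
                    writer.writerow(["source_file"] + header)
                    header_written = True
                for row in reader:
                    if row:  # ignore blank lines
                        writer.writerow([os.path.basename(path)] + row)
                        total += 1
    return total
```

The combined file is an ordinary comma-separated CSV, so it should open in a spreadsheet or import into a desktop database. Rebuilding from scratch each time also means a missed or repeated run cannot duplicate rows.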
I am not an attorney and nothing in this can be construed as legal advice. Having said that...
If you are concerned about privacy you should contact a lawyer. You could also contact the Attorney General's Office to see what they think about privacy. The following are publicly available resources:
GOVERNMENT CODE TITLE 5. OPEN GOVERNMENT; ETHICS SUBTITLE A. OPEN GOVERNMENT CHAPTER 552. PUBLIC INFORMATION SUBCHAPTER A. GENERAL PROVISIONS http://www.statutes.legis.state.tx.us/Docs/GV/htm/GV.552.htm
Texas Attorney General - Public Information Act Handbook https://www.texasattorneygeneral.gov/files/og/publicinfo_hb.pdf
http://www.open-public-records.com/texas_public_records.htm
Legally, this is completely public information. That maybe needs to change, and there will be some discussion during the next legislative session about the privacy rights of people who have been arrested, booked and charged with crimes but are not yet “guilty” because they are pre-trial. For now, there is a significant research benefit to making this data available to the criminal justice reform movement so we can study things like arrests for offenses where jail time is not a punishment, or arrests for offenses like “evading arrest” which make no sense as stand alone charges, or arrests for low level drug possession offenses (which are based on unreliable field tests.) And there is a significant organizing component for the movement as well. I am actually contacting people so they can vouch for their experience in the political process.
Hopefully, all that helps get us to the best technical solution?
@colbywhite, Is it cool if I go ahead and create a repo for this under our github org, "open-austin" and set you up as admin?
What should we name the repo?
open-austin/jims
Or something more descriptive?
open-austin/harris-county-bookings
open-austin/harris-county-bookings is good.
FYI: A quick, casual, less-than-five-minutes Google search indicates that there may be some other counties using JIMS (Knox, Tennessee seems to use something like it), but for now, we're focused on Harris County. If someone from another county/city wants our help with it, then we can make a new repo with the common JIMS code. But open-austin/harris-county-bookings is definitely good for now.
Should we consider modifying some of the columns that personally identify people?
Perhaps just for this repo, leave off the unnecessary columns for development, then add them back for production (safeandfree's computer). Just so that we don't improve the SEO of someone's records.
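One way the column idea could look in practice is sketched below: drop some columns outright, and replace others with a salted hash so that rows for the same person can still be grouped together. The column names used here are hypothetical. A salted hash is only pseudonymous, not anonymous, so this is a starting point for the privacy discussion rather than an answer to it.

```python
import hashlib


def redact_row(row, drop_columns, hash_columns, salt):
    """Return a copy of one record with identifying columns removed or hashed.

    Columns in `drop_columns` are left out entirely. Columns in `hash_columns`
    are replaced by a short salted SHA-256 digest, so equal values still match
    each other without the original text appearing in the output.
    """
    redacted = {}
    for name, value in row.items():
        if name in drop_columns:
            continue
        if name in hash_columns:
            digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
            redacted[name] = digest[:12]
        else:
            redacted[name] = value
    return redacted
```

For example, `redact_row(row, {"dob", "address"}, {"name"}, salt)` would keep the charge columns while hiding who the row is about. The salt would need to stay out of the public repo.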
I have been scraping this data since May 2015. Happy to make it available, provided there are privacy safeguards for the names.
Nice @fileunderjeff!
Do you mind turning your code and data into a repo? Or opening a PR to https://github.com/open-austin/harris-county-bookings? Whichever one works best for you.
@luqmaan no problem! Let me confer with some local attorneys first, but I am happy to put together the database. Right now, my scraper is pretty rudimentary. I'd like to work on it a little more, and maybe build an API. Stay tuned!
Excellent.
Before you do a bunch of work to design and build an API, let's make things simple. Just a bunch of CSV files in a GitHub repo.
CSV files in a repo have a bunch of advantages over an API, specifically:
@luqmaan I am with you on the ease of a repo, but I am not going to release the raw files without consulting a lawyer first. This is something we are already working on for 2 other projects in Houston.
WOW!! Can you give me a call?
I was able to carve away some time to do the initial cut. Will work on committing the file next. Shall we close this ticket for now and move the convo over?
As far as the privacy goes, I'm just going to follow in the permit repo's footsteps until a decision is made on how Open Austin wants to handle privacy in these sorts of situations. But I am eager to hear what y'all come up with. (And maybe that convo should be split into a different ticket as well?)
@colbywhite No, we should keep the issue open. I think there's still some discussion going on.
Also, we still need to figure out if @fileunderjeff is opening a PR to add the data he's already collected to https://github.com/open-austin/harris-county-bookings or if he'll be creating his own repo.
@colbywhite @luqmaan I'll be creating a repo out of Sketch City, but only after I talk to a lawyer. I urge you all to consider the privacy issues at stake here. Arrests are not adjudications. Records can be expunged, people can be found not guilty, etc. The JIMS file also has a ton of personally identifying information that needs to be reviewed by a lawyer prior to publishing. So it is not happening overnight. Thank you for your patience.
@luqmaan also happy to initiate a PR from Open Austin. Doesn't matter to me.
Other things we are planning to do with this data (in case anyone wants to join in!):
This data is really interesting and there's a lot that can be done with it.
@fileunderjeff, Have you used data.world yet? We have invites we can share. That might be a good place to host the data once personal identifiers are removed.
You can host a dataset for free, public or private, just like GitHub. You can control who has access to view, query, and download the data. Maybe that's a solution to the sharing question. They don't have a public API so uploading would be a manual process, but it sounds like the data you have already collected would give Kathy a start on research.
I take the privacy question seriously.
A situation I want to avoid is one where it's harder for someone to get a job because:
Each row in this DB represents a fragile moment in time when a person lost control of their liberty to the State before being able to defend their innocence. For good reasons, this data is already public. For bad reasons, it already perpetuates inequalities in our justice system.
do no harm...
FYI: The work in open-austin/harris-county-bookings is ready to be deployed. Then this could be closed, correct?
Hey all...I have somehow missed the updates here. Is this the work that Jeff in Houston has been doing or is this the work we started (and completed?) here at Open Austin? I got a little confused about who was on first. [😊]
@safeandfree I've had the work completed from an Open Austin perspective for a while. Just waiting on some credential information so I can deploy it and start it up. You'll be able to see the data in the open-austin/harris-county-bookings repo's data directory when it's running.
And it sounded like @fileunderjeff will be doing his work under the @sketch-city group, but getting a lawyer's opinion first. The work I did scrubs the personal data out of what is kept.
I haven't looked at this project before and I don't really know how it works, but it doesn't look to me like it's handling privacy correctly yet. The file names with the .accdb extension still have the arrestees' names. I can understand collecting that data, but I don't think Github is the right place to store it. The .csv files have no names, but they still have addresses, which are also personally identifying. It looks to me like these are home addresses, not the address where the arrest happened (the arrestee I'm looking at was probably not inside an apartment when she got caught for driving without a license, yet the address field for the arrest specifies a unit of an apartment building).
My suggestion is to not store the .accdb files on Github at all, and drop at least these additional columns from the .csv files: ADDRESS NUMBER, ADDRESS PREFIX, ADDRESS STREET, ADDRESS SUFFIX, ADDRESS ALI. You might think about generating arbitrary ID numbers corresponding to the arrested person, or some other solution to make it clear whether a bunch of people are being arrested or one person is being arrested for numerous crimes. But what's important in the short term is to fix the privacy issue.
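The suggestion above could be sketched roughly as follows. This is only an illustration, not the code in open-austin/harris-county-bookings: the address column names come from the comment above, while `NAME`, `CHARGE`, `PERSON_ID`, and the sample rows are hypothetical stand-ins.

```python
import csv
import io
import uuid

# Address columns named in the suggestion above.
ADDRESS_COLUMNS = {
    "ADDRESS NUMBER", "ADDRESS PREFIX", "ADDRESS STREET",
    "ADDRESS SUFFIX", "ADDRESS ALI",
}
# Hypothetical: whichever column(s) identify a person in the raw file.
IDENTITY_COLUMNS = ("NAME",)


def scrub_rows(rows, id_map=None):
    """Drop address and identity columns and add an arbitrary PERSON_ID."""
    id_map = {} if id_map is None else id_map
    for row in rows:
        key = tuple(row.get(col, "") for col in IDENTITY_COLUMNS)
        # The same person always maps to the same random ID,
        # and the ID itself reveals nothing about the name.
        person_id = id_map.setdefault(key, uuid.uuid4().hex)
        cleaned = {
            col: value for col, value in row.items()
            if col not in ADDRESS_COLUMNS and col not in IDENTITY_COLUMNS
        }
        cleaned["PERSON_ID"] = person_id
        yield cleaned


# Made-up sample data, for illustration only.
raw = io.StringIO(
    "NAME,ADDRESS NUMBER,ADDRESS STREET,CHARGE\n"
    "DOE JANE,123,MAIN,NO DRIVERS LICENSE\n"
    "ROE RICHARD,45,ELM,DEFECTIVE HEADLAMP\n"
    "DOE JANE,123,MAIN,NO INSURANCE\n"
)
scrubbed = list(scrub_rows(csv.DictReader(raw)))
```

With this approach the two `DOE JANE` rows share one `PERSON_ID`, so it stays clear that one person has two charges. For IDs to stay stable from day to day, the `id_map` would have to be saved somewhere private, since it links names to IDs.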
I think what you're seeing regarding the difference between the accdb and the csv is a bug. I'll investigate that further during the hack night this week. Not sure why those are being treated differently. Good catch.
In regards to the addresses, those just weren't on the list of things to scrub in open-austin/harris-county-bookings#4. But you're probably right. Those should probably be scrubbed as well. I like the arbitrary ID number idea as well.
Thinking back to the problem that @safeandfree is trying to solve...
A primary purpose for me is to be able to surface patterns and ID people who are the victim of patterns of over policing. Yesterday, I took several days of the data, merged it into a single data file, and then ran cross tabs in order to ID people who were arrested on a single class C misdemeanor charge so I could write them a letter asking for more information about what happened and why they were actually jailed on an offense that does not have jail as an available punishment (Class C misdemeanors are "fine-only" offenses).
I think getting @safeandfree set up with the full unabridged dataset to use either locally on her computer, or a privately accessible server (like the Azure SQL db) is what is needed.
@safeandfree will you be at the Civic Hack Night tomorrow? Just want to make sure what @colbywhite has been working on gets you a solution that works for what you are trying to accomplish. And thanks to @mscarey for speaking up about privacy concerns. The more we scrub what is published on GitHub, the more I think we might need to consider a separate solution to address @safeandfree's needs.
Mateo
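For anyone picking this up, the merge-and-cross-tab workflow quoted above might look something like this when run on a private copy of the data. It is a rough sketch under assumptions: the `PERSON_ID`, `LEVEL`, and `CHARGE` column names, the `MC` code for a Class C misdemeanor, and the sample rows are all hypothetical, and the real JIMS columns would need to be checked.

```python
import csv
import io
from collections import defaultdict


def merge_days(daily_files):
    """Concatenate the rows of several daily CSV files (file-like objects)."""
    rows = []
    for handle in daily_files:
        rows.extend(csv.DictReader(handle))
    return rows


def single_class_c(rows, person_col="PERSON_ID", level_col="LEVEL"):
    """Return people whose only charge in the merged data is a Class C misdemeanor."""
    charges = defaultdict(list)
    for row in rows:
        charges[row[person_col]].append(row[level_col])
    # "MC" is an assumed code for a Class C misdemeanor.
    return sorted(p for p, levels in charges.items() if levels == ["MC"])


# Made-up sample data standing in for two daily files.
day1 = io.StringIO(
    "PERSON_ID,LEVEL,CHARGE\n"
    "A1,MC,DEFECTIVE HEADLAMP\n"
    "B2,MC,NO INSURANCE\n"
    "B2,MA,EVADING ARREST\n"
)
day2 = io.StringIO(
    "PERSON_ID,LEVEL,CHARGE\n"
    "C3,FS,POSS CS PG 1 <1G\n"
    "D4,MC,RAN STOP SIGN\n"
)
result = single_class_c(merge_days([day1, day2]))
```

Note that this groups charges per person across all merged days rather than per booking, which is a simplification. On a private, unabridged copy, `person_col` could be pointed at whatever column identifies the person to write to.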
Yes Mateo is right about what I really need from this.
Yes, I will be at Civic Hack tomorrow evening. Sorry to have been a bit out of pocket. SO much going on.
Useful background about PII: https://en.wikipedia.org/wiki/Personally_identifiable_information
I'm not an expert on PII (though I've worked on databases for 30 years), so I'm not sure if arrest records are exempted.
I will say that simply hiding the fact that the county is releasing all of this information is a big disservice to the community. While this project doesn't necessarily have to expose the same data that the county does, I think it should let visitors know what information the county provides that is not being exposed. People have a right to know what information is being made public.
Please understand I don't have an axe to grind with Harris County. They should be following appropriate laws about release of information (which hopefully they are). In either case, part of open government should be making it clear to citizens what data is out there.
The repo for this project idea is here
I think this is easy, but it is over my head. :)
There is a daily data file posted in Houston here http://www.jims.hctx.net/jimshome/jimsreports/jims1058.txt
This is 24 hours of arrests. The file is replaced every day. We are trying to show that Harris County's jail is full of people they didn't have to arrest.
We pulled it and yesterday in Houston:
· A guy was arrested for a bad headlight, no other charge
· A guy was arrested for a stop sign violation, no other charge
· Eleven people were arrested for poss less than a gram, pg 1 and no other offense (keeping in mind, 1/3rd or more of these are likely to be bad field test victims)
· Two people were jailed for theft less than $50
· A woman jailed for a series of things that just look like “piling on” – unclean license plate (that’s a thing?), cardboard over a car window, no insurance
· Two guys were arrested for evading with no additional charge
· A guy was arrested for evading with a vehicle with no other charge
The list goes on.
And now for the help I need.
I want to automate a daily download of this dataset. I can create an automated task in Access that will grab a file from my desktop and load it to Access.
That's it. I just want to grab this file every day and not have to do it manually. I hope that's pretty simple. I would suggest including it in your data portal project but it's Harris County data. So not local. It is really valuable for criminal justice research.
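One minimal way to do the daily grab described above is a short Python script built on the standard library's `urllib.request.urlretrieve`. This is a sketch, not a finished tool: the URL is the one given above, while the `jims_reports` folder and the dated file naming are assumptions made for illustration.

```python
import datetime
import pathlib
import urllib.request

# URL of the daily report, as given in the request above.
REPORT_URL = "http://www.jims.hctx.net/jimshome/jimsreports/jims1058.txt"


def dated_path(folder, day):
    """Build a per-day filename so each download is kept, not overwritten."""
    return pathlib.Path(folder) / f"jims1058_{day:%Y-%m-%d}.txt"


def download_report(folder="jims_reports", day=None):
    """Save the current copy of the report and return the path it was written to."""
    day = day or datetime.date.today()
    target = dated_path(folder, day)
    # Create the (hypothetical) output folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(REPORT_URL, str(target))
    return target
```

Calling `download_report()` once a day from a scheduler (Windows Task Scheduler or cron, for example) would leave one dated file per day in the folder, which the Access import task could then pick up. Because the county replaces the file every day, a missed run means that day's data is not available from this URL afterwards.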