rcackerman / parole-hearing-data

http://www.parolehearingdata.org/

Log changes to parolee records #28

Closed rcackerman closed 8 years ago

rcackerman commented 9 years ago

When people from previous scrapes are found in the current scrape, we update their data. This might be a problem if we want to preserve the original state to see changes over time.

Potentially not a problem... wait to see.

talos commented 9 years ago

We overwrite based on both DIN and interview date, so we shouldn't overwrite prisoners. If the data changed for a row with an identical DIN and interview date, we would overwrite it, but that would mean the state actually changed its data on an interview, and we should still be tracking that change because data.csv is in version control.

Scheduled hearings are overwritten after they occur, with their date changed from YYYY-MM-* to YYYY-MM-DD, since we don't know the precise day (or the hearing decision, etc.) until after the fact. Overwriting these should be OK, and we should still keep the history thanks to git.
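As a rough illustration of the overwrite rule described above, here is a minimal sketch of a merge keyed on DIN plus interview date. The column names `din` and `interview_date`, the helper names, and the sample rows are all hypothetical and are not taken from the project's scraper. It also does not handle a scheduled hearing whose date changes from YYYY-MM-* to YYYY-MM-DD, which would need its own matching step.

```python
# Hypothetical column names; the real data.csv headers may differ.
KEY_FIELDS = ("din", "interview_date")


def row_key(row):
    """Build the (DIN, interview date) key used to decide what gets overwritten."""
    return tuple(row[field] for field in KEY_FIELDS)


def merge_rows(existing, scraped):
    """Return existing rows, with scraped rows replacing any row that shares a key.

    Rows with a new key are appended, so a prisoner's earlier interviews
    are kept alongside the later ones.
    """
    merged = {row_key(row): row for row in existing}
    for row in scraped:
        merged[row_key(row)] = row
    return list(merged.values())


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    existing = [{"din": "00A0001", "interview_date": "2015-01-12", "decision": "DENIED"}]
    scraped = [
        {"din": "00A0001", "interview_date": "2015-01-12", "decision": "GRANTED"},
        {"din": "00A0001", "interview_date": "2016-01-11", "decision": "DENIED"},
    ]
    for row in merge_rows(existing, scraped):
        print(row)
```

Keying on both fields means a second interview for the same DIN lands in a new row, while a re-scraped interview replaces only its own row.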

rcackerman commented 9 years ago

Sorry, I never got back to you on this.

I'm ok with overwriting data, but I do want to log when things change*, since a) I believe many of the users of this data are not going to be familiar with version control and b) version control is not a great way of determining how things change.

* Any inmate information, and any information about the hearing, except a change from * to an interview date or from *** to a decision.

talos commented 9 years ago

I think the simplest implementation of a change tracker would be git-based: a script that combs through the history of data.csv and outputs all the changed rows, perhaps with some indication of how each one changed.

Such a script could be written in Python without too much difficulty, and would depend on the client having git.

Alternatively, the scheduled update process could run this script and commit its output to version control.
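A git-based tracker along these lines might look roughly like the sketch below. It assumes data.csv sits at the repository root, that rows are keyed by hypothetical `din` and `interview_date` columns, and that git is on the PATH. All function names are made up for illustration, and a commit that deletes the file is not handled.

```python
import csv
import io
import subprocess

# Hypothetical column names; the real data.csv headers may differ.
KEY_FIELDS = ("din", "interview_date")


def commits_touching(path, repo="."):
    """List commit hashes that touched `path`, oldest first."""
    result = subprocess.run(
        ["git", "log", "--reverse", "--format=%H", "--", path],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout.split()


def file_at(commit, path, repo="."):
    """Return the contents of `path` (relative to the repo root) at `commit`."""
    result = subprocess.run(
        ["git", "show", f"{commit}:{path}"],
        cwd=repo, check=True, capture_output=True, text=True,
    )
    return result.stdout


def index_rows(csv_text):
    """Map each row's key to the row itself."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[f] for f in KEY_FIELDS): row for row in reader}


def changed_rows(old_text, new_text):
    """Return (key, changed_field_names) for rows present in both versions."""
    old, new = index_rows(old_text), index_rows(new_text)
    changes = []
    for key, new_row in new.items():
        old_row = old.get(key)
        if old_row is not None and old_row != new_row:
            fields = sorted(f for f in new_row if old_row.get(f) != new_row[f])
            changes.append((key, fields))
    return changes


def history_changes(path="data.csv", repo="."):
    """Yield (commit, key, changed_fields) for each consecutive pair of versions."""
    commits = commits_touching(path, repo)
    for prev, curr in zip(commits, commits[1:]):
        old_text, new_text = file_at(prev, path, repo), file_at(curr, path, repo)
        for key, fields in changed_rows(old_text, new_text):
            yield curr, key, fields
```

Keeping the comparison in `changed_rows`, separate from the git calls, means the same function could be reused by the scheduled update, which already has the old and new data in hand.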

I'm not actually sure that the state makes many changes at all. We'll see once the script starts getting run regularly (which I can do, just got sidelined with other projects). I still think the new system is an improvement, as previously the scraper wasn't re-checking months that had already been checked, so it never would've been possible to see if there were after-the-fact changes.

rcackerman commented 9 years ago

Totally an improvement - thanks!

It seems like a small function within that last check would work fine - just spit out any records that have changed.
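A small function of that kind could be sketched as follows. It assumes the old and new versions of a record have already been matched up, that pending values are marked with asterisks (as with YYYY-MM-* dates), and that the exempt fields are called `interview_date` and `decision`. Those field names, the function names, and the sample rows are hypothetical.

```python
# Hypothetical names for the fields where a placeholder being filled in is expected.
EXPECTED_FILL_FIELDS = {"interview_date", "decision"}


def is_placeholder(value):
    """Assumption: a pending value contains an asterisk, e.g. '2015-03-*'."""
    return "*" in value


def unexpected_changes(old_row, new_row):
    """Return {field: (old_value, new_value)} for changes worth logging.

    A placeholder being replaced in one of EXPECTED_FILL_FIELDS is skipped;
    every other difference is reported.
    """
    changes = {}
    for field, new_value in new_row.items():
        old_value = old_row.get(field, "")
        if old_value == new_value:
            continue
        if field in EXPECTED_FILL_FIELDS and is_placeholder(old_value):
            continue  # scheduled hearing filled in after the fact
        changes[field] = (old_value, new_value)
    return changes


if __name__ == "__main__":
    # Made-up sample record, for illustration only.
    old = {"din": "00A0001", "name": "DOE, JOHN",
           "interview_date": "2015-03-*", "decision": "***"}
    new = {"din": "00A0001", "name": "DOE, JON",
           "interview_date": "2015-03-17", "decision": "DENIED"}
    print(unexpected_changes(old, new))
```

Returning the old and new values together gives readers who are not familiar with version control a plain log of what changed.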

rcackerman commented 8 years ago

Closing to make way for version 2.