Closed rcackerman closed 8 years ago
We overwrite based off of both DIN and interview date, so we shouldn't overwrite prisoners. While if the data changed for a row with identical DIN and interview date we would overwrite it, that would mean that the state actually changed their data on an interview -- and we should track that change because data.csv is in version control.
Scheduled hearings are overwritten after they occur, with their date changed from YYYY-MM-* to YYYY-MM-DD, as we don't know the precise day til after the fact (or the hearing decision, etc.). Overwriting these should be OK, plus we should still keep the history thanks to git.
Sorry, I never got back to you on this.
I'm ok with overwriting data, but I do want to log when things change*, since a) I believe many of the users of this data are not going to be familiar with version control and b) version control is not a great way of determining how things change.
* Any inmate information, and any information about the hearing except a change from * to an interview date or a *** to a decision.
I think the simplest implementation of a change tracker would be git-based: a script that combs through the history of data.csv and outputs all the changed rows, with perhaps some indication of how each one changed.
Such a script could be written in Python without too much difficulty, and depend upon the client having git.
Alternatively, the scheduled process of updates could run this, and commit it to version control.
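A rough sketch of what such a git-based tracker might look like, in case it helps the discussion. The column names in KEY_FIELDS are hypothetical stand-ins for whatever data.csv actually calls the DIN and interview date columns, and the helper names are made up for illustration. It assumes git is on the PATH and that data.csv sits at the repository root.

```python
import csv
import io
import subprocess

# Hypothetical column names; adjust to the real headers in data.csv.
KEY_FIELDS = ("din", "interview date")


def rows_by_key(csv_text, key_fields=KEY_FIELDS):
    """Index the rows of one CSV snapshot by their composite key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[k] for k in key_fields): row for row in reader}


def changed_rows(old_text, new_text, key_fields=KEY_FIELDS):
    """Yield (key, field, old_value, new_value) for rows in both snapshots that differ."""
    old = rows_by_key(old_text, key_fields)
    new = rows_by_key(new_text, key_fields)
    for key in sorted(old.keys() & new.keys()):
        for field, new_value in new[key].items():
            old_value = old[key].get(field)
            if old_value != new_value:
                yield key, field, old_value, new_value


def git_output(*args):
    """Run a git command and return its stdout as text."""
    result = subprocess.run(
        ["git", *args], check=True, capture_output=True, text=True
    )
    return result.stdout


def history_changes(path="data.csv"):
    """Walk the commits touching `path`, oldest first, and yield per-commit changes."""
    revs = git_output("log", "--format=%H", "--reverse", "--", path).split()
    for older, newer in zip(revs, revs[1:]):
        old_text = git_output("show", f"{older}:{path}")
        new_text = git_output("show", f"{newer}:{path}")
        for change in changed_rows(old_text, new_text):
            yield (newer, *change)
```

Running `for change in history_changes(): print(change)` from inside the repository would list each changed field along with the commit that introduced it. Rows that only appear in the newer snapshot are treated as additions and skipped.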
I'm not actually sure that the state makes many changes. We'll see once the script starts getting run regularly (which I can do, I just got sidelined with other projects). I still think the new system is an improvement, as previously the scraper wasn't checking months that had already been checked -- so it never would've been possible to see if there were after-the-fact changes.
Totally an improvement - thanks!
It seems like a small function within that last check would work fine - just spit out any records that have changed.
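For what it's worth, here is one way that small function could look. This is only a sketch under assumptions: `merge_scrape` is a made-up name, the key column names are hypothetical, and it guesses that any old value containing "*" is a placeholder whose replacement does not need logging (per the exception noted earlier in the thread). It only compares rows whose key already matches, so matching a YYYY-MM-* date to its final YYYY-MM-DD date is left to the scraper.

```python
def merge_scrape(existing, scraped, key_fields=("din", "interview date")):
    """Overwrite rows in `existing` with `scraped` rows and return the logged changes.

    `existing` maps a composite key tuple to a row dict; `scraped` is a list of
    row dicts. Both the key column names and the "*" placeholder rule are
    assumptions made for this sketch.
    """
    changes = []
    for row in scraped:
        key = tuple(row[k] for k in key_fields)
        old = existing.get(key)
        if old is not None:
            for field, new_value in row.items():
                old_value = old.get(field)
                if old_value == new_value:
                    continue
                # Skip placeholders being filled in, e.g. "*" becoming a decision.
                if old_value is None or "*" in old_value:
                    continue
                changes.append((key, field, old_value, new_value))
        # Overwrite (or add) the row regardless of whether anything was logged.
        existing[key] = row
    return changes
```

The returned list could be printed or appended to a log file at the end of each scheduled run.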
Closing to make way for version2.
When people from previous scrapes are found in the current scrape, we update their data. This might be a problem if we want to preserve the original state to see changes over time.
Potentially not a problem... wait to see.