rcackerman / parole-hearing-data

http://www.parolehearingdata.org/

Min/Max sentences have erroneous months #29

Closed by talos 9 years ago

talos commented 9 years ago

Some errors crept into the existing data some time back. Min/max sentences, which are in the format "NN-NN", had some of their numbers converted to month abbreviations. For example, a minimum sentence of "01-00" became "Jan-00", and a maximum sentence of "07-00" became "Jul-00".

An example of this bug can be seen for this inmate:

http://161.11.133.89/ParoleBoardCalendar/details.asp?nysid=00005937Q

Aggregated minimum sentence is "12-00".

In the first line of data.csv at commit ac52c4aa507b7030dade8b567a76b68f78357f33, the aggregated minimum sentence has been converted to "Dec-00".

@rcackerman I'm working on a fix for this now.

talos commented 9 years ago

As a note, this seems to have been a pre-existing condition of the data -- the changes I've made didn't cause this error.

talos commented 9 years ago

Hrm, this can get pretty ugly... for example

http://161.11.133.89/ParoleBoardCalendar/details.asp?nysid=03499002J

In the data, the min has been flipped from "01-08" to "8-Jan", so sometimes the numbers need to be flipped around, too...
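For illustration, here is a rough sketch of how the two mangled shapes seen so far ("Jan-00" and "8-Jan") could be mapped back to "NN-NN". It is not the code from the repo: the name `restore_sentence` is hypothetical, and it assumes the month always stands in for the first number, which matches the examples above but hasn't been checked against the whole dataset.

```python
import re

# Month abbreviation -> the number it replaced.
MONTHS = {
    "jan": 1, "feb": 2, "mar": 3, "apr": 4, "may": 5, "jun": 6,
    "jul": 7, "aug": 8, "sep": 9, "oct": 10, "nov": 11, "dec": 12,
}

MONTH_FIRST = re.compile(r"^([A-Za-z]{3})-(\d{1,2})$")  # e.g. "Jan-00"
MONTH_LAST = re.compile(r"^(\d{1,2})-([A-Za-z]{3})$")   # e.g. "8-Jan"


def restore_sentence(value):
    """Return value in "NN-NN" form if it looks month-mangled, else unchanged.

    Assumption: the month abbreviation always replaces the FIRST number,
    so "8-Jan" has to be flipped back to "01-08".
    """
    match = MONTH_FIRST.match(value)
    if match and match.group(1).lower() in MONTHS:
        return "%02d-%02d" % (MONTHS[match.group(1).lower()], int(match.group(2)))

    match = MONTH_LAST.match(value)
    if match and match.group(2).lower() in MONTHS:
        return "%02d-%02d" % (MONTHS[match.group(2).lower()], int(match.group(1)))

    # Already "NN-NN", or something this sketch doesn't recognize.
    return value
```

With this, `restore_sentence("Dec-00")` gives "12-00" and `restore_sentence("8-Jan")` gives "01-08", while an untouched value like "12-00" passes through as-is.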

rcackerman commented 9 years ago

@talos I bet this is an Excel issue.

I found this as a way of preventing Excel from eating numbers: http://superuser.com/questions/330291/how-to-stop-excel-from-auto-formatting-and-making-it-work-like-a-number-crunchin, but it seems messy.

talos commented 9 years ago

Interesting. So at some point numbers run through Excel were re-committed to the mainline.

I wrote some code that restores these to the correct format, and in the future it should be easy to stop a commit like that from happening again (as Excel wouldn't be part of the pipeline).
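As one possible way to catch this before a commit, a small check could scan the sentence columns of the CSV for month abbreviations. This is only a sketch: `find_mangled` is a hypothetical helper, and the column names in the usage note are made up for the example, not taken from data.csv.

```python
import csv
import re

# Any standalone month abbreviation in a sentence column is suspicious.
MONTH_LIKE = re.compile(
    r"\b(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)\b",
    re.IGNORECASE,
)


def find_mangled(csv_lines, columns):
    """Yield (line_number, column, value) for each month-mangled value.

    csv_lines: an open file or any iterable of CSV lines, header first.
    columns: names of the columns to check (assumed, not from the repo).
    """
    reader = csv.DictReader(csv_lines)
    for row in reader:
        for column in columns:
            value = row.get(column) or ""
            if MONTH_LIKE.search(value):
                yield reader.line_num, column, value
```

Something like `list(find_mangled(open("data.csv"), ["min", "max"]))` could then run in a test or pre-commit hook and fail if the list is non-empty.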