Open EvaRuthM opened 1 year ago
By "CDR dataset" do you mean the original version in Google Docs, datadotworld, or just the initial reports? I can look into it, but I am not 100% sure how this part of the website works, so it may take a bit.
I mean the original version in Google Docs (and the initial reports). Thanks Elaine -- I wasn't sure if you'd ever worked on that part. I can also ask Jason when he gets back if we need to.
OK, I think I just need to figure out where it is getting filtered out. It looks like the raw data comes from S3, so I can check there first to see whether the column is dropped in the data pipeline or on the website; if it's the website, we may need to wait for Jason.
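For what it's worth, here is a rough sketch of how that check could look: compare the header row of the raw file against the header row of the downloaded file and list whatever went missing. The function name and the sample CSV text are made up for illustration; nothing here is taken from the actual pipeline or website code.

```python
import csv
import io


def dropped_columns(raw_csv_text, published_csv_text):
    """Return header names that appear in the raw CSV but not in the published one.

    Names are compared case-insensitively with surrounding whitespace ignored.
    Both arguments are CSV text (hypothetical inputs, e.g. the raw file and the
    file a user downloads from the site).
    """
    raw_header = next(csv.reader(io.StringIO(raw_csv_text)), [])
    published_header = next(csv.reader(io.StringIO(published_csv_text)), [])
    published = {name.strip().lower() for name in published_header}
    return [name for name in raw_header if name.strip().lower() not in published]


# Illustrative data only, not real records.
raw = "name,date_of_death,TDCJ - Specify Unit\nexample,2020-01-01,Example Unit\n"
downloaded = "name,date_of_death\nexample,2020-01-01\n"
print(dropped_columns(raw, downloaded))
```

If the column is already missing from the raw file, the problem is upstream of the pipeline; if it is in the raw file but not in the download, it is being filtered somewhere in between.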
copy that. thanks for taking a look!
OK so I think it gets removed here: https://github.com/texas-justice-initiative/data-processing/tree/dfcbdb8d8e4957cc27ebe027f50ddcf0c0dd82ef
I need to read through it because there are some things in there I don't understand, and I don't want to mess up the whole pipeline; I can have a look this weekend.
This would require us to clean this column of data, and we have not done that at this point.
In our CDR dataset, column AU contains data on "TDCJ - Specify Unit", yet this column does not show up when the data is downloaded. Please add the column to the downloaded data output for Deaths in Custody.
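In case it helps anyone locate the column in an exported CSV, spreadsheet column letters can be converted to a zero-based position in the header row. This is only a small sketch; the helper name is made up, and it assumes the exported file keeps the same column order as the spreadsheet.

```python
def column_letters_to_index(letters):
    """Convert spreadsheet column letters (A, B, ..., Z, AA, ...) to a zero-based index."""
    index = 0
    for ch in letters.strip().upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        # Column letters are a base-26 number with digits A=1 .. Z=26.
        index = index * 26 + (ord(ch) - ord("A") + 1)
    if index == 0:
        raise ValueError("empty column reference")
    return index - 1


print(column_letters_to_index("AU"))  # 46, i.e. the 47th column
```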