unitedstates / inspectors-general

Collecting reports from Inspectors General across the US federal government.
https://sunlightfoundation.com/blog/2014/11/07/opengov-voices-opening-up-government-reports-through-teamwork-and-open-data/
Creative Commons Zero v1.0 Universal

Make report_id Windows-safe, and validate #166

Closed (divergentdave closed 10 years ago)

divergentdave commented 10 years ago

I've been cleaning these up as I go along, but we should make sure that all the scrapers generate report_ids that are valid filenames on Windows, and then add checks to enforce that in validate_report(). In particular, we should check for these characters: \ / : * ? " < > | (plus newline and carriage return), and make sure that the report_id isn't too long.
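As a rough illustration of the check described above, here is a minimal sketch. The helper name `check_report_id` and the `MAX_REPORT_ID_LENGTH` value are assumptions made for this example; the thread does not say what length limit was chosen or how the check is wired into validate_report().

```python
import re

# Characters Windows rejects in file names, plus newline and carriage return.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\r\n]')

# Hypothetical limit; the thread does not specify an actual number.
MAX_REPORT_ID_LENGTH = 200


def check_report_id(report_id):
    """Return an error message, or None if the report_id looks Windows-safe."""
    match = INVALID_CHARS.search(report_id)
    if match:
        return "report_id contains invalid character %r: %r" % (
            match.group(0), report_id)
    if len(report_id) > MAX_REPORT_ID_LENGTH:
        return "report_id is too long (%d characters): %r" % (
            len(report_id), report_id)
    return None
```

A caller could treat any non-None return value as a validation failure and report the message alongside the scraper name.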

konklone commented 10 years ago

Good call - there is a section to do that, but it only checks for /. Updating it for the remaining characters is a good idea.

Alternatively, the report_id field could be pre-processed so that any of the invalid characters are auto-replaced with -. This runs the risk of collisions, though.
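To make the collision risk concrete, here is a small sketch of the replace-with-`-` approach. The function names `sanitize_report_id` and `find_collisions` are hypothetical and exist only for this example; they are not part of the project.

```python
import re

# Same character set as in the issue description above.
INVALID_CHARS = re.compile(r'[\\/:*?"<>|\r\n]')


def sanitize_report_id(report_id):
    """Replace every Windows-unsafe character with a hyphen."""
    return INVALID_CHARS.sub("-", report_id)


def find_collisions(report_ids):
    """Return (first_id, second_id, cleaned) for distinct ids that sanitize alike."""
    seen = {}
    collisions = []
    for original in report_ids:
        cleaned = sanitize_report_id(original)
        if cleaned in seen and seen[cleaned] != original:
            collisions.append((seen[cleaned], original, cleaned))
        else:
            seen[cleaned] = original
    return collisions
```

For instance, `A/1` and `A-1` are distinct ids that both become `A-1` after replacement, which is why failing validation may be safer than rewriting ids silently.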

divergentdave commented 10 years ago

Here's a list of problem path names I ran into on a recent test run.

konklone commented 10 years ago

I see all the checkboxes are checked - is this good to review/merge?

divergentdave commented 10 years ago

I have one more commit to add, covering the validation end of this.

konklone commented 10 years ago

Nice! :+1: