opendata / Open-Data-Needs

An ongoing effort to catalog the holes in the open data ecosystem. [RETIRED]

Invalid CSV is rampant #11

Closed: waldoj closed this issue 10 years ago

waldoj commented 10 years ago

Frequently, the CSV that governments publish is invalid. Errors are common, and many are either generated or tolerated by Microsoft Excel, which is liberal in what it accepts and not particularly conservative in what it emits. The result is data that appears valid to its creator but is a mess for users to parse.

The solution is probably to bake a validation step into the data release process. If CKAN, DKAN, Socrata, and Junar don't already do so, they should (optionally) run csvclean or csvlint to validate CSV files when they are uploaded.
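A minimal sketch of what such an upload-time check might look like, written with only Python's standard `csv` module instead of csvclean or csvlint. `validate_csv` is a hypothetical name, and the sketch assumes two house rules that are simplifications of what the real tools check: uploads should be UTF-8 (a leading byte-order mark is tolerated), and every row should have as many fields as the header row. Quoting mistakes are caught by the `csv` module's `strict` mode.

```python
import csv
import io


def validate_csv(data: bytes) -> list[str]:
    """Return a list of problems found in an uploaded CSV; [] means none were found.

    Hypothetical helper. Checks only a small subset of what csvlint/csvclean do:
      * the bytes decode as UTF-8 (assumed house rule; a BOM is stripped)
      * quoting is well formed (csv strict mode raises csv.Error otherwise)
      * every row has the same number of fields as the header
    """
    try:
        # "utf-8-sig" decodes UTF-8 and drops a leading BOM if one is present
        text = data.decode("utf-8-sig")
    except UnicodeDecodeError as exc:
        return [f"not valid UTF-8 (byte offset {exc.start})"]

    problems: list[str] = []
    # newline="" keeps line endings intact so the csv module can handle them itself
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        if any(name.strip() == "" for name in header):
            problems.append("header row has a blank column name")
        for row in reader:
            if len(row) != len(header):
                # line_num counts physical lines read so far, not records
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} fields, got {len(row)}"
                )
    except csv.Error as exc:
        # e.g. text after a closing quote, or an unterminated quoted field;
        # parsing stops here, so later rows are not checked
        problems.append(f"line {reader.line_num}: {exc}")
    return problems


if __name__ == "__main__":
    upload = b'id,name\n1,"Smith, J"\n2,Jones,extra\n3,"O"Brien"\n'
    for problem in validate_csv(upload):
        print(problem)
```

Run against the sample upload, this reports the three-field row on line 3 and the stray quote on line 4. A release pipeline could refuse (or merely warn about) any upload for which the returned list is non-empty, which is the "optional" validation step described above.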

This issue was raised by a SRCCON attendee in July, during a discussion about open data needs.

waldoj commented 10 years ago

Closing as a duplicate of #7.