Governments frequently publish invalid CSV. Errors are common, and some are generated or tolerated by Microsoft Excel, which is liberal in what it accepts and not particularly conservative in what it emits. The result is data that appears valid to its creator but is a mess for users to parse.
The solution is probably to bake a validation step into the data release process. If CKAN, DKAN, Socrata, and Junar don't already do this, they should (optionally) use csvclean or csvlint to validate CSV at the time it's uploaded.
This issue was raised by a SRCCON attendee in July, during a discussion about open data needs.
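To make the idea of an upload-time check concrete, here is a minimal sketch in Python using only the standard library's `csv` module. It is not how csvclean or csvlint work internally, and it covers only a few structural problems (malformed quoting, blank or duplicate column names, rows with the wrong number of fields). The function name `find_csv_problems` is hypothetical, chosen for illustration.

```python
import csv
import io


def find_csv_problems(text):
    """Return a list of (line_number, message) tuples for structural problems.

    Hypothetical helper: a rough stand-in for the kind of check a data
    portal could run on upload, not a replacement for csvclean or csvlint.
    """
    problems = []
    # strict=True makes the reader raise csv.Error on malformed quoting
    # instead of silently accepting it.
    reader = csv.reader(io.StringIO(text, newline=""), strict=True)
    try:
        header = next(reader, None)
        if header is None:
            return [(1, "file is empty")]
        if any(not name.strip() for name in header):
            problems.append((1, "blank column name in header"))
        if len(set(header)) != len(header):
            problems.append((1, "duplicate column name in header"))
        # Every later row (blank lines included) must match the header width.
        for row in reader:
            if len(row) != len(header):
                problems.append(
                    (reader.line_num,
                     f"expected {len(header)} fields, found {len(row)}")
                )
    except csv.Error as exc:
        problems.append((reader.line_num, f"parse error: {exc}"))
    return problems


if __name__ == "__main__":
    sample = "id,name\n1,Ann\n2,Bob,extra\n"
    for line_number, message in find_csv_problems(sample):
        print(f"line {line_number}: {message}")
```

A portal could run a check like this when a file is uploaded and show the publisher the list of problems before the dataset goes live, rather than leaving users to discover them later.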