metmuseum / openaccess

The Metropolitan Museum of Art's Open Access Initiative
Creative Commons Zero v1.0 Universal

this is improper CSV: odd number of quotes #10

VladimirAlexiev closed this issue 7 years ago

VladimirAlexiev commented 7 years ago
perl -ne '@matches = m{(")}g; $matches=@matches; print "$matches quotes, line $.: $_" if $matches%2' \
   MetObjects.csv > bad-quotes.txt

156438 of 554403 rows have an odd number of quotes. This makes it broken CSV and nearly impossible to work with.

VladimirAlexiev commented 7 years ago

Maybe I spoke too quickly: there are newlines in some fields enclosed in quotes. Let me see with csvkit...
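
A minimal sketch of why the per-line quote count can mislead, assuming a tiny made-up sample (the `sample` string and its contents are hypothetical, not taken from MetObjects.csv). A quoted field that contains a newline spans two physical lines, each with an odd number of quotes, yet a CSV-aware parser such as Python's standard `csv` module reads it as a single valid record:

```python
import csv
import io

# Hypothetical sample, NOT real MetObjects.csv data: the last record has a
# quoted field that spans two physical lines.
sample = 'id,title\n1,"Vase"\n2,"Bowl with\nlid"\n'

# Line-based check, same idea as the Perl one-liner above: flag physical
# lines whose count of double quotes is odd.
odd_lines = [
    n
    for n, line in enumerate(sample.splitlines(), start=1)
    if line.count('"') % 2
]

# CSV-aware parse: the reader keeps consuming physical lines while a quoted
# field is still open, so the embedded newline stays inside one record.
rows = list(csv.reader(io.StringIO(sample, newline="")))

print("physical lines with an odd quote count:", odd_lines)
print("records parsed:", len(rows))
print("last record:", rows[-1])
```

On this sample the line-based check flags both halves of the multi-line field, while the parser returns three well-formed records. An odd quote count on a physical line is therefore not evidence of broken CSV by itself.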

VladimirAlexiev commented 7 years ago

Yes, close this