hurlbertlab / dietdatabase

Creative Commons Zero v1.0 Universal

double quotes in tsv files are not delimiters #43

Closed jhpoelen closed 7 years ago

jhpoelen commented 7 years ago

Please note that in tab-separated files, the double quote (") is not a delimiter; it is treated as an ordinary character. See http://www.iana.org/assignments/media-types/text/tab-separated-values and https://en.wikipedia.org/wiki/Tab-separated_values for more information.

As far as I know, the reason for the widespread adoption of the TSV file format is that you do not have to bother with quote delimiters at all.
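To illustrate the point, here is a minimal sketch of how a reader that follows the TSV convention would see quoted fields. The sample text and the helper name `read_tsv_strict` are made up for illustration; this is not the GloBI check itself, just Python's standard `csv` module with quote processing switched off.

```python
import csv
import io

# Hypothetical two-line sample; not copied from avianDietDatabase.txt.
sample = '"Source"\t"Taxon"\nSmith 1990\tTurdus migratorius\n'


def read_tsv_strict(text):
    """Parse tab-separated text, treating " as an ordinary character."""
    reader = csv.reader(
        io.StringIO(text),
        delimiter="\t",
        quoting=csv.QUOTE_NONE,  # no special handling of quote characters
    )
    return list(reader)


rows = read_tsv_strict(sample)
header = rows[0]

print(header)              # ['"Source"', '"Taxon"']
print("Source" in header)  # False: the quotes are part of the header name
```

A lookup for a column named `Source` fails here because the header the reader sees is the eight-character string `"Source"`, quotes included.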

For some reason, all fields in avianDietDatabase.txt are now quoted. I suggest either switching to CSV or removing the " characters. I found this issue because the GloBI check failed: https://travis-ci.org/hurlbertlab/dietdatabase/builds/197039883 . The check could not find a Source column header, because the headers contained "Source" (with literal quotes) instead.
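If removing the quotes is the preferred route, something along these lines might work as a one-off cleanup. It is only a sketch: the function name `unquote_tsv` and the sample text are hypothetical, and it assumes no field contains an embedded tab or line break (it raises an error rather than guessing if one does).

```python
import csv
import io


def unquote_tsv(text):
    """Re-emit quoted, tab-delimited text as plain TSV without wrapping quotes."""
    # Read the way a CSV-style tool would: " wraps a field and is dropped.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t", quotechar='"')
    lines = []
    for row in reader:
        for field in row:
            # Plain TSV has no way to represent these inside a field.
            if "\t" in field or "\n" in field or "\r" in field:
                raise ValueError(f"field cannot be written as plain TSV: {field!r}")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical sample; not copied from avianDietDatabase.txt.
quoted = '"Source"\t"Taxon"\n"Smith 1990"\t"Turdus migratorius"\n'
cleaned = unquote_tsv(quoted)
print(cleaned.splitlines()[0].split("\t"))  # ['Source', 'Taxon']
```

After this step the header line contains `Source` without quotes, which is what a strict TSV reader would look for.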

Hope this helps.

ahhurlbert commented 7 years ago

Not sure what was wrong. I do not see field names surrounded by quotes and the build seems to be passing at the moment.

jhpoelen commented 7 years ago

Interesting... It might be a save preference in Excel, something like "always quote strings", that is causing the quotes to be inserted. Thanks for having a look; the build looks green for now.