qiime2 / q2-metadata

BSD 3-Clause "New" or "Revised" License

`tabulate` breaks when metadata contains escaped newline, tab, or quote characters #21

Closed: jairideout closed this issue 6 years ago

jairideout commented 6 years ago

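If a metadata file contains any of the following characters in a cell quoted according to Excel's TSV rules (the cell is wrapped in double quotes, and any double quote inside it is doubled), `metadata tabulate` encounters a JSON SyntaxError:

- newline (`\n`)
- carriage return (`\r`)
- tab (`\t`)
- double quote (`"`)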

Here's the output that's produced with any of these characters in the file:

[screenshot of the JSON SyntaxError]

It looks like DataTables is having trouble parsing these escaped cells. Is it possible to configure DataTables to match Excel's TSV rules (e.g., similar to the `excel-tab` dialect in Python's `csv` module)?
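For reference, the Excel TSV rules above can be illustrated with Python's standard `csv` module and its `excel-tab` dialect. The sketch below shows how cells containing each of these characters are quoted on write and restored on read. It only illustrates the quoting convention and is not how q2-metadata's reader or `tabulate` is implemented.

```python
import csv
import io

# Cells that contain the four problem characters as real characters
# (not backslash escape sequences such as "\\n").
rows = [
    ["#SampleID", "foo"],
    ["s1", "a\nb"],  # newline
    ["s2", "c\rd"],  # carriage return
    ["s3", "e\tf"],  # tab
    ["s4", 'g"h'],   # double quote
]

# excel-tab: tab-delimited, minimal quoting, and embedded double quotes
# written as "" inside a quoted cell.
buf = io.StringIO(newline="")
csv.writer(buf, dialect="excel-tab").writerows(rows)
text = buf.getvalue()
print(repr(text))

# Reading the same text back with the same dialect restores every cell.
buf.seek(0)
parsed = list(csv.reader(buf, dialect="excel-tab"))
print(parsed == rows)
```

Each affected cell comes out wrapped in double quotes (for example, `"g""h"`), which is the form the issue describes.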

thermokarst commented 6 years ago

@jairideout, can you provide a sample metadata file that recreates all of the error cases listed above? I prepared a metadata file (see below), but wasn't able to recreate the error message you reported in the screenshot above. In particular, only the last row, when uncommented, raises an error, and that error is different from the one you reported (SyntaxError: JSON.parse: expected ',' or ']' after array element at line 1 column 173 of the JSON data). I checked this in FF and Chrome, FWIW. I probably just put together my test file incorrectly, so a sample from you would be super helpful. Thanks!

#SampleID   foo bar
s1  a\nb    "a\nb"
s2  c\rd    "c\rd"
s3  e\tf    "e\tf"
#s4 g""h    "g""h"
[screenshot, 2018-02-03 10:00:02 PM]

The parsed JSON:

{"columns":[["#SampleID",""],["foo","categorical"],["bar","categorical"]],"index":[0,1,2],"data":[["s1","a\\nb","a\\nb"],["s2","c\\rd","c\\rd"],["s3","e\\tf","e\\tf"]]}
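Decoding the JSON above with Python's `json` module suggests why the first three rows did not fail. Each `\\n`, `\\r`, and `\\t` in that JSON decodes to a literal backslash followed by a letter, so these cells never contained an actual newline, carriage return, or tab. This sketch only reads the JSON exactly as posted and does not run `tabulate`.

```python
import json

# JSON copied verbatim from the comment above (raw string, so the
# backslashes reach json.loads unchanged).
payload = r'{"columns":[["#SampleID",""],["foo","categorical"],["bar","categorical"]],"index":[0,1,2],"data":[["s1","a\\nb","a\\nb"],["s2","c\\rd","c\\rd"],["s3","e\\tf","e\\tf"]]}'

decoded = json.loads(payload)
for sample_id, foo, bar in decoded["data"]:
    # Each cell is 4 characters, e.g. "a", "\\", "n", "b"; none of them is
    # a control character (code point < 0x20).
    has_control = any(ord(ch) < 0x20 for ch in foo + bar)
    print(sample_id, repr(foo), len(foo), has_control)
```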
jairideout commented 6 years ago

Thanks for looking into this @thermokarst! Here are four minimal files that reproduce the JSON SyntaxError, tested with the latest qiime2 development environment on Ubuntu with Chrome Version 63.0.3239.132 (Official Build) (64-bit).


Each file will produce a slightly different JSON SyntaxError depending on the data being loaded, the offending character, and where the character appears in the file. I should have made that clearer in my original message. All of the following data files produce a JSON SyntaxError in my testing environment.
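The thread does not pin down the cause. One hypothetical mechanism would explain why the error varies by character. Suppose valid JSON text were pasted into a JavaScript string literal before being passed to `JSON.parse`. The JavaScript parser would then turn `\n`, `\r`, `\t`, and `\"` back into raw characters, and JSON forbids raw control characters inside strings. The sketch below simulates that double decoding in Python. `simulate_js_literal_roundtrip` is a hypothetical helper, and Python's `unicode_escape` codec stands in for JavaScript's string-literal unescaping (ASCII input only). Nothing here is taken from q2-metadata's code.

```python
import json

def simulate_js_literal_roundtrip(value):
    """Hypothetical: serialize one cell to JSON, unescape the text the way a
    JavaScript string literal would, then parse it again as JSON.

    Returns ("ok", parsed_value) or ("error", message)."""
    json_text = json.dumps({"data": [["s1", value]]})  # valid, ASCII-only JSON
    # unicode_escape turns \n, \r, \t, \" and \\ into the raw characters,
    # standing in for the JavaScript parser (assumption; ASCII input only).
    unescaped = json_text.encode("ascii").decode("unicode_escape")
    try:
        return "ok", json.loads(unescaped)["data"][0][1]
    except json.JSONDecodeError as err:
        return "error", err.msg

cases = {
    "newline": "a\nb",
    "carriage return": "c\rd",
    "tab": "e\tf",
    "double quote": 'g"h',
    "literal backslash-n": "a\\nb",
}
for name, value in cases.items():
    print(name, simulate_js_literal_roundtrip(value))
```

Under this assumption, real newlines, carriage returns, and tabs fail as invalid control characters. A double quote fails at a different point, because the string ends early. A literal backslash sequence, like those in the earlier test file, parses without error but silently changes value.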

File with a newline character in a cell:

newline.txt

File with a carriage return in a cell:

carriage-return.txt

File with a tab character in a cell:

tab.txt

File with a double-quote character in a cell:

double-quote.txt

Let me know if those files help with reproducing the errors. I find it helpful to use `Metadata.load()` in an IPython session to inspect how the Metadata reader is loading the file contents, and to compare that with tabulate's JSON-to-DataTables conversion.
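For the comparison suggested above, a small helper can diff the values the reader produced against the cells decoded from the JSON handed to DataTables. In a real session, the expected rows would come from `Metadata.load()` in IPython. Here they are plain lists so the sketch runs without QIIME 2. `find_mismatches` is a hypothetical name, and the helper assumes the `"data"` layout shown earlier in the thread, with the sample ID as the first element of each row.

```python
import json

def find_mismatches(reader_rows, json_text):
    """Return (row, column, expected, got) for every cell whose value in the
    JSON's "data" array differs from what the metadata reader produced."""
    decoded_rows = json.loads(json_text)["data"]
    if len(decoded_rows) != len(reader_rows):
        raise ValueError("row counts differ between reader output and JSON")
    mismatches = []
    for i, (expected_row, got_row) in enumerate(zip(reader_rows, decoded_rows)):
        if len(expected_row) != len(got_row):
            raise ValueError(f"column counts differ in row {i}")
        for j, (expected, got) in enumerate(zip(expected_row, got_row)):
            if expected != got:
                mismatches.append((i, j, expected, got))
    return mismatches

# Reader output containing a real newline, compared against correctly
# serialized JSON and against JSON carrying a literal backslash-n instead.
reader_rows = [["s1", "a\nb"]]
print(find_mismatches(reader_rows, json.dumps({"data": [["s1", "a\nb"]]})))
print(find_mismatches(reader_rows, json.dumps({"data": [["s1", "a\\nb"]]})))
```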