w3c / csvw

Documents produced by the CSV on the Web Working Group

Why is the `escape_character` flag determined by the `doubleQuote` property of a dialect? #852

Open xrotwang opened 7 years ago

xrotwang commented 7 years ago

I'm trying to convert dialect descriptions to the equivalent set of formatting parameters in Python. This does not appear to be possible, because naively translating `"doubleQuote": true` to `doublequote=True, escapechar='"'` in Python fails, as exemplified by this CSV:

```
COL1,COL2
"quoted, "" content",val2
```

and the Python 3 snippet below. Note that setting `escapechar` to `'\\'`, independently of `doublequote`, succeeds.

```python
import csv

csv.field_size_limit(20)  # this limits the maximal field length to just a bit above what would be required in our example

with open('escape_char.csv') as fp:
    print(list(csv.reader(fp, escapechar='\\', doublequote=True))[:2])

with open('escape_char.csv') as fp:
    print(list(csv.reader(fp, escapechar='"', doublequote=True))[:2])
```

The Python code succeeds on the first read and fails on the second:

```
$ python3 escape_char.py
[['COL1', 'COL2'], ['quoted, " content', 'val2']]
Traceback (most recent call last):
  File "escape_char.py", line 9, in <module>
    print(list(csv.reader(fp, escapechar='"', doublequote=True))[:2])
_csv.Error: field larger than field limit (20)
```
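
For illustration, here is a minimal sketch of one possible translation that avoids the failing combination. It rests on an assumption that is not taken from the spec: a true `doubleQuote` is expressed in Python by `doublequote=True` alone, with `escapechar` left at `None`, and a false `doubleQuote` by `doublequote=False` together with `escapechar='\\'`. The helper name `python_csv_params` is hypothetical, and only three dialect properties are handled (a `quoteChar` of `null` is not covered).

```python
import csv
import io


def python_csv_params(dialect):
    """Map a CSVW-style dialect dict to keyword arguments for csv.reader.

    Hypothetical helper, not part of any library.
    """
    double_quote = dialect.get("doubleQuote", True)
    return {
        "delimiter": dialect.get("delimiter", ","),
        "quotechar": dialect.get("quoteChar", '"'),
        "doublequote": double_quote,
        # Assumption: never hand Python an escapechar equal to the quote
        # character; doubled quotes are covered by doublequote=True.
        "escapechar": None if double_quote else "\\",
    }


# Same content as the escape_char.csv example above.
SAMPLE = 'COL1,COL2\n"quoted, "" content",val2\n'

params = python_csv_params({"doubleQuote": True})
rows = list(csv.reader(io.StringIO(SAMPLE, newline=""), **params))
print(rows)  # [['COL1', 'COL2'], ['quoted, " content', 'val2']]
```

Whether this mapping matches the intended semantics of the dialect's escape character flag is exactly the open question here, so treat it as a workaround rather than a faithful translation.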