Closed Anaphory closed 4 years ago
I recently came across a similar problem with the csv sniffer. It's really annoying how difficult cross-platform csv manipulation seems to be.
This was also affected by #215 so I pushed the same fix that helped there.
The latest commit fails in Python2.7, because the dialect
(Pdb) vars(self.kw['dialect'])
{'__module__': 'csv', 'lineterminator': '\r\n', 'skipinitialspace': False, 'quoting': 0, 'encoding': 'ascii', '_name': 'sniffed', 'delimiter': u',', 'quotechar': u'"', '__doc__': None, 'doublequote': False}
passed to csv.reader
– python2.7/site-packages/csvw/dsv.py
:
224 -> self.reader = csv.reader(self.f, **self.kw)
contains unicode objects for delimiter and quotechar, and that is apparently not allowed. I don't know how to fix that.
csvw.dsv handles that somewhat. But I'd think at this point, dropping py2 may be more reasonable.
Am Fr., 23. Nov. 2018, 15:15 hat Gereon Kaiping notifications@github.com geschrieben:
The latest commit fails in Python2.7, because the dialect
(Pdb) vars(self.kw['dialect']) {'module': 'csv', 'lineterminator': '\r\n', 'skipinitialspace': False, 'quoting': 0, 'encoding': 'ascii', '_name': 'sniffed', 'delimiter': u',', 'quotechar': u'"', 'doc': None, 'doublequote': False}
passed to csv.reader – python2.7/site-packages/csvw/dsv.py:
224 -> self.reader = csv.reader(self.f, **self.kw)
contains unicode objects for delimiter and quotechar, and that is apparently not allowed. I don't know how to fix that.
— You are receiving this because your review was requested. Reply to this email directly, view it on GitHub https://github.com/lmaurits/BEASTling/pull/228#issuecomment-441251487, or mute the thread https://github.com/notifications/unsubscribe-auth/AA1HKHIKW2iV3nX47DHLfAm2bqJbAKo_ks5uyAMUgaJpZM4Ywk-e .
The dialect object we get back from the sniffer now has an encoding attribute, thanks to Gereon's recent commit, right? Can't we just use that to encode the delimiter and quote char before handing it to the reader? Or is the dialect immutable?
I think we don't actually need the encoding to be used - the relevant entries are supposed to be ASCII encoded strings anyway. But yes, the dialect object would be the place to encode these things for PY2
.
Or is the dialect immutable?
If the dialect was immutable, I could not actually stuff an encoding
into it :)
It should really just be converting some u','
to ','
in Py2 or something very similar, but I was not sure yet why it happened in the first place and where the best place to fix in in the most canonical way would be.
AFAICT this PR can be closed. @Anaphory agree?
Fixes #227