ssteinerx closed this issue 7 years ago.
Have you tried reading the CSV with Python's csv reader? This filter relies on Python's csv reader to parse the CSV table, so if that reader can't understand your table, there's nothing we can do about it here.
I went back to the Python docs, and apparently it's a matter of setting the following Dialect attribute:
https://docs.python.org/2/library/csv.html#csv.Dialect.skipinitialspace
Dialect.skipinitialspace
When True, whitespace immediately following the delimiter is ignored. The default is False.
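A quick way to see what this attribute changes, using only the standard library. The input line is made up for illustration: it has a space after the first comma and a comma inside the quoted field.

```python
import csv
import io

# One CSV line with a space after the first comma, the case skipinitialspace targets.
line = '1, "quoted, text"\n'

# Default dialect: the space starts an unquoted field, so the quote marks are
# kept as literal characters and the embedded comma splits the field.
default_row = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True drops the space, so the field is parsed as quoted.
skipped_row = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(default_row)  # ['1', ' "quoted', ' text"']
print(skipped_row)  # ['1', 'quoted, text']
```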
I can't say I've ever used it, but it seems to fit this case pretty clearly.
@ssteinerx, if you're sure this is the solution and want to make a pull request, you could make this change and add a minimal working example (MWE) to the corresponding test. Otherwise, I'll probably do this after my vacation ends in mid-June.
@ssteinerx, on second thought, changing the dialect may not be a good default, so a bigger change may be needed, e.g. providing a choice of CSV dialect in the YAML metadata of the csv code block.
Edit: or, even better, implement a general mapping from a YAML metadata key named csv to all the options available in the csv module. For example, some people may also want to use a different delimiter, such as a tab rather than a comma.
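A minimal sketch of that general mapping, assuming the code block's YAML header has already been parsed into a dict and that the options live under a key named csv. Both the key and the read_csv_block helper are hypothetical illustrations, not pantable's actual API. Only formatting parameters that csv.reader uses are passed through, so a misspelled option fails with a clear message.

```python
import csv
import io

# Dialect attributes that affect csv.reader. lineterminator is left out because
# the reader ignores it and always recognises '\r' or '\n' as end of line.
CSV_OPTIONS = {
    "delimiter", "doublequote", "escapechar",
    "quotechar", "quoting", "skipinitialspace", "strict",
}


def read_csv_block(text, metadata):
    """Parse a CSV code block using options from a hypothetical `csv` metadata key.

    `metadata` stands in for the block's already-parsed YAML header, e.g.
    {"csv": {"delimiter": "\t", "skipinitialspace": True}}.
    """
    options = metadata.get("csv", {})
    unknown = set(options) - CSV_OPTIONS
    if unknown:
        raise ValueError(f"unsupported csv option(s): {sorted(unknown)}")
    # With no `csv` key the reader keeps Python's default (excel) dialect,
    # so existing tables parse exactly as before.
    return list(csv.reader(io.StringIO(text), **options))


rows = read_csv_block("a\tb\n1\t 2\n", {"csv": {"delimiter": "\t", "skipinitialspace": True}})
print(rows)  # [['a', 'b'], ['1', '2']]
```

Leaving the key out keeps the current behaviour, which sidesteps the concern that changing the dialect globally may not be a good default.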
@ickc I might look at adding a simple option to allow whitespace, but none of the similar tools I've reviewed seem to have actually needed support for other CSV dialects.
@ssteinerx, I'm considering using another CSV reader/writer, such as the one from pandas. It would probably behave differently, in ways you may or may not like. Let me know if this would be a problem for you.
By the way, I tried to load your second CSV in Python 3 and couldn't reproduce your bug. Were you using Python 2?
See #21
@ickc Sorry, I don't remember which Python version it was, but I'll be going back to the project where this happened shortly. If you're changing CSV parsers, I'll try it then; I left notes for myself.
@ssteinerx, please check whether pantable v0.11 in #25 fixes your problem. Thanks.
This is extracted from a larger table, so please ignore the row designation within the content.
Also, I couldn't get backtick-quoted blocks containing backticks to work on GitHub, so they are missing from the table declarations below, though they are present in the source document. The first table behaves as expected; the second gives:
The difference, which is almost impossible to see, is that there is a space character following the comma between columns one and two. The single space character is treated as its own column, with the following quoted text becoming column one of a new row. I would expect the space between the comma and the opening quote to be silently swallowed by the CSV parser.
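The original tables are not reproduced here, so this sketch uses a made-up quoted cell that spans two lines (an assumption, not the reporter's data). It shows one way a space before the opening quote pushes part of a quoted cell into a new row with Python's default csv reader: the quote is read as a literal character, the line break ends the record, and the rest of the cell starts the next row. With skipinitialspace=True the same input parses as a single two-column row.

```python
import csv
import io

# Hypothetical stand-in for the missing table: the second cell is quoted and
# spans two lines, and a space follows the first comma.
text = 'Row 1, "first line\nsecond line"\n'

# Default dialect: the field starting with a space is unquoted, so the newline
# inside it ends the record and "second line" becomes a new row.
default_rows = list(csv.reader(io.StringIO(text)))

# skipinitialspace=True: the space is dropped, the field is read as quoted,
# and the embedded newline stays inside the cell.
fixed_rows = list(csv.reader(io.StringIO(text), skipinitialspace=True))

print(len(default_rows), len(fixed_rows))  # 2 1
```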