Space character after `,` between column contents creates a column?

ssteinerx commented 7 years ago

This is extracted from a larger table so please ignore the row designation within the content.

Also, I couldn't get backtick quoted blocks containing backticks to work on GitHub, so they are missing from the table declarations below though they are properly present in the source document.

The first table behaves as expected; the second gives:

pantable: table rows are of irregular length. Empty cells appended.

The difference, which is almost impossible to see, is that there is a space character following the comma between columns one and two. The single space character is treated as its own column, with the following quoted text becoming column one of a new row. I would have expected the space between the comma and the opening quote to be silently swallowed by the CSV parser.

table
---
caption: '__Not Broken, No Space After Comma__'
alignment: RRR
table-width: 2/3
markdown: True
---
*First row*,__defaulted to be header row__, __*can be disabled*__
"Row-3-Col-1-Arbitrary block element:

- following standard markdown syntax
- like this
- Row-3-Col-1-END.","Row-3-Col-2-Another Arbitrary block element:

    1. Number 1 -- Row-3-Col-2
    2. Number 2 -- Row-3-Col-2
        - Mixed #1 Row-3-Col-2
        - Mixed #2 Row-3-Col-2","Row-3-Col-3 Nothing Fancy"

table
---
caption: '__Broken, Space After Comma__'
alignment: RRR
table-width: 2/3
markdown: True
---
*First row*,__defaulted to be header row__, __*can be disabled*__
"Row-3-Col-1-Arbitrary block element:

- following standard markdown syntax
- like this
- Row-3-Col-1-END.", "<-- *The space is to the left of that quote!* Arbitrary block element:
    1. Number 1 -- Row-3-Col-2
    2. Number 2 -- Row-3-Col-2
        - Mixed #1 Row-3-Col-2
        - Mixed #2 Row-3-Col-2","Row-3-Col-3 Nothing Fancy"

ickc commented 7 years ago

Have you tried reading the csv by Python's CSV reader? Since this filter relies on Python's CSV reader to parse the CSV table, if the reader there can't understand your table, then there's nothing we can do about it here.

ssteinerx commented 7 years ago

I went back to the Python docs and apparently it's a matter of setting the following "Dialect" variable:

https://docs.python.org/2/library/csv.html#csv.Dialect.skipinitialspace

Dialect.skipinitialspace

When True, whitespace immediately following the delimiter is ignored. The default is False.

I can't say I've ever used it, but it seems to pretty clearly fit this...
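For reference, here is a minimal sketch of what that flag changes, using only Python 3's standard `csv` module. The two-field sample string is made up for illustration and is much simpler than the tables above; it is not how pantable itself calls the reader.

```python
import csv
import io

# Made-up sample: two quoted fields separated by a comma AND a space.
# The second field contains an embedded newline, like the block elements above.
data = '"a", "b\nc"\n'

# Default dialect: the space starts an unquoted field, so the quote that
# follows is kept as a literal character and the newline ends the row.
default_rows = list(csv.reader(io.StringIO(data, newline="")))

# skipinitialspace=True: the space after the delimiter is dropped, so the
# quote opens a quoted field and the embedded newline stays inside it.
skipping_rows = list(
    csv.reader(io.StringIO(data, newline=""), skipinitialspace=True)
)

print(default_rows)   # [['a', ' "b'], ['c"']]
print(skipping_rows)  # [['a', 'b\nc']]
```

With the default settings the single logical row is split into two rows of different lengths, which would match the "irregular length" warning.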

ickc commented 7 years ago

@ssteinerx, if you're sure this is the solution and want to make a pull request, you could make this change and add a MWE in the corresponding test. Otherwise I'll probably do this after I finish my vacation in mid June.

ickc commented 7 years ago

@ssteinerx, on 2nd thought, changing the dialect may not be a good default. So maybe a bigger change is needed, e.g. to provide a choice of CSV dialect in the YAML metadata of the csv code block.

Edit: or even better, to implement a general mapping from a YAML metadata with key csv to all the csv options available from the CSV module. e.g. some people may want to also use a different delimiter, say, tab rather than comma.
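A rough sketch of what such a mapping could look like, assuming the YAML metadata has already been parsed into a dict. The function name `read_table`, the `ALLOWED_CSV_OPTIONS` whitelist, and the layout of the `csv` key are all hypothetical and do not describe pantable's actual code.

```python
import csv
import io

# Hypothetical whitelist of csv formatting parameters a user may set.
ALLOWED_CSV_OPTIONS = {
    "delimiter",
    "quotechar",
    "escapechar",
    "doublequote",
    "skipinitialspace",
    "strict",
}


def read_table(text, options=None):
    """Parse CSV text, forwarding whitelisted options to csv.reader."""
    options = dict(options or {})
    unknown = set(options) - ALLOWED_CSV_OPTIONS
    if unknown:
        raise ValueError("unsupported csv options: %s" % sorted(unknown))
    return list(csv.reader(io.StringIO(text, newline=""), **options))


# Stand-in for the parsed YAML metadata of a table block (assumed layout).
metadata = {"csv": {"delimiter": "\t", "skipinitialspace": True}}

# Tab-delimited input with a space before the quoted second field.
rows = read_table('a\t "b c"\n', metadata.get("csv"))
print(rows)  # [['a', 'b c']]
```

Restricting the keys to a whitelist keeps a typo in the metadata from being passed silently to `csv.reader`.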

ssteinerx commented 7 years ago

@ickc I might look at adding a simple option to allow whitespace, but supporting other CSV dialects etc. doesn't seem to have been actually needed by any of the tools like this that I've reviewed.

ickc commented 7 years ago

@ssteinerx, I'm considering using another CSV reader/writer, such as the one from Pandas. It would probably behave differently, in ways you might or might not like. Let me know if this would be a problem for you.

By the way, I tried to load your 2nd CSV in Python 3 and I can't reproduce your bug. Were you using Python 2?

ickc commented 7 years ago

See #21

ssteinerx commented 7 years ago

@ickc Sorry, I don't remember which Python it was, but I'll be going back to the project in which this happened shortly. If you're changing CSV Parsers, I'll try it then -- I left notes for myself...

ickc commented 7 years ago

@ssteinerx, please check whether pantable v0.11 in #25 fixes your problem. Thanks.

ickc / pantable

Space character after `,` between column contents creates a column? #11