janisdd / vscode-edit-csv

vs code extension to edit csv files with an excel like table ui
MIT License

Quote character " ignored #124

Open IAmWhitBran opened 1 year ago

IAmWhitBran commented 1 year ago

Originally posted by @PixelKnot in https://github.com/janisdd/vscode-edit-csv/issues/58#issuecomment-1607777523

I added this as a comment to a closed issue, then figured it could probably do with being its own new one.

I am still seeing this exact behaviour on v0.7.6

This csv has 3 columns with the following values: x | y,y | z

The plugin is ignoring " as a quote character and reads x, "y,y", z as having 4 columns.

[Screenshot, 2023-06-26: the CSV edit view of the test file in Visual Studio Code]

janisdd commented 1 year ago

Phew, for a second you got me...

As in the original issue

When the delimiter is ", " (with a space after the comma), and there is ", " or "," in the cell (with quotes), the read function will break the cell by ", " or ",".

If you change the delimiter to "," (a comma without the trailing space), it works as expected.

IAmWhitBran commented 1 year ago

Huh, wow, didn't realise that, even after reading the original issue...

Is there a fix for that that can be implemented at all?

While changing the delimiter does stop this happening, as a workaround, it feels like there is still an issue here.

janisdd commented 1 year ago

I'll add an option readOption_delimitersToGuess where you can manually specify the delimiters that should be guessed. If you add ", " (comma + whitespace) before "," in the list, it may work... at least it works for this example.
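To illustrate why the order of the candidate list can matter, here is a minimal sketch in Python. It is not the extension's or the underlying parser's actual guessing heuristic. The helpers `split_row` and `guess_delimiter` are hypothetical names, and the rule "first candidate that gives a consistent column count wins" is a simplifying assumption.

```python
from __future__ import annotations


def split_row(line: str, delimiter: str, quote: str = '"') -> list[str]:
    """Split one line on a (possibly multi-character) delimiter.

    Assumption: a field is treated as quoted only when the quote character
    is the very first character of the field.
    """
    fields = []
    i, n = 0, len(line)
    while True:
        if i < n and line[i] == quote:
            # Quoted field: read up to the closing quote ("" is an escaped quote).
            i += 1
            buf = []
            while i < n:
                if line[i] == quote:
                    if i + 1 < n and line[i + 1] == quote:
                        buf.append(quote)
                        i += 2
                        continue
                    i += 1
                    break
                buf.append(line[i])
                i += 1
            fields.append("".join(buf))
            nxt = line.find(delimiter, i)
        else:
            # Unquoted field: everything up to the next delimiter is data.
            nxt = line.find(delimiter, i)
            fields.append(line[i:] if nxt == -1 else line[i:nxt])
        if nxt == -1:
            return fields
        i = nxt + len(delimiter)


def guess_delimiter(lines: list[str], candidates: list[str]) -> str | None:
    """Return the first candidate that yields the same field count (> 1) on every line."""
    for delimiter in candidates:
        counts = {len(split_row(line, delimiter)) for line in lines}
        if len(counts) == 1 and counts.pop() > 1:
            return delimiter
    return None


sample = ['x, "y,y", z']
print(guess_delimiter(sample, [",", ", "]))   # "," is tried first
print(split_row(sample[0], ","))              # 4 fields
print(guess_delimiter(sample, [", ", ","]))   # ", " is tried first
print(split_row(sample[0], ", "))             # 3 fields
```

In this sketch both candidates split the sample line consistently, so whichever comes first in the list is chosen. A real guesser may weigh candidates differently.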

IAmWhitBran commented 1 year ago

As an alternative, would something like an "ignore unquoted whitespace" flag be possible?

I believe, in the context of a CSV, respecting whitespace does not matter unless it is explicitly defined to be there by use of a quoted string. Having this as a flag would allow the parser to keep its current behaviour, without users having to guess all the possible delimiters inserted into unclean data. Having it disabled by default would give full backwards compatibility unless it is set to true.
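For comparison only, Python's standard `csv` module has a flag in this spirit, `skipinitialspace`. The sketch below shows its effect on the line from this issue. It uses Python's parser, not the one in the extension, so the extension's behaviour may differ.

```python
import csv
import io

line = 'x, "y,y", z'

# Default: the space after the comma is data, so the quote is not at the
# start of the field and is not treated as a quote character.
strict = next(csv.reader(io.StringIO(line)))

# skipinitialspace=True drops spaces that directly follow a delimiter,
# so the quote becomes the first character of the field again.
lenient = next(csv.reader(io.StringIO(line), skipinitialspace=True))

print(strict)   # ['x', ' "y', 'y"', ' z']
print(lenient)  # ['x', 'y,y', 'z']
```

Note that this flag only skips spaces immediately after a delimiter. Spaces before a delimiter, and spaces inside quotes, are kept.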

janisdd commented 1 year ago

Actually, whitespace is handled in the csv rfc (RFC 4180 treats spaces as part of a field), see issue in papaparse, and should not be ignored...

Such a flag might be interesting but there might be some implications on the parsing side. Maybe at some point.