janisdd / vscode-edit-csv

vs code extension to edit csv files with an excel like table ui
MIT License
229 stars 35 forks source link

Keep unusual format #117

Open artyb55 opened 1 year ago

artyb55 commented 1 year ago

Thank you for this plugin.

I need to edit CSV files that aren’t compliant with RFC 4180 CSV rules, and I can’t change their format. They have some rows of differing lengths, and some fields are enclosed in quotes even though they don’t strictly have to be according to the standard. The files also have 4 header rows. I’ve been searching for an editor that will allow me to move some fields, and delete some columns, WITHOUT changing the formatting of the CSV file. I don’t know that behaviour could be added as options to this extension?

I’ve tried LOTS of editors and only found one commercial piece of software that does what I need.

BW, A

janisdd commented 1 year ago

That feels a little bit too out of scope for this plugin.

This plugin works by reading the csv file, showing the table and converting the table back to csv file (replacing the csv file content). This means the formatting has to be "saved" when reading the csv file and re-applied when writing the table back to the csv file. There are some options, e.g., Retain Quote Information (for columns) (and maybe some others I forgot) but that wouldn't cover your use case.

I think it is just not practical to keep all formatting information and I don't know which formatting information this would include. Quoted fields seem to be important but there are probably many others that I would overlook.

A simple example to demonstrate the complexity would be keeping the information about quotes for each single field and then deleting a row that contained a quoted field. What would be the right thing to do:

And this is only for the quote stuff... there is also information about headers, whitespace, ...

In the end, I think this is too specific for the plugin and your particular use case. I wouldn't feel comfortable trying to implementing that and then the format not being right in all cases.

artyb55 commented 1 year ago

OK, thanks for your quick and detailed reply. BW, A

artyb55 commented 1 year ago

Actually I've just been playing with the read and write options again and I can get very close to the behaviour I was after by removing the QuoteChar in both the Read Options and Write Options, and EscapeChar in the Write Options. My question is largely about the quotes rather than other whitespace etc. I've found lots of people trying to find ways of editing CSV files without the quotes getting mangled, so I hope this information is useful to someone. Because of this I don't think that the 'format' of all cells would need to be stored-it's not as difficult as you were thinking as it works with these settings! With these settings I can move fields around (including their original quotes if present) and delete columns fine. The issues left would be: 1) For fields that include the delimiter within the quotes; with the settings used of course the quotes are ignored and the delimiter is interpreted as a delimiter. In an ideal world there would be an option to not interpret a delimiter if it is enclosed within quotes, but not interpret or change the quotes in any other way. Mostly I wouldn't need to edit those column types, and the resulting text file does go back to how it was originally, so this issue wouldn't be a show-stopper. 2) The only other minor issue is that CSV-edit adds additional delimiters at the end of header lines that are shorter than other lines, which I can delete in any text editor. So, both of these things I can work around fairly easily. So it's not as easy as the commercial alternative I found, but still does achieve what I wanted to better than eg a spreadsheet which messes up quotes and sometimes cell contents too. 3) I have to rename the file extension to .csv first, then change it back after the editing. Is there any way to get the extension to activate with different file extensions?

I've attached some example files to clarify what I'm trying to do. Original example file: Edit-CSV_VSCode_ExampleDataFile__2023_03_30AB1.csv

Present result from CSV-edit with those read and write settings: Edit-CSV_VSCode_ExampleDataFile_PresentResult_2023_03_31AB1.csv

Correctly edited: Edit-CSV_VSCode_ExampleDataFile_CorrectlyEdited__2023_03_30AB1.csv

BW, A

janisdd commented 1 year ago

For the file extension problem you can see #47

Of all the problems, the 2nd is the easiest to fix, as this can be a separate setting (I think)

In an ideal world there would be an option to not interpret a delimiter if it is enclosed within quotes, but not interpret or change the quotes in any other way.

So in my example above when you remove a row where a field is quoted, this information would be lost (this is probably the expected and desired behavior). However, this would means that for every field we would need to store if it's quoted or not (probably just store which ones are quoted as there are probably more unquoted than quoted fields). The current Retain Quote Information only works for columns but maybe a setting for fields would be good which does store the information for each quoted field separately.

(As you can image this would slow down the plugin for very large tables with many quoted fields... there should be a hint when setting this option)

So, I would propose that I add two new options:

artyb55 commented 1 year ago

Hi,

Thank you for reopening this, and for the answer to the file extension Q. On my issue 1, I'm less likely to need to delete or move rows in this tool because that can be done routinely in a standard text editor; the difficulty is with columns. Does that affect your comment? Removing a row or column should lose the format that those fields had, I agree. Your extension is handling quotes as I was hoping with the Read and Write options set, except that the settings prevent quoted delimiters from being ignored. I wasn't sure if you would need to store the quote information for every field. I'd prefer to see the quotes in the fields. It'd just be nice if a quoted delimiter eg ...,"Text,Info",... was treated as "Text,Info" rather than the middle delimiter being used to split the field up. Could the 'format' of only these problematic fields where there is a quoted delimiter be stored?

What you're proposing sounds feasible though if it does need the 'format' info to be stored for every field. The warning about speed sounds very sensible.

I've figured out that I can apply changes to the original file without saving it, and then use 'save as' if I want to save with a different filename.

I'm very happy to test things if that's useful to you. BW, A