microsoft / PowerToys

Windows system utilities to maximize productivity
MIT License
107.02k stars 6.31k forks source link

[Advanced Paste > Paste as JSON] CSV parser should support escaping delimiter by enclosing it in `"` character #33166

Open htcfreek opened 1 month ago

htcfreek commented 1 month ago

Description of the new feature / enhancement

The current implementation splits the line on every , character. But the csv format says that you can have a cell containing the delimiter if the cell content is enclosed in " characters. We should not split cell data if they contain the delimiter an being enclosed with the " character.

Scenario when this would be used?

For CSV with that have the delimiter in the data values.

Supporting information

No response

htcfreek commented 1 month ago

Blocked by the rework of the parser (#33165 , #33085).

htcfreek commented 4 weeks ago

Possible solution using regex:

@",(?=[^""]*(?:""[^""]*""[^""]*)*$)"

Thanks to @GhostVaibhav.

GhostVaibhav commented 11 hours ago

Hi @htcfreek, if this is a small one, maybe I could help and open a PR quickly.

htcfreek commented 11 hours ago

Hi @htcfreek, if this is a small one, maybe I could help and open a PR quickly.

@GhostVaibhav The problem here is the correct regex. We didn't figure it out yet. But your help is welcome.

GhostVaibhav commented 11 hours ago
Screenshot 2024-07-07 at 6 26 27 PM

@htcfreek, I think your regex is working fine.

htcfreek commented 11 hours ago
Screenshot 2024-07-07 at 6 26 27 PM

@htcfreek, I think your regex is working fine.

@GhostVaibhav What happened if the comma is in the first data string. I imagine that the regex os failing then.

Another thing might be escaping the " with "".

GhostVaibhav commented 10 hours ago
Screenshot 2024-07-07 at 6 30 27 PM

@htcfreek For the first data string, it's passing. But for escaping with double quotes, it's failing.

Screenshot 2024-07-07 at 6 34 03 PM
GhostVaibhav commented 10 hours ago

@htcfreek I think I found the exact implementation you're looking for.

https://stackoverflow.com/a/769713/12634085

htcfreek commented 10 hours ago

But for escaping with double quotes, it's failing.

Screenshot 2024-07-07 at 6 34 03 PM

The last test line is wrong. It should not end with a " or you need """ instead of "". Because your test line has a closing bot no opening " in the last data element.

GhostVaibhav commented 10 hours ago

So, if we take that into consideration, it is passing.

htcfreek commented 9 hours ago

Not sure if my regex (found with copilot) is the same as yours but it seems to work:

,(?=(?:[^""]*""[^""]*"")*[^""]*$)

Screenshot_20240707-160224_Samsung Internet.jpg

We can add an additional check if the number of " can be divided by 2. => The no " is missing.

GhostVaibhav commented 9 hours ago

@htcfreek all the regexes mentioned in this thread are passing all the checks.

htcfreek commented 9 hours ago

@GhostVaibhav Do you like to work on it? Should be easy to implement:

The file to update is <Advanced Paste Module>/Helpers/JsonHelper.cs

GhostVaibhav commented 9 hours ago

@htcfreek I would. But, I would take a little more time than regular devs, since it's my first time working on this module. Hope it's fine.

htcfreek commented 8 hours ago

@GhostVaibhav Regarding the ToDos: We also have to update the split code to use the same regex.

And a small not about testing: You have to build the whole PowerToys solution.

htcfreek commented 8 hours ago

@GhostVaibhav I found a way to remove/replace the quotation marks in csv code to get correct json version.

Example:

csv code: "my ""text"""
json code: my ""text""
json imported: my "text"

This requires the two following action in correct order: First remove single marks, then replace other ones.

1. remove: ^""(?!"")|(?<!"")""$|^""$
- First and last sign if single " AND only two "

2. replace by single ": ^""{3}(?!"")|(?<!"")""{3}$|""{2}
- first three* OR last three* OR pairs of two (* if not more than three)

Do you think this should be an extra issue/pr or does it make sense to implement with escape handling?

GhostVaibhav commented 7 hours ago

@htcfreek, I have no idea about this stuff, as I'm still very new. But, I think this should be a separate issue since it is no where related to this issue.