tofiqquadri / ngx-csv-parser

CSV Parser for Angular by Developers Hive
https://tofiqquadri.com/hire/connect
MIT License

Parse value containing delimiter #3

Closed mareksip closed 4 years ago

mareksip commented 4 years ago

Hello,

I am using an Excel-generated CSV in which some columns contain a comma, which is also used as the delimiter. The parser treats a delimiter inside a string column as a real delimiter and parses the rest as a new value. This results in inconsistent data. The CSV has 12 columns, but for values containing the delimiter we sometimes receive 14 columns.

CSV content:

    "Kofola 0,3L,""Kofola 0,3L"",Nápoje,0,18,1,ks,,10000,1,0,100"
    "Kofola 0,5L,""Kofola 0,5L"",Nápoje,0,30,1,ks,,10001,1,0,101"
    "Fanta 0,3L,""Fanta 0,3L"",Nápoje,0,18,1,ks,,10002,1,0,102"
    "Fanta 0,5L,""Fanta 0,5L"",Nápoje,0,30,1,ks,,10003,1,0,103"
    Cappy ananas - láhev,Cappy ananas - láhev,Nápoje,0,30,1,ks,,10004,1,0,104
    Cappy hruška - láhev,Cappy hruška - láhev,Nápoje,0,30,1,ks,,10005,1,0,105

How can I prevent the parser from splitting on a delimiter that appears inside a column value?
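To illustrate the symptom, here is a minimal TypeScript sketch. It assumes a parser that simply splits each line on the delimiter and ignores quotes. `naiveSplit` is a hypothetical helper written for this illustration and is not code taken from ngx-csv-parser.

```typescript
// Hypothetical helper: treats every comma as a delimiter and ignores quotes.
// Illustration only; this is not the library's actual implementation.
function naiveSplit(line: string, delimiter: string = ","): string[] {
  return line.split(delimiter);
}

// Two rows copied from the CSV content reported in this issue.
const plainRow =
  "Cappy ananas - láhev,Cappy ananas - láhev,Nápoje,0,30,1,ks,,10004,1,0,104";
const quotedRow =
  '"Kofola 0,3L,""Kofola 0,3L"",Nápoje,0,18,1,ks,,10000,1,0,100"';

console.log(naiveSplit(plainRow).length); // 12
console.log(naiveSplit(quotedRow).length); // 14
```

The row without embedded commas yields the expected 12 values, while the quoted row yields 14, which matches the column counts described above.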

csv-parser

Zelkreps commented 4 years ago

IMG_20200816_120719_927

tofiqquadri commented 4 years ago

@Zelkreps what did you comment? Can you please elaborate?

Zelkreps commented 4 years ago

@Zelkreps what did you comment? Can you please elaborate?

That is just what it looks like when the CSV content from @mareksip is parsed.

tofiqquadri commented 4 years ago

@mareksip @Zelkreps I checked the definition of CSV and the supporting references: column data cannot contain a comma (,) as a value, because the term CSV itself stands for comma-separated values.

Alternatively, you can choose another delimiter, such as a semicolon, to separate your values and specify it as the delimiter in the library. Ngx-CSV-Parser supports different delimiters, which you pass to the parser function, so it can serve your purpose. Check the library's documentation for more information.
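A minimal sketch of this workaround, assuming the file can be re-exported with semicolons between values. `splitOn` and the sample row are hypothetical and only show the effect of changing the delimiter; how to pass the delimiter to the library itself is covered in its documentation.

```typescript
// Hypothetical helper: plain split on a caller-chosen delimiter.
function splitOn(line: string, delimiter: string): string[] {
  return line.split(delimiter);
}

// Assumed semicolon-delimited version of one row from this issue.
const semicolonRow = "Kofola 0,3L;Kofola 0,3L;Nápoje;0;18;1;ks;;10000;1;0;100";

const values = splitOn(semicolonRow, ";");
console.log(values.length); // 12
console.log(values[0]); // Kofola 0,3L
```

The commas survive because they are no longer the delimiter. The same problem returns as soon as a value contains a semicolon.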

I am closing this issue since I cannot find any problem or limitation in the library. You can reopen it if there is still something that needs to be looked at.

Reference: https://en.wikipedia.org/wiki/Comma-separated_values

nazar-kuzo commented 4 years ago

@tofiqquadri I would like to dispute this statement:

I checked the definition of CSV and the supporting references: column data cannot contain a comma (,) as a value, because the term CSV itself stands for comma-separated values.

I assume that there should be a way of handling the case where a column contains the delimiter. As you can imagine, Google Spreadsheets, Microsoft Excel, or any other product compliant with RFC 4180 (https://tools.ietf.org/html/rfc4180) wraps a column in quotes if it contains the delimiter, among other cases.
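As a rough sketch of the quoting rule on the writing side, the snippet below wraps a value in double quotes when it contains the delimiter, a double quote, or a line break, and doubles any embedded double quotes, as RFC 4180 describes. `quoteField` and `toCsvLine` are hypothetical names, and this is not how any particular spreadsheet product is implemented.

```typescript
// Hypothetical helper: quote a single value following the RFC 4180 rules.
function quoteField(value: string, delimiter: string = ","): string {
  const needsQuotes =
    value.includes(delimiter) ||
    value.includes('"') ||
    value.includes("\n") ||
    value.includes("\r");
  if (!needsQuotes) {
    return value;
  }
  // Embedded double quotes are escaped by doubling them.
  return '"' + value.replace(/"/g, '""') + '"';
}

// Hypothetical helper: join already-quoted values into one CSV line.
function toCsvLine(fields: string[], delimiter: string = ","): string {
  return fields.map((f) => quoteField(f, delimiter)).join(delimiter);
}

console.log(toCsvLine(["Kofola 0,3L", "Nápoje", "18"])); // "Kofola 0,3L",Nápoje,18
console.log(quoteField('say "hi"')); // "say ""hi"""
```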

I would suggest that you read through RFC 4180 and decide whether you want to comply with it and make the necessary changes. Otherwise, please update the library information to say that it is not RFC 4180 compliant, so that users can decide whether they want to use it.

Thanks!

tofiqquadri commented 4 years ago

@nazar-kuzo kindly read the documentation of this library. There is already a way to specify a different delimiter. If you do so, you can use a comma as a value. For example, if you use (;) as the delimiter, you can use a comma (,) in the values.

This library is already compliant with RFC 4180: https://tools.ietf.org/html/rfc4180

nazar-kuzo commented 4 years ago

@tofiqquadri Thanks for the quick reply!

I think we are talking about two different things. I am suggesting that you look into how other libraries commonly handle a delimiter inside a value, while you are suggesting that I work around the problem by using another delimiter. But let's say I need to use all of these characters in a column value, like "123;,./?+-". What then? Which delimiter should I use?

nazar-kuzo commented 4 years ago

image

Here is a screenshot from the specification that you said the library is compliant with.

There is the statement:

  1. Fields containing line breaks (CRLF), double quotes, and commas should be enclosed in double-quotes. For example:

    "aaa","b CRLF
    bb","ccc" CRLF
    zzz,yyy,xxx

Could you please show where in your code you handle "quotes or commas enclosed in double-quotes" according to the RFC? I don't see any line of code doing that.
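For reference, handling of quoted fields could look roughly like the sketch below: a small character-by-character parser that tracks whether it is inside double quotes. `parseCsv` is a hypothetical function written for this illustration. It assumes a single-character delimiter and is deliberately lenient about malformed input; it is not the library's code.

```typescript
// Hypothetical RFC 4180-style parser sketch (single-character delimiter assumed).
function parseCsv(text: string, delimiter: string = ","): string[][] {
  const rows: string[][] = [];
  let row: string[] = [];
  let field = "";
  let inQuotes = false;
  let i = 0;
  while (i < text.length) {
    const ch = text.charAt(i);
    if (inQuotes) {
      if (ch === '"') {
        if (text.charAt(i + 1) === '"') {
          field += '"'; // a doubled quote is an escaped quote
          i += 2;
          continue;
        }
        inQuotes = false; // closing quote
      } else {
        field += ch; // delimiters and line breaks are kept as data
      }
    } else if (ch === '"' && field === "") {
      inQuotes = true; // opening quote at the start of a field
    } else if (ch === delimiter) {
      row.push(field);
      field = "";
    } else if (ch === "\n" || ch === "\r") {
      if (ch === "\r" && text.charAt(i + 1) === "\n") {
        i++; // treat CRLF as one line break
      }
      row.push(field);
      field = "";
      rows.push(row);
      row = [];
    } else {
      field += ch;
    }
    i++;
  }
  // Flush the last record when the input does not end with a line break.
  if (field !== "" || row.length > 0) {
    row.push(field);
    rows.push(row);
  }
  return rows;
}

const rfcExample = '"aaa","b\r\nbb","ccc"\r\nzzz,yyy,xxx';
console.log(parseCsv(rfcExample).length); // 2
console.log(parseCsv('id,"123;,./?+-"')[0][1]); // 123;,./?+-
```

With this approach the choice of delimiter no longer restricts which characters a value may contain, as long as the producer quotes the value.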

tofiqquadri commented 4 years ago

@nazar-kuzo I didn't test it out with this case. You can check it yourself and update me if you find that it's needed. I'll also test it on my end when I get the time.

nazar-kuzo commented 4 years ago

@tofiqquadri Sure. I can already tell you that I have tested this behavior and can confirm that the library does not handle columns wrapped in quotes, nor columns that contain the delimiter and are wrapped in quotes.

Since you mentioned that the library should be compliant with RFC 4180, I would reopen this issue, but I cannot, since I'm not the one who opened it.

Thanks again for the quick reply, and have a good day!

tofiqquadri commented 4 years ago

@nazar-kuzo @Zelkreps @mareksip please install the latest version of this library. I have updated it to comply with RFC 4180. Now you can have (,) inside the values.

Let me know if everything works fine, and I'll close this issue.

nazar-kuzo commented 4 years ago

@tofiqquadri

Works great, thanks again!

tofiqquadri commented 4 years ago

@nazar-kuzo thank you for helping me improve the library. 😄