Closed rebolbot closed 8 years ago
Submitted by: BrianH
This is a bug, not a gotcha that needs documenting. Is this one more consideration for the PARSE rewrite, or a quick fix?
Submitted by: Carl
I don't understand what result you want. The delimiters are in conflict. The quotes on the string make it a single "atom". Then you have both comma and tab for delimiters in the data, but you only specify tab as the delimiter? If so, then the comma is just data, not a delimiter, so the result above is correct.
If you want the comma removed, specify it as a delimiter.
parse/all str "^-," ; tab and comma
If you want a specific result that you're not seeing, please post it in the ticket.
Submitted by: BrianH
I think that "The quotes on the string make it a single atom." was the source of confusion. I guess it is a gotcha that needs documenting after all, particularly since fixing this would break the ability for the data to contain the delimiter in the quoted portion. We can start by marking this as not a bug.
Submitted by: Sunanda
What I wanted (and the application needed) was for parse to break the input string at the tab character, regardless of any other special characters -- like quotes or commas. I know my application's input cannot consist of strings with embedded tabs. And I did not want parse to use its initiative.
But (as Brian suggests) that would conflict with my wish in other applications where I'd expect parse to intelligently handle embedded tabs and/or commas in CSV files.
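To make the two conflicting wishes concrete, here is a minimal sketch in Python (not REBOL, and not a model of how PARSE is implemented). The sample line and both function names are made up for illustration. The first function breaks the line at every tab with no special handling of quotes; the second uses Python's csv module so that a quoted field stays in one piece even if it contains the delimiter.

```python
import csv
import io

def split_raw(line, delimiter="\t"):
    """Break the line at every delimiter; quotes and commas are plain data."""
    return line.split(delimiter)

def split_quoted(line, delimiter="\t"):
    """Break the line at delimiters, but keep a quoted field as one unit."""
    # newline="" leaves any line breaks inside quoted fields untouched.
    return next(csv.reader(io.StringIO(line, newline=""), delimiter=delimiter))

# Hypothetical tab-delimited line: the second field is quoted and holds a tab.
sample = 'a\t"b\tc"\td,e'

raw_fields = split_raw(sample)
quoted_fields = split_quoted(sample)

print(raw_fields)     # ['a', '"b', 'c"', 'd,e']
print(quoted_fields)  # ['a', 'b\tc', 'd,e']
```

Neither result is wrong; which one is wanted depends on whether the input is known to be free of embedded delimiters.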
So (also as Brian has suggested offline) the real issues are:
1. unclear mental model of parse's built-in logic when it encounters embedded delimited strings
2. expectation that parse can handle all CSV files, when we really need a snazzy mezz like decode-csv to handle all the messiness and the RFC 4180 specification.
The gotchas are:
1. assuming parse does not have special handling for quotes
2. assuming parse unaided can handle all possible CSV files.
Submitted by: BrianH
Well,
1. Simple PARSE's handling of quotes is mostly* consistent with RFC4180, and useful.
2. Handling all possible CSV files is unlikely for simple PARSE, since the differences are contradictory.
The rest sounds like a job for a DECODE-CSV mezzanine. Let's declare this a feature.
* Mostly:
According to R3:
>> parse {"hello""world^/",a} ","
== ["hello" "world^/" "a"]
>> length? parse {"hello""world^/",a} ","
== 3
According to http://tools.ietf.org/html/rfc4180 :
>> parse {"hello""world^/",a} ","
== [{hello"world^/} "a"]
>> length? parse {"hello""world^/",a} ","
== 2
Added a ticket for the above: #1079
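For comparison only, here is the same input run through Python's csv module with its default dialect. This is a sketch in Python, not REBOL, and it says nothing about how PARSE works internally; it only shows a reader that treats a doubled quote inside a quoted field as one literal quote, which is the RFC 4180 reading given above.

```python
import csv
import io

# REBOL's {"hello""world^/",a} written as a Python string (^/ is a newline).
data = '"hello""world\n",a'

# newline="" keeps the embedded line break intact for the csv module.
record = next(csv.reader(io.StringIO(data, newline="")))

print(record)       # ['hello"world\n', 'a']
print(len(record))  # 2
```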
Submitted by: Carl
I agree. It is handy if PARSE can deal with simplistic CSV formats.
For the heavy-duty cases, create DECODE 'CSV data -- an R3 codec, rather than a separate function. That way you can build the encoder at the same time, and have a cool combo. (And yes, it should be possible to write it in R code.)
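The appeal of building the encoder and decoder together is that each can be checked against the other. A rough sketch of that pairing, written in Python rather than as an R3 codec: the names decode_csv and encode_csv and the sample records are hypothetical, and Python's csv module stands in for whatever the real codec would do.

```python
import csv
import io

def decode_csv(text):
    """Text -> list of records, each record a list of strings."""
    return list(csv.reader(io.StringIO(text, newline="")))

def encode_csv(records):
    """List of records -> text; fields are quoted only where needed."""
    out = io.StringIO(newline="")
    csv.writer(out).writerows(records)
    return out.getvalue()

# Hypothetical records with an embedded quote, comma and line break.
records = [
    ["name", "note"],
    ["Ann", 'said "hi", then left'],
    ["Bob", "two\nlines"],
]

text = encode_csv(records)
round_trip = decode_csv(text)
print(round_trip == records)  # True
```

The round trip is the useful property here: anything the encoder writes, the decoder should read back unchanged.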
Submitted by: Sunanda
This works as expected:
But here, parse effectively promotes the comma to [tab ","]
R2 does the same. Issue found in R2 while debugging a live application that attempts to read a tab-delimited file. At the very least, the issue is a gotcha that needs documenting so we can develop robust import routines.
CC - Data [ Version: alpha 66 Type: Bug Platform: All Category: Parse Reproduce: Always Fixed-in:none ]