jf-tech / omniparser

omniparser: a native Golang ETL streaming parser and transform library for CSV, JSON, XML, EDI, text, etc.
MIT License

Issue parsing csv file delimited with asterisk #192

Closed: scotthedge closed this issue 1 year ago

scotthedge commented 1 year ago

Any reason you know of that an asterisk-delimited file should fail parsing?

When trying to parse an asterisk-delimited file, we get the following InvalidCsv error message: record/record_group '' needs min occur 1, but only got 0

So apparently it's not able to recognize the line as a record. Not sure why; we've tried many different delimiters successfully with the same code and templates. We are specifying the delimiter in code.

The same input file parses successfully if we just replace the asterisks with commas and specify comma as the delimiter, so it's specifically the asterisk that is causing the error. We are also seeing the same issue with the caret "^" character as a delimiter. Thanks for any assistance!

jf-tech commented 1 year ago

A quick trial suggests omniparser csv2 has no problem dealing with * as the delimiter: you can pull and switch to the branch issue_192 and look for the files 1_single_row.input.csv and 1_single_row.schema.json. Running go test ./... against them works just fine.

So, as always, it'd be really helpful to include a sample, reproducible input and schema for troubleshooting.

jf-tech commented 1 year ago

Ping @scotthedge, any update?

scotthedge commented 1 year ago

We figured out that we needed to escape the asterisk in the header node, as you did in your update to the test branch. I didn't realize you had made that change until now, but we got it sorted. Closing the issue.