Open webfrank opened 9 months ago
Hey @webfrank, I suspect this might be an issue with your environment on macOS. I just tried it on my MacBook and the example works fine; it gives me:
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
["2023/08/05","Czech Republic",""]
Hi, it works because you are copying exactly what I pasted. That works for me too.
Doing this made me realize it was a line ending issue.
I'll attach the original converted file. test.csv
I inspected the file with a hex editor.
Excel adds three bytes that are not visible, so when you try to access the field by name it doesn't work, although the name looks correct.
The three bytes are the BOM (Byte Order Mark), which Excel adds and expects in order to parse a CSV correctly. Is it possible to handle both BOM and non-BOM headers directly in Benthos?
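To make the failure concrete, here is a minimal Go sketch of what happens when a CSV that starts with a UTF-8 BOM is parsed with the standard `encoding/csv` package and the first row is keyed by header name. It is not Benthos code; `parseFirstRow` is a hypothetical helper, and the `Country` and `Note` column names are made up for illustration (only `Date` appears in this thread).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// parseFirstRow reads a CSV with a header line and returns the first data row
// as a map keyed by the header names, exactly as read (no BOM handling).
func parseFirstRow(data string) (map[string]string, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(records) < 2 {
		return nil, fmt.Errorf("need a header and at least one row")
	}
	row := make(map[string]string)
	for i, name := range records[0] {
		row[name] = records[1][i]
	}
	return row, nil
}

func main() {
	const bom = "\xEF\xBB\xBF" // UTF-8 encoding of U+FEFF
	// Hypothetical sample data; only the Date column name comes from the issue.
	data := bom + "Date,Country,Note\n2023/08/05,Czech Republic,\n"

	row, err := parseFirstRow(data)
	if err != nil {
		panic(err)
	}
	_, ok := row["Date"]
	fmt.Println(ok)                // false: the key is really "\ufeffDate"
	fmt.Println(row["\ufeffDate"]) // 2023/08/05
}
```

The value is there, but only under a key that includes the invisible character, which matches the symptom of `this.Date` being null.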
I've had a lot of issues with BOMs in the past, so that makes some sense. It may be creating a field named “EF BB BF Date”.
Can you try listing the keys inside the pipeline?
Hi, the issue is there: if a BOM is present, it is included in the first field's key. Because it is not printable, the key looks correct but you cannot access it by name.
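For anyone debugging something similar, a small Go sketch of one way to list keys so that invisible characters become visible: format each key with `%q`, which escapes non-printable runes such as U+FEFF. `quotedKeys` is a hypothetical helper and the map contents are sample data, not output from Benthos.

```go
package main

import (
	"fmt"
	"sort"
)

// quotedKeys returns the map's keys, each rendered with %q so that invisible
// characters such as U+FEFF show up as escape sequences, in sorted order.
func quotedKeys(row map[string]string) []string {
	keys := make([]string, 0, len(row))
	for k := range row {
		keys = append(keys, fmt.Sprintf("%q", k))
	}
	sort.Strings(keys)
	return keys
}

func main() {
	// Sample row as it might look after parsing a BOM-prefixed CSV.
	row := map[string]string{
		"\ufeffDate": "2023/08/05",
		"Country":    "Czech Republic",
	}
	for _, k := range quotedKeys(row) {
		fmt.Println(k)
	}
	// Prints:
	// "Country"
	// "\ufeffDate"
}
```

Printing the same keys without quoting would show `Date`, which is why the problem is easy to miss.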
I've created a pull request (https://github.com/benthosdev/benthos/pull/2118) to fix this using this library: https://github.com/dimchansky/utfbom
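As an illustration of the general technique (not the code in the pull request, and not the `utfbom` API), here is a standard-library-only Go sketch that peeks at the first three bytes of a reader and discards them if they are the UTF-8 BOM. `skipBOM` and `header` are hypothetical names, and the sample data is made up.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/csv"
	"fmt"
	"io"
	"strings"
)

// utf8BOM is the UTF-8 encoding of U+FEFF.
var utf8BOM = []byte{0xEF, 0xBB, 0xBF}

// skipBOM wraps r and discards a leading UTF-8 BOM if one is present.
// Input without a BOM is passed through unchanged.
func skipBOM(r io.Reader) io.Reader {
	br := bufio.NewReader(r)
	// Peek may return fewer than 3 bytes (plus an error) on short input;
	// HasPrefix is simply false in that case.
	head, _ := br.Peek(len(utf8BOM))
	if bytes.HasPrefix(head, utf8BOM) {
		_, _ = br.Discard(len(utf8BOM))
	}
	return br
}

// header returns the first CSV record of data after BOM skipping.
func header(data string) ([]string, error) {
	return csv.NewReader(skipBOM(strings.NewReader(data))).Read()
}

func main() {
	for _, data := range []string{
		"\xEF\xBB\xBFDate,Country\n2023/08/05,Czech Republic\n", // with BOM
		"Date,Country\n2023/08/05,Czech Republic\n",             // without BOM
	} {
		h, err := header(data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", h[0]) // "Date" in both cases
	}
}
```

Because the wrapper is a no-op when no BOM is present, the same code path can serve both BOM and non-BOM files.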
OK, it seems BOM skipping is already available as the additional codec "skipbom/csv". A reference to it in the CSV codec would probably help people find it. One last thing: the CSV input component should have an option to skip the BOM, so that it is fully compatible with the codec.
Hi, I have a simple CSV:
and a simple pipeline:
in the output the first element (this.Date) is null:
but if I remove the pipeline the output is:
so "Date" is parsed correctly from CSV
If I add a dummy first column, this.Date gets the right value.
It does not work on macOS; I tested on Linux and it works as expected.
The issue is related to line endings. The CSV was exported from Excel, and rewriting it worked. The CSV parser should probably handle every line ending combination.
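One quick way to test the line ending theory outside Benthos is to parse the same data with LF and with CRLF endings using Go's standard `encoding/csv` package, which removes a carriage return that precedes a newline. The sketch below assumes the parser in question behaves like `encoding/csv`; `lastHeaderField` is a hypothetical helper, the sample data is made up, and bare CR endings are not covered.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"strings"
)

// lastHeaderField parses data and returns the last field of the header
// record, which is where a stray carriage return would end up.
func lastHeaderField(data string) (string, error) {
	rec, err := csv.NewReader(strings.NewReader(data)).Read()
	if err != nil {
		return "", err
	}
	return rec[len(rec)-1], nil
}

func main() {
	for _, data := range []string{
		"Date,Country\n2023/08/05,Czech Republic\n",     // LF
		"Date,Country\r\n2023/08/05,Czech Republic\r\n", // CRLF
	} {
		field, err := lastHeaderField(data)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%q\n", field) // "Country" in both cases
	}
}
```

If both variants parse identically, the cause is more likely to be something else in the file, such as extra bytes before the first header.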