vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License

CSVFormat::no_header() has no effect when multiple delimiters are specified for format #152

Status: Closed (TrungDinhT closed this issue 3 years ago)

TrungDinhT commented 3 years ago

First of all, thank you for your great library.

In my use case, when I specify multiple possible delimiters for CSVFormat together with no_header(), no_header() is ignored and the first row is always parsed as the header row. This does not happen when I specify only one delimiter for the format.

Looking at the implementation of the CSVReader constructor in csv_reader.cpp

https://github.com/vincentlaucsb/csv-parser/blob/e4dd256b3a0dc76235084fce54e463bb7a98848f/include/internal/csv_reader.cpp#L159-L164

we can see that whenever guess_delim() returns true, the header row in the format is overwritten with the header row from the guess result. And guess_delim() returns true whenever more than one possible delimiter is specified

https://github.com/vincentlaucsb/csv-parser/blob/e4dd256b3a0dc76235084fce54e463bb7a98848f/include/internal/csv_format.hpp#L135-L137

A quick fix would be something like this:

format.header = (format.header == -1) ? guess_result.header_row - 1 : guess_result.header_row;

However, I don't know whether that would introduce any unintended behavior in the parser.

So, is this behavior intended for the no-header-row case?
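To make the report easier to reason about, here is a minimal, self-contained sketch of the logic described above. It does not use the library: FormatModel, GuessResultModel, and the three functions are hypothetical stand-ins, and the convention that header == -1 means "no header row" is taken from the quick fix above, not checked against the source.

```cpp
#include <vector>

// Hypothetical stand-ins for the library's types. Names and fields are
// assumptions made for illustration, not the real csv-parser definitions.
struct FormatModel {
    std::vector<char> possible_delimiters{','};
    int header = 0;  // -1 models "no header row" (assumed from the quick fix)

    // As described in the issue: guessing is enabled whenever more than
    // one candidate delimiter is supplied.
    bool guess_delim() const { return possible_delimiters.size() > 1; }
};

struct GuessResultModel {
    char delim;
    int header_row;
};

// Reported behavior: the guessed header row always replaces the user's setting.
int header_as_reported(const FormatModel& format, const GuessResultModel& guess) {
    return format.guess_delim() ? guess.header_row : format.header;
}

// The quick fix from this issue, applied only when guessing is enabled.
int header_with_quick_fix(const FormatModel& format, const GuessResultModel& guess) {
    if (!format.guess_delim()) return format.header;
    return (format.header == -1) ? guess.header_row - 1 : guess.header_row;
}

// A possible alternative (my assumption, not the maintainer's fix):
// leave an explicit "no header" setting untouched.
int header_preserving_no_header(const FormatModel& format, const GuessResultModel& guess) {
    if (!format.guess_delim() || format.header == -1) return format.header;
    return guess.header_row;
}
```

In this model, the quick fix gives -1 only when the guessed header row is 0. If the guess were 2, it would give 1, which may be the kind of unintended side effect worth checking. The alternative keeps -1 regardless of the guess.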

vincentlaucsb commented 3 years ago

This reader doesn't support multiple delimiters. You can specify multiple potential delimiters, but only one will be chosen and used.
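As a rough illustration of "only one will be chosen", the sketch below picks a single delimiter from a list of candidates by counting occurrences in one sample line. This is a deliberately simplified heuristic, not the library's actual guessing algorithm. pick_delimiter is a hypothetical helper, and it ignores quoting.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the candidate that occurs most often in the
// sample line. Ties go to the earlier candidate; an empty candidate list
// falls back to ','. Quoted fields are not handled.
char pick_delimiter(const std::string& sample_line,
                    const std::vector<char>& candidates) {
    char best = candidates.empty() ? ',' : candidates.front();
    std::ptrdiff_t best_count = -1;
    for (char c : candidates) {
        const std::ptrdiff_t count =
            std::count(sample_line.begin(), sample_line.end(), c);
        if (count > best_count) {
            best = c;
            best_count = count;
        }
    }
    return best;
}
```

Whichever candidate wins is then used for the whole file, which matches the statement that the candidates are alternatives and not delimiters to be mixed within one file.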

TrungDinhT commented 3 years ago

Yes, I meant specifying multiple potential delimiters so that I can parse different CSV files that use different delimiters. However, as I explained, when I specify multiple potential delimiters for CSVFormat, no_header() no longer has any effect.