vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License

Problem to get header content! #222

Open dangdkhanh opened 3 months ago

dangdkhanh commented 3 months ago

Hi,

Please check the attached file. The separator is the tab character, and the header of the third column is returned with the wrong content. Thank you.

B.TXT

dangdkhanh commented 3 months ago

Hi, I realized the problem lies with Unicode characters at the beginning or end of a cell: they cause the field_start and field_length calculations to be incorrect. The temporary workaround I came up with is to use a variable that marks the field boundary:

// Returns true if x is a field delimiter (tab or comma).
bool checkboundary(char x) {
    return x == '\t' || x == ',';
}

// True if data_pos is the first or last character of a field.
const auto& data = this->data_ptr->data;
const bool isbeginend = data_pos == 0 || (data_pos > 1 &&
    !checkboundary(data[data_pos]) &&
    (checkboundary(data[data_pos - 1]) ||
     (data_pos < size - 1 && checkboundary(data[data_pos + 1]))));
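For anyone who wants to try this check outside the parser, here is a minimal standalone sketch. The names `is_delimiter` and `is_field_edge` are hypothetical and not part of csv-parser, and the buffer is passed in as a `std::string_view` rather than read from `this->data_ptr->data`. It is not identical to the condition above: it never reports a delimiter byte as an edge, it handles position 1 like any other position, and it treats the last byte of the buffer as a field end. Like the original, it ignores newlines and quoting.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: true if c is one of the two delimiters considered here.
constexpr bool is_delimiter(char c) noexcept {
    return c == '\t' || c == ',';
}

// Hypothetical helper: true if data[pos] is the first or last byte of a
// non-empty field, i.e. a non-delimiter byte that sits at either end of the
// buffer or directly next to a delimiter.
constexpr bool is_field_edge(std::string_view data, std::size_t pos) noexcept {
    if (pos >= data.size() || is_delimiter(data[pos])) {
        return false;  // out of range, or the byte is itself a delimiter
    }
    const bool starts_field = pos == 0 || is_delimiter(data[pos - 1]);
    const bool ends_field = pos + 1 == data.size() || is_delimiter(data[pos + 1]);
    return starts_field || ends_field;
}
```

Because the check works on individual bytes, a multi-byte UTF-8 character at the edge of a field has only one of its bytes flagged (the lead byte at a field start, the final continuation byte at a field end).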

From there, recalculate the flag:

const auto& flag = compound_parse_flag(in[data_pos], isbeginend);

It may work in some cases, but it is not a complete fix.

B.TXT
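To check what field_start and field_length ought to be for a given header row, a small reference splitter that is independent of the parser can help. This is only a sketch under stated assumptions: `FieldSpan` and `split_row` are hypothetical names, not csv-parser API, and the input is assumed to be a single UTF-8 row with no quoting and no trailing newline. It relies on the fact that every byte of a multi-byte UTF-8 sequence has its high bit set, so a tab byte (0x09) can never occur inside a character. Offsets and lengths are counted in bytes, not in characters.

```cpp
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical type: byte offset and byte length of one field within a row.
struct FieldSpan {
    std::size_t start;
    std::size_t length;
};

// Hypothetical helper: splits one row on a single-byte delimiter and returns
// the byte span of every field, including empty ones.
// Assumes UTF-8 input, no quoting, and no trailing newline.
std::vector<FieldSpan> split_row(std::string_view row, char delim = '\t') {
    std::vector<FieldSpan> spans;
    std::size_t start = 0;
    while (true) {
        const std::size_t end = row.find(delim, start);
        if (end == std::string_view::npos) {
            // Last field: runs to the end of the row.
            spans.push_back({start, row.size() - start});
            break;
        }
        spans.push_back({start, end - start});
        start = end + 1;  // skip the delimiter byte
    }
    return spans;
}
```

Comparing these spans with the values the parser produces for the same row should show which field first goes wrong and by how many bytes.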