A high-performance, fully-featured CSV parser and serializer for modern C++.
header_row not working when loading from file #195

Open Mindavi opened 2 years ago

Mindavi commented 2 years ago

I'm trying to load a file from disk using a csv::CSVReader. I'm setting up the CSVFormat to auto-guess the file format.

However, I'm having trouble skipping the first row(s) of a csv file using this library.

I made an example test file and test case to show what's going wrong.




// Could be added to test_read_csv_file.cpp
TEST_CASE("Skip rows loaded from file", "[skip_rows_file]") {
  auto format = csv::CSVFormat::guess_csv();

  csv::CSVReader reader("skip_rows.csv", format);

  std::vector<std::string> expected = {
      "timestamp", "distance", "angle", "amplitude"
  };

  // Original issue: Leading comments appeared in column names
  REQUIRE(expected == reader.get_col_names());
}

Test result

Skip rows loaded from file

C:\csv-parser\tests\test_read_csv_file.cpp(90): FAILED:
  REQUIRE( expected == reader.get_col_names() )
with expansion:
  { "timestamp", "distance", "angle", "amplitude" }
  { "a", "b", "c", "d" }

As can be seen, the header_row setting is not honored in this case, and the header is derived from the first row.

(After looking through the code a bit, this may be intended behavior. Please close this issue if it is 👍. Since it's surprising either way, I'd like to note it anyway.)
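To make the expectation concrete, here is a minimal standalone sketch of what I would expect "honoring header_row" to mean. It does not use the library and is not how csv-parser implements it: `split_line` and `header_at` are hypothetical helpers, the row index is assumed to be zero-based, and quoting is ignored.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of csv-parser): split one line on a delimiter.
// No quoting or escaping support; enough for this illustration only.
std::vector<std::string> split_line(const std::string& line, char delim) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, delim)) {
        fields.push_back(field);
    }
    return fields;
}

// Hypothetical helper: return the column names found on row `header_row`
// (assumed zero-based), ignoring every row before it.
std::vector<std::string> header_at(const std::string& text,
                                   std::size_t header_row,
                                   char delim) {
    std::istringstream in(text);
    std::string line;
    for (std::size_t row = 0; std::getline(in, line); ++row) {
        if (row == header_row) {
            return split_line(line, delim);
        }
    }
    return {};  // header_row is past the end of the input
}
```

For input whose first line is `a,b,c,d` and whose third line is `timestamp,distance,angle,amplitude` (made-up sample rows), `header_at(text, 2, ',')` returns the timestamp columns, while `header_at(text, 0, ',')` returns the fields of the first line, which is the result the failing test above is getting.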

I have two workarounds for anyone who runs into this:

  1. Hardcode the delimiter (this reduces flexibility but does work)
  2. Get a guess for the format using the guess_format function and use its result when constructing the reader (example below), so that the reader does not auto-guess during construction

Guess workaround:

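csv::CSVGuessResult guess = csv::guess_format("filename.csv");
csv::CSVFormat fmt;
// Reuse the guessed delimiter so the reader does not guess again during
// construction; header_row can then be set on fmt as needed.
fmt.delimiter(guess.delim);
csv::CSVReader reader("filename.csv", fmt);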