vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License

header_row not working when loading from file #195

Open Mindavi opened 2 years ago

Mindavi commented 2 years ago

I'm trying to load a file from disk using csv::CSVReader, with the CSVFormat set up to auto-guess the file format (csv::CSVFormat::guess_csv()).

However, I can't get the reader to skip the first few rows of a CSV file so that a later row is used as the header.

I made an example test file and test case to show what's going wrong.

skip_rows.csv:

a;b;c;d
this;is;before;header
this;is;before;header_too
timestamp;distance;angle;amplitude
22857782;30000;314159;0
22857786;30000;314109;0

test_read_csv_file.cpp:

// Could be added to test_read_csv_file.cpp
TEST_CASE("Skip rows loaded from file", "[skip_rows_file]")
{
  auto format = csv::CSVFormat::guess_csv();
  format.header_row(3);

  csv::CSVReader reader("skip_rows.csv", format);

  std::vector<std::string> expected = {
      "timestamp", "distance", "angle", "amplitude"
  };

  // Original issue: Leading comments appeared in column names
  REQUIRE(expected == reader.get_col_names());
}

Test result

PS C:\csv-parser> .\build\tests\Debug\csv_test.exe "[skip_rows_file]"
Filters: [skip_rows_file]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
csv_test.exe is a Catch v2.12.1 host application.
Run with -? for options

-------------------------------------------------------------------------------
Skip rows loaded from file
-------------------------------------------------------------------------------
C:\csv-parser\tests\test_read_csv_file.cpp(78)
...............................................................................

C:\csv-parser\tests\test_read_csv_file.cpp(90): FAILED:
  REQUIRE( expected == reader.get_col_names() )
with expansion:
  { "timestamp", "distance", "angle", "amplitude" }
  ==
  { "a", "b", "c", "d" }

===============================================================================
test cases: 1 | 1 failed
assertions: 1 | 1 failed

As the output shows, the header_row(3) setting is ignored when the format comes from guess_csv(), and the column names are taken from the first row instead.

(After looking through the code a bit, this may be intended behavior; please close this issue if so 👍. Since it is surprising regardless, I'd like to note it anyway.)

I found two workarounds, for anyone else who runs into this:

  1. Hardcode the delimiter (this reduces flexibility, but it works).
  2. Call csv::guess_format() yourself and build the format from its result before constructing the reader (example below). Because the format then has an explicit delimiter, the reader does not auto-guess during construction.

Guess workaround:

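#include "csv.hpp"
// Guess the format up front, without constructing a reader
csv::CSVGuessResult guess = csv::guess_format("filename.csv");
// Keep only the guessed delimiter and set the header row explicitly
csv::CSVFormat fmt;
fmt.delimiter(guess.delim).header_row(3);
// With an explicit delimiter, the reader skips its own auto-guessing
csv::CSVReader reader("filename.csv", fmt);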