vincentlaucsb / csv-parser

A high-performance, fully-featured CSV parser and serializer for modern C++.
MIT License

How to improve parsing performance? #114

Closed fdinu closed 4 years ago

fdinu commented 4 years ago

Hey guys,

Not sure this qualifies as an issue, but I'm not sure where else to write this. I've been doing some tests against the Univocity parser. Univocity is much faster, and I am trying to understand what I can do to make your parser faster, since I really like it otherwise.

Here's an example. I am parsing a 1GB file that has ~100 columns and ~100 bytes per column. The input file is cached in memory. Univocity parses the file in about 0.7-0.8 seconds, while this parser takes about 4.8 seconds. For a 5GB file with the same row properties, Univocity takes 3.1-3.5 seconds while this parser takes 22-23 seconds. I tried both writing the output to /dev/null and to a file, and the results are pretty similar.

My code is very straightforward:

#include "csv.hpp"  // single-header csv-parser
#include <iostream>

int main(int argc, char** argv) {
    csv::CSVReader reader(argv[1]);

    for (auto& row : reader) {
        std::cout << row[0].get() << ",";
        std::cout << row[1].get() << ",";
        std::cout << row[2].get() << ",";
        std::cout << row[3].get() << std::endl;
    }
}

Looking for ways to improve performance. Thank you !

vincentlaucsb commented 4 years ago

Hi, thanks for your ticket.

I'm not aware of any way to improve parsing performance right now outside of some code tweaks on my end (and there is one pretty big tweak that I suspect will improve performance; hopefully I'll get time to work on it soon). Just keep tracking this repo, and I'm sure one day it'll be just as fast.