joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

Syntax Update for SoftCircuits.CsvParser #49

Closed. SoftCircuits closed this issue 11 months ago

SoftCircuits commented 1 year ago

In SoftCircuits.CsvParser 4.0.0, the ReadRow() method is now a wrapper for the preferred Read() method.

On your next update, could you update SoftCircuits_CsvParser.cs to use the following while loop?

while (reader.Read())
{
    var record = activate();
    record.Read(i => reader.Columns[i]);
    allRecords.Add(record);
}

Thanks.

SoftCircuits commented 1 year ago

Also, I wanted to add a comment. In my own benchmarks, my parser showed a performance boost of better than 100%. In your benchmarks, however, the improvements are much more modest.

Examining your test data, I have an idea why this might be. Your test data contains no special characters (quotes, commas, newlines) inside field values, which require special handling in CSV files. My test data included many random punctuation characters and multi-line field values, and I guess I had really optimized for those cases.
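To illustrate the kind of data being described, here is a minimal sketch of a generator that produces fields containing quotes, commas, and newlines, quoted in the usual CSV way (wrap the field in double quotes and double any embedded quotes). The `CsvTestData` class and its method names are hypothetical. This is not the test file mentioned in this thread, and it does not produce the record format the NCsvPerf benchmarks expect.

```csharp
using System;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CsvTestData
{
    // Characters that force a field to be quoted.
    private static readonly char[] Special = { ',', '"', '\r', '\n' };

    // Quote a field only when it contains a special character;
    // embedded double quotes are doubled.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(Special) < 0)
            return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Build a random field from letters plus punctuation and newlines,
    // so that many fields end up needing quotes.
    public static string GenerateField(Random random, int length)
    {
        const string alphabet = "abcdefghijklmnopqrstuvwxyz ,\"\n;.!";
        var builder = new StringBuilder(length);
        for (int i = 0; i < length; i++)
            builder.Append(alphabet[random.Next(alphabet.Length)]);
        return builder.ToString();
    }

    // Join escaped fields into one CSV row (without the trailing line break).
    public static string GenerateRow(Random random, int columns, int fieldLength)
    {
        var fields = new string[columns];
        for (int i = 0; i < columns; i++)
            fields[i] = Escape(GenerateField(random, fieldLength));
        return string.Join(",", fields);
    }
}
```

Calling `CsvTestData.GenerateRow(new Random(1), 5, 20)` in a loop and writing each row followed by a line break would give a file that exercises the quoted-field path of a parser rather than only the plain-field path.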

That makes me think your results might be drastically different with different data.

I tried running your benchmarks with my test file, but it looks like you're expecting a particular format in the results, so it failed. I also tried uploading my test file here so anyone who wanted to play with it could, but it exceeds the size GitHub will allow. If anyone's interested in my test data, I'd be happy to make it available.