joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

Add Sep - the fastest yet #51

Closed: nietras closed this 11 months ago

nietras commented 1 year ago

@joelverhagen cc @JoshClose @MarkPflug this adds Sep, a new library. As seen below this should be about 1.3x faster than Sylvan, sorry! 😅 Awaiting official results before introducing this to the world on my blog, nietras.com.

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19044.2965/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.203
  [Host]     : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
  Job-PCYXTT : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2

InvocationCount=1  IterationCount=6  LaunchCount=1  
UnrollFactor=1  WarmupCount=2  

| Method | LineCount | Mean | Error | StdDev | Gen0 | Gen1 | Gen2 | Allocated |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| CsvHelper | 1000000 | 2,139.1 ms | 53.65 ms | 19.13 ms | 18000.0000 | 17000.0000 | 3000.0000 | 260.59 MB |
| Sep | 1000000 | 733.0 ms | 21.03 ms | 7.50 ms | 18000.0000 | 17000.0000 | 3000.0000 | 260.42 MB |
| Sylvan_Data_Csv | 1000000 | 936.0 ms | 25.39 ms | 9.05 ms | 18000.0000 | 17000.0000 | 3000.0000 | 260.77 MB |

MarkPflug commented 1 year ago

> As seen below this should be about 1.3x faster than Sylvan, sorry!

Why apologize? You're moving the needle, and that's always good. I've already studied your SIMD code and am applying some improvements back to my library. Our SIMD code was pretty similar, but the key intrinsic I was missing was PackUnsignedSaturate, which allows processing twice as many characters per iteration. Adding that and an AVX2 (256-bit) path brings Sylvan within the margin of error.
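For readers following along, the pack-then-compare idea can be sketched in C, where `_mm_packus_epi16` is the intrinsic for the same PACKUSWB instruction that .NET exposes as PackUnsignedSaturate. This is only an illustration under assumptions, not code from Sep or Sylvan: the helper name `find_special_16`, the set of special characters, and the scalar fallback for targets without SSE2 are all hypothetical.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper (not taken from Sep or Sylvan).
 * Reads 16 UTF-16 code units and returns a 16-bit mask in which bit i
 * is set when chars[i] is ',', '"', '\n' or '\r'. */
static unsigned find_special_16(const uint16_t *chars)
{
#if defined(__SSE2__)
    /* Two unaligned 128-bit loads, 8 UTF-16 code units each. */
    __m128i lo = _mm_loadu_si128((const __m128i *)chars);
    __m128i hi = _mm_loadu_si128((const __m128i *)(chars + 8));

    /* Narrow 2 x 8 16-bit values into 16 bytes with unsigned saturation:
     * values above 255 become 255 and values with the top bit set become 0,
     * so non-ASCII characters can never look like a delimiter. */
    __m128i bytes = _mm_packus_epi16(lo, hi);

    /* One byte-wise compare per special character now covers 16 chars. */
    __m128i m = _mm_cmpeq_epi8(bytes, _mm_set1_epi8(','));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(bytes, _mm_set1_epi8('"')));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(bytes, _mm_set1_epi8('\n')));
    m = _mm_or_si128(m, _mm_cmpeq_epi8(bytes, _mm_set1_epi8('\r')));

    /* Collect the top bit of each byte into an integer mask. */
    return (unsigned)_mm_movemask_epi8(m);
#else
    /* Scalar fallback with the same result, for targets without SSE2. */
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        uint16_t c = chars[i];
        if (c == ',' || c == '"' || c == '\n' || c == '\r')
            mask |= 1u << i;
    }
    return mask;
#endif
}
```

Without the pack step, a 128-bit compare on UTF-16 data covers 8 characters; after narrowing, the same compare covers 16, which is where the "twice as many characters per iteration" comes from. A parser would then typically walk the set bits of the returned mask to locate each delimiter, quote, or line ending.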

Nice work @nietras!

nietras commented 1 year ago

> Why apologize?

Just a teasing sorry 😉

> Adding that and an AVX2 (256 bit) path brings Sylvan within the margin of error.

Sorry, I missed this part. That was quick, especially considering how much time I spent tweaking and running benchmarks 😅. But great! I definitely have to concede that some of the design decisions I made for Sep carry non-trivial costs.

leandromoh commented 1 year ago

Hello dudes,

Something I could not figure out on my own is how and where you use SIMD code in CSV parsing. From what I understood reading Wikipedia, SIMD is used to perform parallel processing on numbers, like vector operations. With your help, I can figure out whether my lib (RecordParser) can benefit from SIMD too.

@MarkPflug @nietras

EDIT: Is it something like this?

joelverhagen commented 11 months ago

Sorry for the delay, folks. I've been pulled away on family and work stuff for a while. I'll get all of these PRs in soon, along with an updated blog.

nietras commented 11 months ago

Thanks, this should definitely be updated to the latest version on NuGet, though. 😊