Add Sep - the fastest yet

joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries

https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers

MIT License

Add Sep - the fastest yet #51

Closed nietras closed 11 months ago

nietras commented 1 year ago

@joelverhagen cc @JoshClose @MarkPflug this adds Sep, a new library. As seen below this should be about 1.3x faster than Sylvan, sorry! 😅 Awaiting official results before introducing this to the world on my blog nietras.com.

BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19044.2965/21H2/November2021Update)
AMD Ryzen 9 5950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=7.0.203
  [Host]     : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2
  Job-PCYXTT : .NET 7.0.5 (7.0.523.17405), X64 RyuJIT AVX2

InvocationCount=1  IterationCount=6  LaunchCount=1  
UnrollFactor=1  WarmupCount=2

Method	LineCount	Mean	Error	StdDev	Gen0	Gen1	Gen2	Allocated
CsvHelper	1000000	2,139.1 ms	53.65 ms	19.13 ms	18000.0000	17000.0000	3000.0000	260.59 MB
Sep	1000000	733.0 ms	21.03 ms	7.50 ms	18000.0000	17000.0000	3000.0000	260.42 MB
Sylvan_Data_Csv	1000000	936.0 ms	25.39 ms	9.05 ms	18000.0000	17000.0000	3000.0000	260.77 MB

MarkPflug commented 1 year ago

> As seen below this should be about 1.3x faster than Sylvan, sorry!

Why apologize? You're moving the needle, and that's always good. I've already studied your SIMD code, and am applying some improvements back to my library. Our SIMD code was pretty similar, but the key intrinsic I was missing was PackUnsignedSaturate which allows processing twice as many characters per iteration. Adding that and an AVX2 (256 bit) path brings Sylvan within the margin of error.

Nice work @nietras!
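
For readers wondering what the pack step described above buys, here is a minimal sketch in C, not code taken from Sep or Sylvan. It assumes the input is UTF-16 (as .NET strings are) and uses `_mm_packus_epi16`, the SSE2 intrinsic for the same instruction that .NET exposes as `Sse2.PackUnsignedSaturate`. The function name `find_delimiters16` is hypothetical. Two vectors of eight 16-bit code units are narrowed into one vector of sixteen bytes, so a single byte compare covers 16 characters instead of 8. Because the narrowing saturates rather than truncates, a non-ASCII code unit such as 0x012C cannot be mistaken for `,` (0x2C).

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2 intrinsics */
#endif

/* Hypothetical helper: scans exactly 16 UTF-16 code units starting at
 * `chars` and returns a 16-bit mask where bit i is set when
 * chars[i] == delimiter. */
unsigned find_delimiters16(const uint16_t *chars, uint8_t delimiter)
{
    /* Saturation maps code units 0x0100..0x7FFF to 255 and
     * 0x8000..0xFFFF to 0, so those two byte values cannot be used
     * as delimiters in this sketch. */
    assert(delimiter != 0 && delimiter != 255);
#if defined(__SSE2__)
    /* Load 2 x 8 code units (unaligned loads). */
    __m128i lo = _mm_loadu_si128((const __m128i *)chars);
    __m128i hi = _mm_loadu_si128((const __m128i *)(chars + 8));
    /* Narrow 16 x 16-bit to 16 x 8-bit with unsigned saturation;
     * the order of the code units is preserved. */
    __m128i bytes = _mm_packus_epi16(lo, hi);
    /* Compare all 16 bytes against the delimiter at once. */
    __m128i eq = _mm_cmpeq_epi8(bytes, _mm_set1_epi8((char)delimiter));
    /* Collect the top bit of each byte into an integer mask. */
    return (unsigned)_mm_movemask_epi8(eq);
#else
    /* Scalar fallback that computes the same mask one unit at a time. */
    unsigned mask = 0;
    for (int i = 0; i < 16; i++) {
        if (chars[i] == delimiter) {
            mask |= 1u << i;
        }
    }
    return mask;
#endif
}
```

For the 16 code units of `ab,c,中,def,"x",` followed by a newline, calling `find_delimiters16(row, ',')` returns 0x4454 (bits 2, 4, 6, 10 and 14). A parser would typically run this once per special character (delimiter, quote, line ending) and then walk the set bits to find column boundaries. A 256-bit variant is not a drop-in widening: `_mm256_packus_epi16` packs within each 128-bit lane, so it needs an extra permute to restore character order.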

nietras commented 1 year ago

> Why apologize?

Just a teasing sorry 😉

> Adding that and an AVX2 (256 bit) path brings Sylvan within the margin of error.

Sorry, I missed this part. That was quick, especially considering how much time I spent tweaking and running benchmarks 😅. But great! I definitely have to concede that there are some non-trivial costs to some of the design decisions I made for Sep.

leandromoh commented 1 year ago

Hello dudes,

Something I could not figure out on my own is how and where you use SIMD code in CSV parsing. What I understood from reading Wikipedia is that SIMD is used to perform parallel processing on numbers, like vector operations. With your help, I can figure out whether my lib (RecordParser) can benefit from SIMD too.

@MarkPflug @nietras

EDIT: Is it something like this?

joelverhagen commented 11 months ago

Sorry for the delay, folks. I've been pulled away on family and work stuff for a while. I'll get all of these PRs in soon along with an updated blog.

nietras commented 11 months ago

Thanks, this should definitely be updated to the latest version on NuGet, though. 😊