joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License

Add more libraries #38

Closed MarkPflug closed 3 years ago

MarkPflug commented 3 years ago

Looks like I missed the last round by a couple of days, bummer. This adds a bunch of new libraries for the next update, and should close out some of the open issues.

Angara.Table (from #15), FileHelpers (from #13), Microsoft.ML (from #5), Open.Text.CSV, DSV, KBCsv, Microsoft.Data.Analysis

Three of these are from Microsoft, since Angara.Table appears to be a Microsoft Research project; it is also an F# library. Working with that one makes me wonder if using C# libraries in F# feels as clunky as F# libraries feel in C#.

The FileHelpers implementation might look a bit funny as it parses to a "temporary" object (it only binds to objects, no raw access) and then that temporary is copied to the final PackageAsset. I also tried having it bind directly to the PackageAsset, but surprisingly that ended up being slower anyway.

I also updated packages to the latest versions.

MarkPflug commented 3 years ago

Added Cesil as well.

MarkPflug commented 3 years ago

Changed the way ChoETL is handled to avoid AsDataReader, which internally uses reflection and is partially responsible for how slow it was. It is now specialized for PackageAsset instead of handling a generic T, similar to how FileHelpers was done. This might violate the spirit of the benchmarks being "low-level", but the performance difference for ChoETL is ~3x.
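The trade-off described above can be sketched in plain C#. This is only an illustration of reflection-based versus specialized binding in general; it is not ChoETL's or FileHelpers' actual code, and the two-property `PackageAsset` and the `Binders` class here are hypothetical stand-ins for the benchmark's real types.

```csharp
using System;
using System.Reflection;

// Hypothetical, trimmed-down stand-in for the benchmark's record type.
public sealed class PackageAsset
{
    public string PackageId { get; set; } = "";
    public string PackageVersion { get; set; } = "";
}

public static class Binders
{
    // Generic path: works for any T, but performs a reflection lookup
    // and a reflective SetValue call for every field of every row.
    public static T BindWithReflection<T>(string[] header, string[] fields) where T : new()
    {
        var item = new T();
        for (int i = 0; i < header.Length; i++)
        {
            var property = typeof(T).GetProperty(header[i]);
            if (property != null)
            {
                property.SetValue(item, fields[i]);
            }
        }
        return item;
    }

    // Specialized path: assumes a fixed column order and assigns directly,
    // so no reflection is involved.
    public static PackageAsset BindPackageAsset(string[] fields)
    {
        return new PackageAsset
        {
            PackageId = fields[0],
            PackageVersion = fields[1],
        };
    }
}
```

Both methods produce the same object for the same row; the specialized one gives up generality in exchange for avoiding the per-field reflection cost.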

joelverhagen commented 3 years ago

> The FileHelpers implementation might look a bit funny as it parses to a "temporary" object (it only binds to objects, no raw access) and then that temporary is copied to the final PackageAsset. I also tried having it bind directly to the PackageAsset, but surprisingly that ended up being slower anyway.

> Changed the way ChoETL is handled to avoid AsDataReader, which internally uses reflection and is partially responsible for how slow it was. It is now specialized for PackageAsset instead of handling a generic T, similar to how FileHelpers was done. This might violate the spirit of the benchmarks being "low-level", but the performance difference for ChoETL is ~3x.

I guess the numbers don't lie. Seems fine to me. I think a best effort is fine here. If the package author wants to provide a new API or an alternate adapter implementation, that's fine. Perhaps I'll mention this caveat in the next blog post update.

joelverhagen commented 3 years ago

This is huge. Thanks, @MarkPflug! My column chart is getting so crowded 😄.

joelverhagen commented 3 years ago

By the way, nice job driving https://github.com/Open-NET-Libraries/Open.Text.CSV/issues/1! It was cool to see how quickly the author reacted.

jzabroski commented 3 years ago

Awesome.

joelverhagen commented 3 years ago

> Looks like I missed the last round by a couple of days, bummer. This adds a bunch of new libraries for the next update, and should close out some of the open issues.

@MarkPflug, I've updated the blog post.

MarkPflug commented 3 years ago

Thanks Joel. Disappointing results for my library. 😥 I thought I'd made a breakthrough by adding a SIMD fast path. On my machine it runs in 1.3s (more than a 20% improvement), and all the other numbers are comparable to yours. It must be a difference in how the code gets JIT-compiled for your CPU vs. mine. I've got an Intel i7-7700K, so I would have thought your newer AMD would support the same feature set. Perhaps it does support the SIMD features, but they have different performance characteristics that make it slower than the Intel implementation. That might explain it, because at worst I would have expected the performance to remain stable relative to the previous run, but instead it's a pretty sizable regression.

If you run this console app, are all the features enabled?

using System;
using System.Runtime.Intrinsics.X86;

class Program
{
    static void Main()
    {
        Console.WriteLine($"Bmi1: {Bmi1.IsSupported}");
        Console.WriteLine($"Bmi2: {Bmi2.IsSupported}");
        Console.WriteLine($"Sse: {Sse.IsSupported}");
        Console.WriteLine($"Sse2: {Sse2.IsSupported}");
        Console.WriteLine($"Sse3: {Sse3.IsSupported}");
    }
}

joelverhagen commented 3 years ago

Seems enabled.

Bmi1: True
Bmi2: True
Sse: True
Sse2: True
Sse3: True

Perhaps the BenchmarkDotNet runtime affects this? Not sure.

MarkPflug commented 3 years ago

Consulting https://www.agner.org/optimize/instruction_tables.pdf, it looks like there are definitely some timing differences between Skylake and Zen 2, most notably in the PEXT instruction, which is significantly slower on Zen 2. I might have to investigate changing the SIMD logic to avoid that instruction, or disabling the SIMD path on those CPUs. SIMD fail! I should probably leave this stuff to people who know what they're doing. 😅

edit: Zen 3 appears to have a good PEXT implementation, so maybe you should just get a new CPU. 😂
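As a rough sketch of the "avoid that instruction" option: PEXT is exposed in .NET as `Bmi2.ParallelBitExtract`, and a portable loop can compute the same result. The `BitExtract` class, the `SoftwarePext` helper, and the `preferHardware` flag below are hypothetical names for illustration only; this is not the code Sylvan uses, and deciding when to set the flag (for example, by detecting the CPU model) is left to the caller.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

public static class BitExtract
{
    // Portable equivalent of PEXT: gathers the bits of 'value' selected by
    // 'mask' and packs them into the low bits of the result.
    public static uint SoftwarePext(uint value, uint mask)
    {
        uint result = 0;
        int outBit = 0;
        while (mask != 0)
        {
            uint lowest = mask & (~mask + 1); // isolate the lowest set bit of the mask
            if ((value & lowest) != 0)
            {
                result |= 1u << outBit;
            }
            outBit++;
            mask &= mask - 1; // clear the lowest set bit
        }
        return result;
    }

    // Uses the hardware instruction only when the caller asks for it and the
    // CPU reports BMI2 support; otherwise falls back to the software loop.
    public static uint Pext(uint value, uint mask, bool preferHardware)
    {
        if (preferHardware && Bmi2.IsSupported)
        {
            return Bmi2.ParallelBitExtract(value, mask);
        }
        return SoftwarePext(value, mask);
    }
}
```

For example, `BitExtract.Pext(0b1011_0110, 0b0011_1100, preferHardware: false)` returns `0b1101`. Note that `Bmi2.IsSupported` being true says nothing about how fast the instruction is, which is why the sketch takes an explicit flag rather than relying on feature detection alone.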

MarkPflug commented 3 years ago

@joelverhagen, can I ask you a huge favor? 😁 I modified the SIMD logic to avoid the PEXT instruction, and I'm wondering if you would run the benchmarks (just Sylvan) on your machine against package version 1.1.6-b0001 and report back the timing. I don't have access to a Zen 2 machine, so I have no way to test whether the logic change will perform better. The timing is essentially unchanged when running on my Intel chip.

joelverhagen commented 3 years ago

BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.1110 (20H2/October2020Update)
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-preview.1.21103.13
  [Host]     : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
  Job-AMQDIG : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT

InvocationCount=1  IterationCount=4  LaunchCount=1  
UnrollFactor=1  WarmupCount=2  
| Package | LineCount | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| CsvHelper | 1000000 | 2.483 s | 0.0522 s | 0.0081 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |
| Cursively | 1000000 | 1.732 s | 0.0586 s | 0.0091 s | 44000.0000 | 17000.0000 | 3000.0000 | 345 MB |
| RecordParser | 1000000 | 2.177 s | 0.1229 s | 0.0190 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |
| Sylvan.Data.Csv | 1000000 | 1.410 s | 0.0393 s | 0.0061 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |

@MarkPflug [image: not-worthy-waynes-world]

joelverhagen commented 3 years ago

For posterity, this is amazing: Sylvan.Data.Csv/CsvDataReader.cs (although perhaps a different version than the 1.1.6-b0001 I tested?)

MarkPflug commented 3 years ago

Nice! That's more like what I was expecting to see! Thanks Joel!

Yes, that implementation is out of date; I haven't pushed the Zen 2 fix to GitHub yet. The ParallelBitExtract calls were the problem. I also need to clean up and document the code and turn it into something I'll be able to understand in the future. It is a bit of an ugly mess at the moment.

jzabroski commented 3 years ago

@JoshClose better step it up with some Hans and Frans howmuchyaBENCHmarks

@MarkPflug [image: not-worthy-waynes-world]

JoshClose commented 3 years ago

Ha! Maybe Mark wants to do a pull request for me. lol

I've accepted the fact that I can't possibly get CsvHelper to these speeds due to features. When I was working on a new parser for speed I got it really fast. Then I started adding in all the flexibility features and it slowed down quickly. I've thought about doing a strict parser just for the benchmark, but Mark went a step further than I'm willing to go. 😉 Awesome job Mark.

leandromoh commented 3 years ago

> Working with that one makes me wonder if using C# libraries in F# feels as clunky as F# libraries feel in C#.

@MarkPflug Not usually. Since most of the framework is written in C#, F# offers natural interoperability between the languages, so its developers can use tons of C# libraries without pain. However, the opposite is not true, because C# is the dominant language.

BTW, Awesome job Mark.