Closed MarkPflug closed 3 years ago
Added Cesil as well.
Changed the way ChoETL is handled to avoid AsDataReader, which internally uses reflection and was partially responsible for how slow it was. It is now specialized for PackageAsset instead of being able to handle a generic T, similar to how FileHelpers was done. This might violate the spirit of the benchmarks being "low-level", but the performance difference for ChoETL is ~3x.
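To illustrate the trade-off described above, here is a minimal sketch contrasting a reflection-based generic binder with one specialized for a single type. It is not ChoETL's API or the benchmark's adapter code: `Asset`, `Binders`, `BindGeneric`, and `BindAsset` are hypothetical names, and `Asset` is a two-property stand-in for PackageAsset.

```csharp
using System;
using System.Reflection;

// Hypothetical two-property record standing in for the benchmark's PackageAsset.
public sealed class Asset
{
    public string Id { get; set; }
    public int Size { get; set; }
}

public static class Binders
{
    // Generic path: resolves each property by name and converts the value
    // through reflection, which boxes and does a lookup per field.
    public static T BindGeneric<T>(string[] names, string[] values) where T : new()
    {
        var item = new T();
        for (int i = 0; i < names.Length; i++)
        {
            PropertyInfo prop = typeof(T).GetProperty(names[i]);
            prop.SetValue(item, Convert.ChangeType(values[i], prop.PropertyType));
        }
        return item;
    }

    // Specialized path: column positions and types are fixed at compile time,
    // so no reflection is involved.
    public static Asset BindAsset(string[] values) =>
        new Asset { Id = values[0], Size = int.Parse(values[1]) };
}
```

Both methods produce the same object for the same input; the specialized one gives up generality in exchange for skipping the per-field reflection work.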
> Changed the way ChoETL is handled to avoid AsDataReader, which internally uses reflection and was partially responsible for how slow it was. It is now specialized for PackageAsset instead of being able to handle a generic T, similar to how FileHelpers was done. This might violate the spirit of the benchmarks being "low-level", but the performance difference for ChoETL is ~3x.
I guess the numbers don't lie. Seems fine to me. I think a best effort is fine here. If the package author wants to provide a new API or an alternate adapter implementation, that's fine. Perhaps I'll mention this caveat in the next blog post update.
This is huge. Thanks, @MarkPflug! My column chart is getting so crowded 😄.
By the way, nice job driving https://github.com/Open-NET-Libraries/Open.Text.CSV/issues/1! It was cool to see how quickly the author reacted.
Awesome.
@MarkPflug, I've updated the blog post.
Thanks Joel. Disappointing results for my library. 😥 I thought I'd made a breakthrough by adding a SIMD fast-path. On my machine it runs in 1.3s (more than 20% improvement), and all the other numbers are comparable to yours. Must be a difference in how the code gets JITed for your CPU vs mine. I've got an Intel i7-7700K, so I would have thought your newer AMD would support the same feature set. Perhaps it does support the SIMD features, but they have different performance characteristics that make it slower than the Intel implementation. That might explain it, because at worst I would have expected the performance to remain stable with the previous run, but instead it's a pretty sizable regression.
If you run this console app, are all the features enabled?
using System;
using System.Runtime.Intrinsics.X86;

class Program
{
    static void Main()
    {
        // Report which x86 instruction-set extensions the JIT can use on this CPU.
        Console.WriteLine($"Bmi1: {Bmi1.IsSupported}");
        Console.WriteLine($"Bmi2: {Bmi2.IsSupported}");
        Console.WriteLine($"Sse: {Sse.IsSupported}");
        Console.WriteLine($"Sse2: {Sse2.IsSupported}");
        Console.WriteLine($"Sse3: {Sse3.IsSupported}");
    }
}
Seems enabled.
Bmi1: True
Bmi2: True
Sse: True
Sse2: True
Sse3: True
Perhaps the BenchmarkDotNet runtime affects this? Not sure.
Consulting https://www.agner.org/optimize/instruction_tables.pdf, it looks like there are definitely some timing differences between Skylake and Zen2, most notably in the PEXT instruction, which is significantly slower on Zen2. I might have to investigate a change to the SIMD logic to avoid that instruction, or disable the SIMD path for those CPUs. SIMD fail! I should probably leave this stuff to people who know what they're doing. 😅
edit: Zen 3 appears to have a good PEXT implementation, so maybe you should just get a new CPU. 😂
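For readers unfamiliar with PEXT (exposed in .NET as `Bmi2.ParallelBitExtract`), the sketch below shows what the instruction computes by way of a portable loop. It is only a reference for the semantics, not the library's code; `BitExtract`, `Software`, and `Extract` are hypothetical names.

```csharp
using System;
using System.Runtime.Intrinsics.X86;

static class BitExtract
{
    // Portable reference for PEXT: gathers the bits of `value` selected by
    // `mask` and packs them into the low bits of the result.
    public static uint Software(uint value, uint mask)
    {
        uint result = 0;
        int outBit = 0;
        while (mask != 0)
        {
            uint lowest = mask & (~mask + 1); // isolate the lowest set mask bit
            if ((value & lowest) != 0)
                result |= 1u << outBit;
            outBit++;
            mask &= mask - 1; // clear that bit and move on to the next one
        }
        return result;
    }

    // Uses the hardware instruction when the CPU reports BMI2, else the loop.
    public static uint Extract(uint value, uint mask) =>
        Bmi2.IsSupported ? Bmi2.ParallelBitExtract(value, mask) : Software(value, mask);
}
```

Note that `Bmi2.IsSupported` only reports that the instruction is available, not that it is fast, which is consistent with the feature check above printing True on the Zen2 machine.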
@joelverhagen, can I ask you a huge favor? 😁 I modified the SIMD logic to avoid the PEXT instruction, and I'm wondering if you would run the benchmarks (just Sylvan) on your machine against package version 1.1.6-b0001 and report back the timing. I've no access to a Zen2, so I have no way to test whether the logic change will perform better. The timing is essentially unchanged when running on my Intel chip.
BenchmarkDotNet=v0.13.0, OS=Windows 10.0.19042.1110 (20H2/October2020Update)
AMD Ryzen 9 3950X, 1 CPU, 32 logical and 16 physical cores
.NET SDK=6.0.100-preview.1.21103.13
[Host] : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
Job-AMQDIG : .NET 5.0.7 (5.0.721.25508), X64 RyuJIT
InvocationCount=1 IterationCount=4 LaunchCount=1
UnrollFactor=1 WarmupCount=2
| Package | LineCount | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|
| CsvHelper | 1000000 | 2.483 s | 0.0522 s | 0.0081 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |
| Cursively | 1000000 | 1.732 s | 0.0586 s | 0.0091 s | 44000.0000 | 17000.0000 | 3000.0000 | 345 MB |
| RecordParser | 1000000 | 2.177 s | 0.1229 s | 0.0190 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |
| Sylvan.Data.Csv | 1000000 | 1.410 s | 0.0393 s | 0.0061 s | 33000.0000 | 18000.0000 | 3000.0000 | 261 MB |
@MarkPflug
For posterity, this is amazing: Sylvan.Data.Csv/CsvDataReader.cs (although perhaps a different version than the 1.1.6-b0001 I tested?)
Nice! That's more like what I was expecting to see! Thanks Joel!
Yes, that implementation is out of date; I haven't pushed the Zen2-fixed code to GitHub yet. The ParallelBitExtract calls were the problem. I also need to clean up and document the code and turn it into something I'll be able to understand in the future. It is a bit of an ugly mess at the moment.
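The revised SIMD logic is not shown in this thread. As a rough sketch of one common way to locate delimiters without PEXT, the code below compares 16 bytes at a time with SSE2 and walks the resulting bitmask using trailing-zero counts. This is an assumption for illustration, not Sylvan's implementation: `DelimiterScan` and `FindAll` are hypothetical names, and it works on bytes for simplicity.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

static class DelimiterScan
{
    // Returns the index of every `delimiter` byte in `data`.
    public static List<int> FindAll(ReadOnlySpan<byte> data, byte delimiter)
    {
        var hits = new List<int>();
        int i = 0;
        if (Sse2.IsSupported)
        {
            Vector128<byte> needle = Vector128.Create(delimiter);
            for (; i + 16 <= data.Length; i += 16)
            {
                Vector128<byte> block = MemoryMarshal.Read<Vector128<byte>>(data.Slice(i, 16));
                // One bit per byte lane: bit n is set when block[n] == delimiter.
                uint mask = (uint)Sse2.MoveMask(Sse2.CompareEqual(block, needle));
                while (mask != 0)
                {
                    hits.Add(i + BitOperations.TrailingZeroCount(mask));
                    mask &= mask - 1; // clear the lowest set bit
                }
            }
        }
        // Scalar tail (and the whole input when SSE2 is unavailable).
        for (; i < data.Length; i++)
        {
            if (data[i] == delimiter) hits.Add(i);
        }
        return hits;
    }
}
```

The scalar loop doubles as the fallback, so the method returns the same indexes whether or not the vector path runs.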
@JoshClose better step it up with some Hans and Frans howmuchyaBENCHmarks
@MarkPflug
Ha! Maybe Mark wants to do a pull request for me. lol
I've accepted the fact that I can't possibly get CsvHelper to these speeds due to features. When I was working on a new parser for speed I got it really fast. Then I started adding in all the flexibility features and it slowed down quickly. I've thought about doing a strict parser just for the benchmark, but Mark went a step further than I'm willing to go. 😉 Awesome job Mark.
> Working with that one makes me wonder if using C# libraries in F# feels as clunky as F# libraries feel in C#.
@MarkPflug Not usually. Since most of the framework is written in C#, F# offers fairly natural interoperability between the languages, and its developers use tons of C# libraries without pain. However, the opposite is not true, because C# is the dominant language.
BTW, Awesome job Mark.
Looks like I missed the last round by a couple of days, bummer. This adds a bunch of new libraries for the next update, and should close out some of the open issues.
- Angara.Table from #15
- FileHelpers from #13
- Microsoft.ML from #5
- Open.Text.CSV
- DSV
- KBCsv
- Microsoft.Data.Analysis
Three of these are from Microsoft, as it appears that Angara.Table is an MS research thing, and is an F# library. Working with that one makes me wonder if using C# libraries in F# feels as clunky as F# libraries feel in C#.
The FileHelpers implementation might look a bit funny as it parses to a "temporary" object (it only binds to objects, no raw access) and then that temporary is copied to the final PackageAsset. I also tried having it bind directly to the PackageAsset, but surprisingly that ended up being slower anyway.
I also updated packages to the latest versions.