joelverhagen / NCsvPerf

A test bench for various .NET CSV parsing libraries
https://www.joelverhagen.com/blog/2020/12/fastest-net-csv-parsers
MIT License
71 stars 14 forks

Tweak bench.ps1 #64

Closed by nietras 10 months ago

nietras commented 10 months ago

@joelverhagen I'm using iterationTime so that fast implementations get more "iterations" while the iteration count stays low for slow ones. I hope this means the run is as fast as before and still reliable. I'm running this locally now, so I'll see how it looks soon.
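A rough way to picture the idea, as a toy model only: this is not BenchmarkDotNet's actual scheduling logic, and the function name and all numbers below are made up for illustration. If every iteration gets the same time budget, the number of calls that fit in it scales inversely with the time per call, so fast parsers are sampled many times and slow ones only a few.

```python
import math


def invocations_per_iteration(iteration_time_ms: float, op_time_ms: float) -> int:
    """Toy model: how many calls fit into one time-boxed iteration (at least one)."""
    if iteration_time_ms <= 0 or op_time_ms <= 0:
        raise ValueError("times must be positive")
    return max(1, math.floor(iteration_time_ms / op_time_ms))


# Hypothetical numbers, chosen only for illustration.
budget_ms = 1000.0
fast = invocations_per_iteration(budget_ms, 20.0)    # a fast parser: many calls per iteration
slow = invocations_per_iteration(budget_ms, 2500.0)  # slower than the budget: still runs once
print(fast, slow)
```

Under this model the total run time stays roughly bounded by the budget per iteration, which matches the stated goal of keeping the benchmark about as fast as before.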

Results on my machine are not exactly reproducible/reliable for such long-running benchmarks. When running only Sep I get: image. When running all: image.

Note how Sep (single-threaded) is noticeably slower when running all, maybe due to thermal throttling. Not sure. I do not think this is due to the BDN params.

Sep is 21289/382 = 55x faster than the slowest one.
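For anyone checking the arithmetic, a minimal sketch using only the two figures quoted above. Their unit is not stated in the text, so the function assumes both are throughput-style numbers where higher is better; the function name is made up.

```python
def speedup(fastest: float, slowest: float) -> float:
    """Ratio of two throughput-style figures (assumes higher is better)."""
    return fastest / slowest


ratio = speedup(21289, 382)
print(f"{ratio:.1f}x")  # prints "55.7x"
```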

nietras commented 10 months ago

With .NET 8 and server GC, perf is insane.

image

nietras commented 10 months ago

All with .NET 8 and server GC.

image

jzabroski commented 10 months ago

Good grief. I want a CsvHelper facade around Sep so that I can have the code completion goodness of CsvHelper with the Barry Allen Flash speed of Sep

JoshClose commented 10 months ago

> Good grief. I want a CsvHelper facade around Sep so that I can have the code completion goodness of CsvHelper with the Barry Allen Flash speed of Sep

Man, that would be nice. A lot of the features would have to go away though. I've been thinking of making a version that would take advantage of these speed improvements, but I would probably want to do it from scratch and go .NET 7/8 forward only. If only I had some free time.

jzabroski commented 10 months ago

Maybe we can do it together. I have often wanted "SpreadsheetHelper", too. It's too damn hard using libraries like Aspose.Cells. The code always looks like crap. I just want a tagless-final API for describing how to format objects, and let a serialization layer like Sep do the hard work.

JoshClose commented 10 months ago

It would be nice to have a more generic, nice-to-use API on top of other implementations. It could support a lot more than CSV files. Message me if you want to talk more about it.

nietras commented 10 months ago

Thanks, I definitely see Sep as a "low-level" fast API for CSV files and imagine others could use it as a building block for higher-level things. The API fits my needs and my work's needs. I don't need object mapping and generally don't consider it that important; you can code stuff like that in minutes, and it will be faster and more flexible that way anyway. 😊 With LLMs getting faster, I assume one can just ask an LLM to "map to type" going forward anyway.

https://blog.ploeh.dk/2023/12/04/serialization-with-and-without-reflection/
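To make "code stuff like that in minutes" concrete, here is a minimal sketch of hand-written mapping without reflection. It uses Python's standard `csv` module as a stand-in and does not show Sep's API; the `Package` type, the column names, and the sample rows are all hypothetical.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Package:  # hypothetical record type, for illustration only
    id: str
    version: str
    downloads: int


def read_packages(text: str) -> list[Package]:
    """Map CSV rows to Package objects by hand, with no reflection involved."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Resolve column positions once, then index into each row directly.
    i_id, i_ver, i_dl = (header.index(name) for name in ("Id", "Version", "Downloads"))
    return [Package(row[i_id], row[i_ver], int(row[i_dl])) for row in reader]


# Made-up sample data.
sample = "Id,Version,Downloads\nalpha,1.0.0,1200\nbeta,2.3.1,99000\n"
packages = read_packages(sample)
print(packages)
```

The mapping is explicit and type conversions sit right next to the column lookup, which is the flexibility argument made above; the trade-off is that every new type needs its own few lines.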

jzabroski commented 10 months ago

I don't think the ask is to "map to type", per se. I also don't necessarily see object mapping as serialization. Often, they feed into one another, the same way entity configuration in an ORM feeds into the object materialization layer and persistence ordering when uniquing the object graph. A generic mapping layer is the sine qua non of maintainable systems, imho. I don't think an LLM spitting out Designer files as glue is going to eliminate that; it would still need to learn how to code against the generic mapping layer, and that layer still needs to be created.

I can definitely see a use for Sep without CsvHelper, too, like "wow" demos for loading huge amounts of data. In the past I have used kdb+ (world's fastest time series database, by a large margin, used in bulge bracket finance institutions) to load CSV data and analyze it extraordinarily fast.

A fun way to describe persistence is not as orthogonal, but rather hyperbolic, borrowing from geometer David Hilbert and specifically his infinite-dimensional Hilbert spaces over non-Euclidean geometric systems. Orthogonal persistence captures persistence as inherent to the execution environment. You can't assign infinite details to finite points. So, Mark's blog post is worthless to me because it's looking at persistence purely from a Euclidean viewpoint (just my perspective).

joelverhagen commented 10 months ago

I've updated the blog. Sep multithreaded on server GC is indeed crazy fast. Nice work @nietras!

nietras commented 10 months ago

Awesome! And thank you. 👍

nietras commented 10 months ago

> allowing the frontrunner Sep to not allocate extra for unescaping and get even greater performance

The blog says the above, which can be misunderstood: Sep doesn't allocate when unescaping either, and is still blistering fast when unescaping; the benchmarks in Sep show this. Also, RecordParser doesn't do any kind of automatic unescaping in the benchmark either. So it would be good if some of these passages were either removed or revised... 😊
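For readers unfamiliar with the term: unescaping here means turning a quoted CSV field back into its plain value by dropping the enclosing quotes and collapsing doubled quotes. The sketch below only illustrates that transformation. It says nothing about how Sep or RecordParser implement it, the function name is made up, and unlike what is described above for Sep, this version allocates a new string.

```python
def unescape_field(raw: str) -> str:
    """Drop the enclosing quotes of a quoted CSV field and collapse doubled quotes."""
    if len(raw) >= 2 and raw[0] == '"' and raw[-1] == '"':
        return raw[1:-1].replace('""', '"')
    return raw  # unquoted fields are returned unchanged


quoted = unescape_field('"say ""hi"""')
plain = unescape_field("plain")
print(quoted)  # prints: say "hi"
print(plain)   # prints: plain
```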

joelverhagen commented 10 months ago

Oops! Feel free to open a PR here with wording you think is best and I'll try to work it in! https://github.com/joelverhagen/joelverhagen.com/blob/master/_posts/2020-12-08-fastest-net-csv-parsers.md Sorry for the misunderstanding.

JoshClose commented 9 months ago

> Thanks, I definitely see Sep as a "low-level" fast API for CSV files and imagine others could use it as a building block for higher-level things. The API fits my needs and my work's needs. I don't need object mapping and generally don't consider it that important; you can code stuff like that in minutes, and it will be faster and more flexible that way anyway. 😊 With LLMs getting faster, I assume one can just ask an LLM to "map to type" going forward anyway.
>
> https://blog.ploeh.dk/2023/12/04/serialization-with-and-without-reflection/

Thanks for this. I'm not really up to date on the latest .NET stuff as I do only React front-end work at my day job.