iarna / iarna-toml

Better TOML parsing and stringifying all in that familiar JSON interface.

Benchmarks Feedback / Suggestions / Discussion #14

Closed · bd82 closed this 5 years ago

bd82 commented 5 years ago

Hello.

I've been working on some TOML-related tooling lately and came across your benchmarks, which helped me, so thanks for that! 👍

Asserting as part of the benchmark.

    fixtures.forEach(_ => {
      assertIsDeeply(parseIarnaToml(_.data), _.answer)
    })

This means the time it takes to perform the deep compare is taken into account in the benchmark. In your own package's case that is ~50% of each iteration's run time.

Would it be more precise to perform these assertions outside the benchmark, or at least not every single time?
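
A minimal sketch of that restructuring, reusing the fixtures / assertIsDeeply / parseIarnaToml names from the snippet above (the Benchmark.js-style suite.add call is my assumption, not necessarily this repo's actual harness):

    // Validate correctness once, up front, outside the timed path:
    fixtures.forEach(_ => {
      assertIsDeeply(parseIarnaToml(_.data), _.answer)
    })

    // The benchmarked function now measures parsing only:
    suite.add('@iarna/toml', () => {
      fixtures.forEach(_ => parseIarnaToml(_.data))
    })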

Asserting equality of output when comparing different parsers

A more general question for discussion is whether the assertion should be done at all. Suppose someone creates a TOML parser that outputs a data structure that also represents the comments; if I understand the "assertIsDeeply" code correctly, that would fail due to the additional keys (for the comment properties). But would that be a true failure?
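
To illustrate, a strict deep compare rejects a superset result; a toy example (the "comments" key is hypothetical, not any real parser's output shape):

    const assert = require('assert')

    const answer = { title: 'TOML Example' }
    // A hypothetical comment-preserving parser returns extra keys:
    const actual = { title: 'TOML Example', comments: ['# the document title'] }

    // Throws AssertionError: the extra key fails deep equality, even though
    // "actual" arguably contains the right answer plus more information.
    assert.deepStrictEqual(actual, answer)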

Focusing on micro benchmarks.

It looks like there is a great deal of focus on specific parts of the parser, e.g. one benchmark for literals of type X, another for nested arrays. But won't end users be more interested in real-world performance? Particularly for large TOML files?

See this reference on V8's performance-testing methodology: https://v8.dev/blog/real-world-performance

I've tried to create such a real-world benchmark by collecting large TOML samples from github.com.

I am sure more samples are needed to make it more representative; maybe the best approach would be to scrape 1,000 TOML files from popular OSS projects rather than large files from only a few repos...

But wouldn't the results of a more real-world benchmark be more relevant, more interesting, and easier for potential end users to understand?

Cheers. Shahar.

iarna commented 5 years ago

Regarding the assertion: as it's a price paid equally by the different parsers, it's something I was prepared to pass on, but you're right that we'd get a better baseline without it. The benchmarks started as a tool for honing development of this library, not so much for comparing with others, and when that was the case it didn't matter as much.

It is absolutely a synthetic benchmark at the moment. I'd be interested in what it looks like with perf-dependent TOML cases. I know that the personal use case that drove me to write this simply can't be loaded with toml, and started causing crashes in toml-j0.4, which, while interesting, makes it unhelpful for benchmarking anything but this tool itself.

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent), but finding perf-dependent use cases would be awesome.

(As an end user, personally I think the spec compliance is way more important.)

iarna commented 5 years ago

For funsies, I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines. It still makes toml-j0.4 crash with a "call stack exceeded" and makes toml grind to a crawl, though it is loading it now!
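
That crash surfaces as a RangeError in Node; a sketch of reproducing it, assuming toml-j0.4's parse entry point and a hypothetical fixture path:

    const fs = require('fs')
    const tomlJ = require('toml-j0.4')

    const src = fs.readFileSync('./big-fixture.toml', 'utf8') // hypothetical path, ~568kb

    try {
      tomlJ.parse(src)
    } catch (err) {
      // Deep recursion can overflow V8's stack on large inputs:
      if (err instanceof RangeError) console.error('call stack exceeded:', err.message)
      else throw err
    }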

So we see that for my use case @ltd/j-toml is actually a bit faster, probably due to its lightning-fast string handling, which you can see in the synthetic benchmarks. (And in fact, that is why I have synthetic benchmarks: to try to identify which components of the parser are working well and which aren't. @ltd/j-toml is astoundingly fast at the big string benchmarks.)

iarna commented 5 years ago

So, the last thing I didn't comment on is the why of the equality assertion:

And that comes down to not wanting to run benchmarks for parsers that can't correctly parse the material. It doesn't matter how fast you are if you get the wrong answer. Yes, I do have the spec compliance suite, but gating on that would leave us entirely unable to compare parsers, as mine is the only one passing all of it (and essentially no two parsers pass quite the same subset). Further, most parsers can correctly parse most TOML, and from a pragmatic standpoint it's useful to know how that plays out speed-wise.

If a tested parser starts returning something that's not equal but is in fact correct, I'll revisit how that check is done.

If a parser starts including things like comments, though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document-orientated TOML parsers, at least in JS.

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.

bd82 commented 5 years ago

I'm a bit skeptical about digging through OSS projects for real-world examples (the truth is, most use cases for TOML aren't perf-dependent).

I agree that most TOML use cases are not performance dependent.

(As an end user, personally I think the spec compliance is way more important.)

Also agree.

I ran the file that kicked off this project through. It weighs in at 568kb and 10,000 lines.

Is it possible to share this file? I could use it to benchmark my own TOML parser.

If a parser starts including things like comments, though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document-orientated TOML parsers, at least in JS.

Yep, there is indeed a lack; that is why I built my own parser that outputs a CST (Concrete Syntax Tree), for implementing a TOML Prettier plugin.
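
Roughly, such a CST keeps trivia like comments attached to nodes. Here is a hypothetical fragment for illustration (not my parser's actual output shape):

    // Hypothetical CST fragment for: title = "TOML Example" # the title
    const node = {
      type: 'keyValue',
      key: { type: 'key', image: 'title' },
      value: { type: 'string', image: '"TOML Example"' },
      comments: [{ type: 'comment', image: '# the title' }]
    }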

I have updated the benchmark suite to only run the assertion once, and I'll be updating the repo and the results in the docs sometime in the next week or two.

Sounds good. I think that from your replies we have a few concerns here regarding performance.

Perhaps those concerns should be separated?

As a user, when I open this repo's main readme I can get overwhelmed by the large performance data table; a casual user may just want basic numbers to compare the parsers. For example, see the JavaScript Parsing Libraries benchmark I've created.

Perhaps the detailed analysis could be presented on a separate page linked from the main readme?

Cheers. Shahar.

bd82 commented 5 years ago

Closing this, thanks for the discussion 😄

iarna commented 5 years ago

Yeah, any time, thanks for bringing it up! I did move my benchmarks close to the bottom of the readme (and link to a separate page as well, as the table is getting a bit unwieldy). (Edit: It actually resulted in me reorganizing the readme more generally.)

One interesting area of difference between the parsers is that some have a high constant startup cost to initialize their parser (in particular, the Chevrotain-based Bombadil), which means they score very poorly on very small input files but are pretty competent on large documents. Does this matter? It depends on your use case.

I was curious whether that was specific to Bombadil or a more general trait of Chevrotain, so I hacked together a quick JSON parser to compare against the Chevrotain benchmarks, but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.

bd82 commented 5 years ago

I was curious whether that was specific to Bombadil or a more general trait of Chevrotain, so I hacked together a quick JSON parser to compare against the Chevrotain benchmarks, but… well, their parser is just a validating grammar, not something that produces any result, so it's not actually that comparable, alas.

Chevrotain does indeed have a higher startup cost, but a parser instance can (and should) be reused, so that cost is mostly paid during program initialization (and some on the first inputs). So it is often irrelevant for a long-running process.
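
The reuse pattern looks roughly like this (a sketch with Chevrotain-style usage; TomlCstParser, allTokens, and the toml rule are assumed names, not any real package's API):

    const { Lexer } = require('chevrotain')

    // Construct once at startup: the expensive grammar analysis happens here.
    const lexer = new Lexer(allTokens)   // token definitions assumed defined elsewhere
    const parser = new TomlCstParser()   // assumed CstParser subclass

    function parseToml(text) {
      // Reuse the same instances for every input; assigning to "input"
      // resets the parser, so only lexing and parsing run per call.
      parser.input = lexer.tokenize(text).tokens
      return parser.toml()               // assumed top-level grammar rule
    }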

The reason for the initialization cost is that, like parser combinators, Chevrotain does not do any code generation, so all kinds of analysis (left-recursion detection, grammar building/resolving, ...) are done at runtime. There are ways to improve the cold-start performance characteristics.

But thinking about it now makes me realize that there may be low-hanging fruit in that scenario, so I will investigate it further. 😄 https://github.com/SAP/chevrotain/issues/907

Cheers.

LongTengDao commented 5 years ago

If a parser starts including things like comments, though, it's likely in a different category entirely and probably deserves to be benchmarked separately. That is, there's currently a dearth of document-orientated TOML parsers, at least in JS.

@iarna Hi, what do you mean by "document-orientated parser"?