lukehutch / pikaparser

The Pika Parser reference implementation
MIT License
141 stars 12 forks source link

Increase usage of StringBuilder objects #26

Closed elfring closed 4 years ago

elfring commented 4 years ago

I have noticed a few source code places where it could be more efficient to use a StringBuilder object for string concatenations.

Update candidates:

lukehutch commented 4 years ago

Actually the string concatenation operator + is highly optimized by the compiler (in fact I think it is compiled to use StringBuilder internaly), such that in cases like this, it's usually not actually faster to use StringBuilder, and if StringBuilder is faster, it's only marginally faster. See the "foo" + "bar" line here:

https://javapapers.com/java/java-string-vs-stringbuilder-vs-stringbuffer-concatenation-performance-micro-benchmark/

or Scenario 1 here:

https://redfin.engineering/java-string-concatenation-which-way-is-best-8f590a7d22a8

Also, always remember that premature optimization is the root of all evil (meaning the root of all complexity). You have to have a really good reason to make code less readable in the name of performance optimization.

In this case, the execution of these methods comprises only a tiny fraction of the total execution time of the parser (in the second case, as you can see the result is cached so it's even more efficient if the value is used more than once). Therefore it's not worth rewriting this to use StringBuilder.

elfring commented 4 years ago

You have to have a really good reason to make code less readable in the name of performance optimization.

I would follow advice from a known information source. You are using StringBuilder objects because of a corresponding reason at other source code places already.

elfring commented 4 years ago

Would you find any adjustments more helpful at special source code places according to the mentioned transformation pattern?

Another update candidate: https://github.com/lukehutch/pikaparser/blob/94afa013d8b82526a1e1b1b527c0149862fef8e6/src/main/java/pikaparser/grammar/MetaGrammar.java#L406

lukehutch commented 4 years ago

You have to have a really good reason to make code less readable in the name of performance optimization.

I would follow advice from a known information source.

I linked exactly the same article to show you that StringBuilder doesn't make things much faster than just using +. In fact under the hood the compiler turns + into StringBuilder code, or at least it used to (I don't know what the modern JDK does). And using StringBuilder when you have a fixed number of things to concatenate makes the code a lot uglier.

You are using StringBuilder objects because of a corresponding reason at other source code places already.

I use StringBuilder pretty much where I need to use it: this is when there are a large number of things to concatenate, and/or an unknown number of things. If you have 3 things to concatenate, it is always better to just use +: a + "," + b -- because the code is so much easier to read, and turning this manually into StringBuilder code is pointless, since the compiler will do that for you under the hood anyway.

Would you find any adjustments more helpful at special source code places according to the mentioned transformation pattern?

Another update candidate: https://github.com/lukehutch/pikaparser/blob/94afa013d8b82526a1e1b1b527c0149862fef8e6/src/main/java/pikaparser/grammar/MetaGrammar.java#L406

See above. Additionally, please learn the following principle: "premature optimization is the root of all evil". If you are suggesting this, it can't possibly be for legibility reasons, so you must be suggesting it for optimality reasons. Please learn that:

  1. Almost all performance optimizations, and many memory optimizations, result in code that is less legible. Code legibility does actually matter.
  2. It is absolutely not worth optimizing code that is only called once or a small number of times, and only takes some number of microseconds to execute. Write for code legibility first and foremost.
  3. If you suspect that code in a performance hotspot can and should be optimized for performance, then you must benchmark it before you decide to optimize it. In at least half of cases, the code probably doesn't have the impact you think it does, and there is something else that has a much larger impact on performance. Simply suggesting that something should be optimized when you haven't actually profiled it or benchmarked it, and/or when you haven't pointed out exactly why it's worth optimizing (e.g. "this same piece of work is wastefully repeated N times for input length N", or "This piece of work is done N^2 times for input length N, and there's a better O(N log N) algorithm") is simply a waste of focus.
  4. If code really is a hotspot and optimizing it really does make the code significantly faster, even at the expense of code clarity, then by all means, please optimize it -- and please submit a pull request for your suggested change, along with benchmarks that show that the loss of clarity is totally worth the optimization.

To summarize: Clarity should ALWAYS be prioritized over optimization, unless the optimization is significant -- and the only way to know if it is significant is to reason through the computational complexity (big-Oh notation and input size distribution), or to benchmark, or to profile.

elfring commented 4 years ago

I linked exactly the same article

I reused one of your links for my response.

to show you that StringBuilder doesn't make things much faster than just using +.

Differences should be considered according to desirable run time characteristics also for your software.

In fact under the hood the compiler turns + into StringBuilder code, or at least it used to (I don't know what the modern JDK does).

I suggest to take another look at information from original source files.

And using StringBuilder when you have a fixed number of things to concatenate makes the code a lot uglier.

I propose to reconsider such coding style concerns if you would like to achieve a bit better execution speed at additional source code places.

…, and turning this manually into StringBuilder code is pointless, since the compiler will do that for you under the hood anyway.

Our imaginations are different about the amount of code optimisations which will be performed eventually.

Additionally, please learn the following principle: "premature optimization is the root of all evil".

I would prefer to apply available programming interfaces in an efficient and consistent way for specific use cases.

lukehutch commented 4 years ago

On Thu, Oct 8, 2020 at 4:10 AM Markus Elfring notifications@github.com wrote:

I suggest to take another look at information from original source files https://github.com/openjdk/jdk/blob/2a406f3ce5e200af9909ce051fdeed0cc059fea0/src/java.base/share/classes/java/lang/String.java#L113 .

Yes, this is obviously implementation-dependent -- that's why I said I don't know what the current JDK does. However you can absolutely bet that later and later JDK versions will not be slower than the way it has been done for many years, which is to turn + into StringBuilder code.

And using StringBuilder when you have a fixed number of things to concatenate makes the code a lot uglier.

I propose to reconsider such coding style concerns if you would like to achieve a bit better execution speed at additional source code places.

I direct you again to my many comments about premature optimization. Your statement is not true in the general case.

…, and turning this manually into StringBuilder code is pointless, since the compiler will do that for you under the hood anyway.

Our imaginations are different about the amount of code optimisations which will be performed eventually.

As I have said, if you have fundamentally different ideas than mine and you don't agree with my design decisions, please fork the code and create your own version. You can modify it as much as you would like.

elfring commented 4 years ago

…, if you have fundamentally different ideas

My ideas are probably not fundamentally different for this issue so far.

than mine and you don't agree with my design decisions, …

They are more extensions for a software development result which you categorise as a “research prototype and proof of concept”.

:crystal_ball: How many efforts should be invested in remaining update candidates (and open issues)?

lukehutch commented 4 years ago

I'll entertain any suggestions you want to send, but I can tell you that not just for this project but the entire open source world, code submissions (pull requests), real-world testcases, and actual benchmarks / profiling results, speak 1000x louder than words.