jquery / esprima

ECMAScript parsing infrastructure for multipurpose analysis
http://esprima.org
BSD 2-Clause "Simplified" License

[Speed Comparison] Avoiding JIT warmup #1860

Closed · twiss closed this issue 6 years ago

twiss commented 7 years ago

So, I was adding the new Cherow parser to the comparison benchmark, and I noticed that the results didn't match up with my unscientific benchmark "in the wild", which turned out much slower. I believe this is because the benchmark calls the parser many, many times, giving the JIT compiler a lot of opportunity to optimize the code, whereas I parse only a few scripts, each of them only once.

So, I modified the benchmark to create a new Web Worker for every benchmark cycle, in an attempt to make the code run in a fresh instance of the JIT compiler (while still only measuring the parse time). And sure enough, the parse times went up by around 2x, and the results moved closer together.
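For reference, here is a minimal sketch of the worker-per-cycle idea (the file name and helper name are illustrative, not the actual benchmark code). The parser runs inside a freshly created Web Worker, and only the time spent in `parse()` is reported back to the page:

```js
// bench-worker.js -- runs in its own, freshly created engine context.
importScripts('esprima.js');              // load the parser into the worker
onmessage = (event) => {
  const start = performance.now();
  esprima.parse(event.data);              // the only thing being timed
  postMessage(performance.now() - start);
};
```

```js
// Main page: run one benchmark cycle in a brand-new worker, so that no
// previous cycle can have warmed up the engine instance running the parser.
function timeSingleParse(code) {
  return new Promise((resolve) => {
    const worker = new Worker('bench-worker.js');
    worker.onmessage = (event) => {
      worker.terminate();                 // discard the instance afterwards
      resolve(event.data);                // elapsed milliseconds
    };
    worker.postMessage(code);
  });
}
```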

Do you think this is a fair(er) benchmark (or do you think warm JIT is more realistic for most use cases)? (We could also have both, of course.) It might also be useful for the (non-comparison) benchmark suite.

You can see the (still rough) code here, and run the benchmark here.

(More discussion on Cherow's issue tracker)

ariya commented 7 years ago

Hi @twiss,

Thanks for looking into this. The Web Workers approach is definitely interesting, feel free to submit that as a pull request and I'd be happy to review it!

As for the differences in the results, they certainly require a more in-depth analysis. Based on my experience, I highly recommend separating factual observations (e.g. differences in elapsed execution time) from assumptions (e.g. that this is caused by a certain JIT compiler behavior). Taken out of context, mixing the two can result in a lot of misunderstanding later.

Also, the assumption about that JIT compiler behavior needs to be validated anyway. This can be done by looking at the graph and the generated code, which is particularly interesting if V8's TurboFan is involved. There could be other reasons as well, e.g. perhaps it is due to garbage collection?

Please be careful with your particular use of cold JIT vs warm JIT. IMO, it has nothing to do with running in a separate thread (via Web Workers) or not. In the compiler field, warm is usually associated with the JIT compiler getting enough work (e.g. due to intensive loops) to start switching to a different optimization mode. A single-threaded JIT compiler can get warm/hot enough after doing a particular piece of processing over and over again.
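To illustrate that warm-up effect, here is a minimal sketch (assuming some large source string `bigSource` is available); repeatedly timing the same call in a single thread typically shows later iterations getting faster:

```js
// Illustrative only: time the same parse repeatedly in one thread.
// Later samples tend to drop as the engine's optimizing tiers kick in;
// the exact crossover point is engine-specific.
function timeParse(code) {
  const start = performance.now();
  esprima.parse(code);
  return performance.now() - start;
}

const samples = [];
for (let i = 0; i < 100; i++) {
  samples.push(timeParse(bigSource));     // bigSource: any large JS string
}
console.log('first call:', samples[0].toFixed(2), 'ms');
console.log('median of last 50:',
  samples.slice(50).sort((a, b) => a - b)[25].toFixed(2), 'ms');
```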

Instead of distinguishing by cold and warm JIT, I suggest identifying the two different modes as the main browser thread vs its own isolated thread (due to Web Workers).

Looking forward to reviewing your PR. Thank you!

twiss commented 7 years ago

Hi @ariya, thanks for the response.

> Please be careful with your particular use of cold JIT vs warm JIT. IMO, it has nothing to do with running in a separate thread (via Web Workers) or not. In the compiler field, warm is usually associated with the JIT compiler getting enough work (e.g. due to intensive loops) to start switching to a different optimization mode. A single-threaded JIT compiler can get warm/hot enough after doing a particular piece of processing over and over again.

Yes, this is exactly what I mean by it as well. The single-threaded benchmark is the one I refer to as "warm", and the "cold" benchmark starts a new thread (Web Worker) for every call to parse(), in an attempt to get a new, and cold, JIT compiler. (It doesn't run the entire benchmark in one thread.)

> As for the differences in the results, they certainly require a more in-depth analysis. Based on my experience, I highly recommend separating factual observations (e.g. differences in elapsed execution time) from assumptions (e.g. that this is caused by a certain JIT compiler behavior). Taken out of context, mixing the two can result in a lot of misunderstanding later.

Good point. In fact, it's theoretically possible for a JIT compiler to optimize across threads, in which case the benchmark would be broken, since it would no longer test a "cold JIT".

On the other hand, if all the page says is "Run in a single thread" / "Run each parse in a new thread", it might be unclear to some people why you would want to do either. Do you feel that adding an explanation of what the benchmark is supposed to simulate (parsing a file for the 100th time vs parsing it for the 1st time) would be appropriate, or would you rather leave that question for someone else to answer?

ariya commented 7 years ago

> Yes, this is exactly what I mean by it as well. The single-threaded benchmark is the one I refer to as "warm", and the "cold" benchmark starts a new thread (Web Worker) for every call to parse(), in an attempt to get a new, and cold, JIT compiler. (It doesn't run the entire benchmark in one thread.)

(This is what I meant earlier by separating what is being done from the assumption of what happens.)

I have a reservation about denoting that as cold JIT. In reality, running a single function for the first time likely does not even cause a JIT compiler to kick in. This applies to modern JS engines using a tiered execution approach, from JavaScriptCore to V8 (see e.g. this article). Thus, it is very likely that what is being measured is the execution time of that function in the pure interpreter mode of the JS engine; no JIT compilation is involved.
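One way to check this on V8 directly is through its internal natives syntax, assuming the benchmark can be run under Node.js (the `%`-functions below require the --allow-natives-syntax flag and are unstable, V8-internal helpers, so treat this strictly as a diagnostic sketch):

```js
// Run with: node --allow-natives-syntax tier-check.js
// %GetOptimizationStatus returns a V8-internal bit field describing which
// execution tier a function is in; its exact layout varies by V8 version.
const esprima = require('esprima');

function parseOnce(code) {
  return esprima.parse(code);
}

parseOnce('var answer = 42;');            // first (and only) call
console.log(%GetOptimizationStatus(parseOnce).toString(2));
// After a single call, the status typically reports the interpreted tier:
// no optimizing JIT compilation has happened yet.
```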

> On the other hand, if all the page says is "Run in a single thread" / "Run each parse in a new thread", it might be unclear to some people why you would want to do either. Do you feel that adding an explanation of what the benchmark is supposed to simulate (parsing a file for the 100th time vs parsing it for the 1st time) would be appropriate, or would you rather leave that question for someone else to answer?

Here is a suggestion: just make it a check box that says something along the lines of

> Run each parse in a separate Web Worker

It's factual (it describes exactly what the code does) and it doesn't add any assumption or interpretation.

ariya commented 6 years ago

@twiss I'm closing this issue for now. Feel free to reopen it if you still want to submit a pull request.

In the meantime, I still observe benchmarks using the phrase "cold JIT". I highly recommend consulting anyone who is involved in JavaScript engine development and asking for their opinion. AFAIK there is still no JavaScript engine that runs its JIT compiler right away when executing a function for the first time, hence the confusion around the term "cold JIT" (the JIT compiler does not even kick in).

twiss commented 6 years ago

Right, I forgot about the interpreter stage.

> Run each parse in a separate Web Worker

I've sent a PR including your suggestion. However, I still feel that it will be lost on many people why you would want to do this. The description makes it sound like it may have something to do with Web Workers themselves. They might think they should turn the checkbox off because they're not using Web Workers. Or, when they observe that the measured times double with this box checked, they might assume something like "well, maybe Web Workers have some sort of overhead" and again think they should turn it off.

In reality, the Web Workers are just an implementation detail of "separate threads", which in turn is just an implementation detail of "without an initially hot JIT", which is an implementation detail of "parsing for the first time". There are other possible implementations, e.g. creating iframes instead of Web Workers. I'm not saying that we shouldn't mention Web Workers, just that we should also mention what they're simulating. Something like:

> When the checkbox above is checked, we start a new thread (Web Worker) for every benchmark cycle, to simulate parsing a single file for the first time.
>
> When the checkbox above is not checked, all benchmark cycles are run in the same thread. This allows the JIT compiler to heavily optimize the parsers, simulating parsing the files many times.
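For concreteness, a sketch of how the checkbox could drive the two modes (the element id and helper names are hypothetical, not the actual benchmark code):

```js
// Hypothetical wiring between the checkbox and the two benchmark modes.
async function runBenchmarkCycle(code) {
  const isolated = document.getElementById('worker-per-cycle').checked;
  if (isolated) {
    // Fresh Web Worker per cycle: simulates parsing a file for the first time.
    return timeSingleParse(code);         // see the worker sketch above
  }
  // Same thread for every cycle: lets the engine's optimizing tiers warm up,
  // simulating parsing the same files many times.
  const start = performance.now();
  esprima.parse(code);
  return performance.now() - start;
}
```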

ariya commented 6 years ago

@twiss I'm fine with the elaboration of the meaning of the checkbox. I think it goes a long way toward explaining it, and that's a good thing 👍

Minor nitpick: instead of the active "we start..." form, it's better to use the same passive style as the second paragraph ("all benchmark cycles are run..."). "We" (as a pronoun) is not effective in documentation; it creates a lot of ambiguity as to who is being referred to.

twiss commented 6 years ago

> Minor nitpick: instead of the active "we start..." form, it's better to use the same passive style as the second paragraph ("all benchmark cycles are run..."). "We" (as a pronoun) is not effective in documentation; it creates a lot of ambiguity as to who is being referred to.

Alright 👍 I've modified it and added the explanation.