Benchmark more things - Githubissues

smarter commented 7 years ago

Here are some suggestions of things we could add to our benchmarks:

[ ] Benchmark against scalac
[ ] Benchmark against a non-bootstrapped dotty
[x] Benchmark against an optimized dotty
[x] Benchmark more code, like all the projects in the https://github.com/lampepfl/dotty-community-build
[ ] Benchmark re-using the same compiler instance like in https://github.com/scala/compiler-benchmark/pull/39
[x] Compare the results we get from https://github.com/liufengyun/bench and https://github.com/scala/compiler-benchmark to get more confidence that we're benchmarking the correct thing.

liufengyun commented 7 years ago

Thanks for making the list, @smarter. I see that some items are actionable, but others need more thought about how to accommodate them without hurting the maintainability of the bench infrastructure.

The new bench project adopts a data-centered design, where the whole system is designed around the CSV file. This design lets us easily develop features like PR trackability and open PR testing, and it makes the web UI project-agnostic. That's why I didn't use the scala/compiler-benchmark project, which is not easy to customise to support the features we want.
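To make the CSV-centered idea concrete, here is a minimal sketch in Python. The column names (`benchmark`, `commit`, `pr`, `timestamp`, `time_ms`) and the sample rows are hypothetical and are not the actual schema of liufengyun/bench. The point is only that once every measurement is a row, a feature like PR trackability becomes a filter over the rows.

```python
import csv
import io
from dataclasses import dataclass
from typing import List

# Hypothetical column layout; the real bench CSV may differ.
FIELDS = ["benchmark", "commit", "pr", "timestamp", "time_ms"]


@dataclass(frozen=True)
class Row:
    benchmark: str
    commit: str
    pr: int
    timestamp: str
    time_ms: float


def read_rows(text: str) -> List[Row]:
    """Parse header-less CSV text into typed rows."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=FIELDS)
    return [
        Row(r["benchmark"], r["commit"], int(r["pr"]), r["timestamp"], float(r["time_ms"]))
        for r in reader
    ]


def rows_for_pr(rows: List[Row], pr: int) -> List[Row]:
    """PR trackability as a plain filter over the data."""
    return [r for r in rows if r.pr == pr]


# Made-up sample data, for illustration only.
SAMPLE = """\
dotty,abc123,3001,2017-09-01T10:00:00Z,52000.0
scalap,abc123,3001,2017-09-01T10:05:00Z,4100.5
dotty,def456,3002,2017-09-02T09:00:00Z,51000.0
"""

rows = read_rows(SAMPLE)
pr_rows = rows_for_pr(rows, 3001)
```

Because the UI only needs the rows, nothing in this sketch is specific to one project, which is roughly what "project-agnostic" means here.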

The experience with the previous benchmark infrastructure made me realise that maintainability is a major issue with compiler bench infrastructure. Thus, while I'm open to new features, I'd argue that any new feature should follow the same design philosophy and not sacrifice maintainability.

Some detailed feedback regarding the items:

Benchmark against scalac

What does this exactly mean? Which version to benchmark? Do we show a line for it? What to do with test cases that don't compile with scalac?

Benchmark against a non-bootstrapped dotty; Benchmark against an optimized dotty

These two require changes to the CSV file to allow multiple lines in a chart. A potential problem here is the misalignment of points. I need to investigate further to evaluate the scope of the changes required and their implications for maintainability.
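The misalignment problem can be sketched as follows, assuming (hypothetically) that each measurement carries a series name such as `bootstrapped` or `optimised` plus the commit it was taken at. If one series has no measurement for a commit, the chart needs an explicit gap there, otherwise its points shift against the other line.

```python
from typing import Dict, List, Optional, Tuple

# (series, commit, time_ms): a hypothetical shape for one measurement.
Measurement = Tuple[str, str, float]


def align_series(
    commits: List[str], data: List[Measurement]
) -> Dict[str, List[Optional[float]]]:
    """Return, per series, one value per commit in `commits`.

    A commit with no measurement for a series yields None, so every
    series has the same length and the same x positions.
    """
    by_key = {(series, commit): t for series, commit, t in data}
    series_names = sorted({series for series, _, _ in data})
    return {s: [by_key.get((s, c)) for c in commits] for s in series_names}


# Made-up data: the optimised run is missing for commit "c2".
commits = ["c1", "c2", "c3"]
data = [
    ("bootstrapped", "c1", 100.0),
    ("bootstrapped", "c2", 98.0),
    ("bootstrapped", "c3", 97.0),
    ("optimised", "c1", 90.0),
    ("optimised", "c3", 88.0),
]
aligned = align_series(commits, data)
```

Here `aligned["optimised"]` has a `None` in the middle position, which a charting layer could render as a gap instead of drawing the third point in the second slot.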

Benchmark all the projects in the community build

A concern here is that the community build breaks from time to time. If we add its projects to the benchmarks, we will have to disable the community build in bench from time to time. But if that's acceptable, then it's not a problem.

Benchmark re-using the same compiler instance

Could you please detail what this means for Dotty? Does it mean reuse of the context?

Compare the results from liufengyun/bench and scala/compiler-benchmark to get more confidence that we're benchmarking the correct thing

Maybe I misunderstood, but since we are testing Dotty while the other project is testing Scalac, and they run on different machines, it's not easy to draw a meaningful conclusion from the comparison.

smarter commented 7 years ago

What does this exactly mean? Which version to benchmark? Do we show a line for it? What to do with test cases that don't compile with scalac?

Good questions! :) I don't have any strong opinion here. I think it'd be interesting to just run Scala 2.12.3 once on whatever test cases we can get it to work on and display that. This way we have some idea of how much better or worse we are.

A concern here is that the community build breaks from time to time. If we add its projects to the benchmarks, we will have to disable the community build in bench from time to time. But if that's acceptable, then it's not a problem.

I think that's OK.

Benchmark re-using the same compiler instance: Could you please detail what this means for Dotty? Does it mean reuse of the context?

Yes, re-using the same root context like we do in the REPL and the IDE.
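As a rough illustration of the difference between the two measurement modes, here is a toy harness in Python. `ToyCompiler` is a hypothetical stand-in and not Dotty's API; in the real case the shared object would be whatever holds the root context. In reuse mode the setup happens once, outside the timed region; in fresh mode every timed run pays for setup again.

```python
import time
from typing import Any, Callable, List


def time_compiles(
    make_compiler: Callable[[], Any], sources: List[str], reuse: bool
) -> List[float]:
    """Return wall-clock seconds per source.

    reuse=True:  one compiler is created up front and shared by all runs.
    reuse=False: each run creates its own compiler inside the timed region.
    """
    timings = []
    shared = make_compiler() if reuse else None
    for src in sources:
        start = time.perf_counter()
        compiler = shared if reuse else make_compiler()
        compiler.compile(src)
        timings.append(time.perf_counter() - start)
    return timings


class ToyCompiler:
    """Hypothetical stand-in that only counts how often it is set up."""

    instances = 0

    def __init__(self) -> None:
        ToyCompiler.instances += 1

    def compile(self, source: str) -> int:
        return len(source)


warm = time_compiles(ToyCompiler, ["a", "b", "c"], reuse=True)
```

With a real compiler, comparing the two modes would show how much of each run is setup rather than compilation, which is the cost the REPL and IDE avoid.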

Maybe I misunderstood, but since we are testing Dotty while the other project is testing Scalac, and they run on different machines, it's not easy to draw a meaningful conclusion from the comparison.

The other project now also supports dotty: https://github.com/scala/compiler-benchmark/pull/31 . And apparently dotty does pretty badly there (Jason said "scalac compile times are is 0.65x that of 0.3.0-RC1"), so it's worth seeing if we can reproduce their results.
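To spell out what that ratio implies: if scalac's compile time is 0.65x dotty's, then dotty takes about 1/0.65 ≈ 1.54 times as long as scalac on that benchmark. A small sketch of the arithmetic in Python follows; the helper names and the sample timings are made up for illustration, and using the median is my assumption, not necessarily what either benchmark project does.

```python
from statistics import median
from typing import List


def time_ratio(baseline_ms: List[float], candidate_ms: List[float]) -> float:
    """Median baseline time divided by median candidate time."""
    return median(baseline_ms) / median(candidate_ms)


def slowdown(ratio: float) -> float:
    """Invert a 'baseline is ratio x candidate' figure.

    A ratio of 0.65 means the candidate takes 1 / 0.65 times as long.
    """
    return 1.0 / ratio


# Made-up timings chosen to reproduce the quoted 0.65x figure.
scalac_ms = [640.0, 650.0, 660.0]
dotty_ms = [990.0, 1000.0, 1010.0]

ratio = time_ratio(scalac_ms, dotty_ms)
factor = slowdown(ratio)
```

Reproducing the result would then mean checking whether the same ratio comes out of our own measurements on the same sources.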

smarter commented 6 years ago

I think it'd be interesting to benchmark compiling re2s too. It's the project used for benchmarking rsc: https://github.com/twitter/reasonable-scala/tree/performance . I have a branch of re2s that compiles with Dotty at https://github.com/smarter/reasonable-scala/commits/dotty-re2s (the code is in https://github.com/smarter/reasonable-scala/tree/dotty-re2s/examples/re2s).

liufengyun commented 6 years ago

Now we have benchmarks for optimised dotty. In time mode, the related lines (optimised, bootstrapped) are shown in the same chart.

There's a switch in the sidebar of the UI to toggle between commit mode and time mode.

http://dotty-bench.epfl.ch/

lampepfl / dotty-feature-requests

Benchmark more things #44