
NuTo - yet another finite element library
https://nuto.readthedocs.io
Boost Software License 1.0

Benchmark lib #42

Closed TTitscher closed 6 years ago

TTitscher commented 7 years ago

We currently use a handwritten benchmark class. I used it as a training exercise to implement self-registering methods that do not require a main() in the benchmark *.cpp file. Have a look if you're interested.
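A rough sketch of the self-registration idea (illustrative only, not the actual NuTo code, all names made up) looks like this:

```cpp
// Illustrative sketch of self-registering benchmarks (not the actual NuTo code).
#include <functional>
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Global registry; a static Registrar object in each benchmark *.cpp adds an entry here.
std::vector<std::pair<std::string, std::function<void()>>>& Registry()
{
    static std::vector<std::pair<std::string, std::function<void()>>> r;
    return r;
}

struct Registrar
{
    Registrar(std::string name, std::function<void()> f)
    {
        Registry().emplace_back(std::move(name), std::move(f));
    }
};

// The macro defines the benchmark body and registers it as a side effect,
// so the benchmark *.cpp does not need its own main().
#define MY_BENCHMARK(name)                            \
    static void name();                               \
    static Registrar registrar_##name{#name, name};   \
    static void name()

MY_BENCHMARK(VectorSum)
{
    double sum = 0.;
    for (int i = 0; i < 1000000; ++i)
        sum += i;
    std::cout << "VectorSum: " << sum << "\n";
}

// In practice the runner below lives in its own translation unit / library.
int main()
{
    for (const auto& entry : Registry())
    {
        std::cout << "Running " << entry.first << "\n";
        entry.second(); // a real runner would time this and repeat it many times
    }
}
```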

But it has obvious limitations:

1) Handwritten. By an amateur. Pure intuition. No theoretical benchmark background.
2) Kind of unstable results; high, inconsistent standard deviation.
3) Suffers (maybe, as I said, I'm not an expert) from undesired compiler optimizations.
4) Missing features, most importantly (IMO) the ability to run only a subset of benchmarks from the CLI without recompiling.

Here is a nice overview of popular libs: http://www.bfilipek.com/2016/01/micro-benchmarking-libraries-for-c.html

Any opinions? Google Benchmark maybe?

Psirus commented 7 years ago

I'd vote for Google Benchmark, simply because it seems the most widely used. The output of the library should be compatible with the Jenkins Performance Plugin, so that we can track performance over time. A proof of concept can be found here. It seems all of the libs mentioned in the article provide some sort of output, but I'm not sure which ones are compatible. Sadly, none of them seem to be in the Debian repos.

TTitscher commented 7 years ago

Since the number of benchmarks is increasing, we should really find a way to include them in our CI. Do we need another server for that? Or just run them on sv2212 and hope that one core is idle?

TTitscher commented 7 years ago

Alternative lib: https://github.com/ivafanas/sltbench/blob/master/README.md

Psirus commented 6 years ago

I've just noticed that the Boost test library also produces JUnit output (compatible with Jenkins). Just run your test with myUnitTest -f JUNIT. So if we just want to run regression testing on, say, all integration tests, we wouldn't need another library. Of course this is not suitable for microbenchmarking, standard deviations etc., but it is just fine for automated regression testing.

vhirtham commented 6 years ago

Just made a new branch (googleBenchmark) that pulls in Google Benchmark via a git submodule. I also wrote some setup scripts, because submodules are not cloned out of the box. The library is compiled together with the NuTo libs.

I added a benchmark that uses this lib as an example (took one from the documentation). All you need to do is link "benchmark" to your target and include the header. Now you can use it.
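For reference, a benchmark file then looks roughly like this (a sketch in the spirit of the library's documentation; the summed vector is just a placeholder workload, not NuTo code):

```cpp
#include <benchmark/benchmark.h>

#include <vector>

// Placeholder workload; any NuTo routine could be benchmarked the same way.
static void BM_VectorSum(benchmark::State& state)
{
    std::vector<double> v(state.range(0), 1.0);
    while (state.KeepRunning())
    {
        double sum = 0.;
        for (double x : v)
            sum += x;
        benchmark::DoNotOptimize(sum); // keep the compiler from discarding the loop
    }
}
// Register the benchmark for a few input sizes.
BENCHMARK(BM_VectorSum)->Range(1 << 10, 1 << 16);

BENCHMARK_MAIN();
```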

However, so far there is one minor ugliness that needs to be fixed. At the moment the shell script is called during each CMake run, which updates the library to the latest commit and afterwards goes back to version 1.3.0. Hence the files change and are recompiled after each CMake run. Should not be too hard to fix, just want to mention it.

EDIT: Well, the Travis build fails due to some library-specific warnings (they didn't use override). Need to turn this warning off when building the library.

TTitscher commented 6 years ago

Maybe this whole micro benchmarking is a bit over the top. Especially the dependency on a new library is a bit annoying. I opened this issue, I know, but apparently I changed my opinion. We currently use benchmarks mainly to check whether some algorithm is waaay too slow. So, from the list in the first post, I propose adding a CLI option to our Benchmark.h to select the benchmarks that are actually run and be done with it.

@vhirtham Nice that you got that running. And I know you have some side projects that will use this code. So - whether we decide to keep it in NuTo or not - your efforts are not wasted :)

@Psirus It depends on what conclusions we draw. An integration test may exercise various setup methods that slow down but are still insignificant compared to a full simulation. Red flags for those may be misleading. In dedicated benchmarks, we can measure a real setup.

[PS: I adore the google benchmark feature of finding out the complexity O(...) of a piece of code. But this looks like a neat weekend project...]
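[For the record, that feature already exists in the library. A hedged sketch of how it is used (std::sort is just a stand-in workload):

```cpp
#include <benchmark/benchmark.h>

#include <algorithm>
#include <random>
#include <vector>

// Stand-in workload whose asymptotic complexity the library estimates for us.
static void BM_Sort(benchmark::State& state)
{
    std::mt19937 rng(42);
    std::vector<int> data(state.range(0));
    while (state.KeepRunning())
    {
        state.PauseTiming(); // do not measure the setup
        for (int& x : data)
            x = static_cast<int>(rng());
        state.ResumeTiming();
        std::sort(data.begin(), data.end());
    }
    state.SetComplexityN(state.range(0));
}
// Fit the measured times against O(N log N); benchmark::oAuto lets the library guess.
BENCHMARK(BM_Sort)->RangeMultiplier(4)->Range(1 << 10, 1 << 18)->Complexity(benchmark::oNLogN);

BENCHMARK_MAIN();
```
]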

Psirus commented 6 years ago

Thanks for getting started on this, even if @TTitscher suddenly doesn't like it anymore. A couple of notes:

You had the right instinct in always checking out a tagged commit, in order to avoid this external dependency being a moving target. Fortunately, the git developers agree, and so this is what happens by default. If you look at your commit, it says Submodule benchmark added at 336bb8. Everyone who clones nuto will always get the benchmark at this commit, which is the commit of v1.3.0, so you're good. No need for this script :)

Let's just add a line to the README that says something like

If you're making a new clone of NuTo, use git clone --recursive.

Those who want to keep their old tree can use git submodule init && git submodule update.

If you say add_subdirectory(external/benchmark EXCLUDE_FROM_ALL), the benchmark lib will no longer be included in the default target.

As for the error, take a look at python/nuto/CMakeLists.txt, where we set different flags for that subdirectory. This could be adapted to deactivate the warning for the lib.

Personally, I don't care whether we use google or our own.

vhirtham commented 6 years ago

Okay, I removed the shell script, updated the README and hopefully fixed the warnings issue.

In general it won't hurt my feelings if we decide against the lib. Okay... maybe a bit :stuck_out_tongue:

BUT I think there are some things to consider:

First, the library dependency. Why is that bad in this particular case? From the outside, the submodule approach is practically the same as copying a header-only lib into our project. A user does not need to install anything extra. The lib comes together with NuTo, and problems caused by updates in the lib will only occur if we change the version number ourselves.

Then there is the question: why would we build our own benchmarking stuff (except for the fun of doing it)? Do we need a feature that is not in the lib? Can we do it better? At least the latter question can be answered with: not in an acceptable amount of time. Good benchmarking is a little bit more than just setting two timers and taking the difference. Read the comments in the article; there are a lot of things you probably did not think of.
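To give one concrete example of such a pitfall (a minimal std::chrono sketch, assuming that is roughly what a hand-rolled approach boils down to): the compiler is free to remove work whose result is never used, so the two timers can end up measuring nothing.

```cpp
#include <chrono>
#include <iostream>

int main()
{
    using Clock = std::chrono::steady_clock;

    const auto start = Clock::now();
    double sum = 0.;
    for (int i = 0; i < 100000000; ++i)
        sum += i; // 'sum' is never used later, so the optimizer may drop the whole loop
    const auto stop = Clock::now();

    // With optimizations on, this can print ~0 us, which says nothing about the loop's cost.
    std::cout << std::chrono::duration_cast<std::chrono::microseconds>(stop - start).count()
              << " us\n";
    // A benchmark library counters this with facilities like benchmark::DoNotOptimize()
    // and benchmark::ClobberMemory(), and repeats the measurement until the statistics are stable.
}
```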

Here are some features that I think should be mentioned (the ones I know of so far; there are probably more worth noting):

To sum it up: I think we have to get used to using benchmarks more frequently. We surely don't need a benchmark for every law, but critical pieces of code that are used often and tend to be touched by someone should get a benchmark. I don't know how we can create an automated result history on Travis, but there probably is a way. Because the integration of this lib turned out to be rather easy, I would advise against coming up with a home-brew version and suggest we use Google Benchmark instead.

TTitscher commented 6 years ago

I am totally aware that Google Benchmark is a great lib. I was just not sure how to include it in NuTo. Since it is now only one merge away, I see no reason not to use it. I just thought that the "critical pieces" of our code are in external libs (Eigen, Mumps, Pardiso). Everything else could literally be measured with a stopwatch. (If you have a look at their docs, they benchmark things like memcpy and string1 < string2 - and we don't want to go to that level.)

So again, create a PR and we can finish this issue.

How can we actually run these benchmarks consistently? Jenkins on an internal server? And what do we want to benchmark in the future?

vhirtham commented 6 years ago

I just configured the options to our needs, so the tests and install targets of Google Benchmark are not built anymore.

The last step before I issue a pull request would be to create a macro to generate benchmarks. I would take the unit test macro as a starting point and adjust it. Maybe we can give benchmarks their own option and run them separately on Travis. I'll check whether it is possible to upload the results like we do with codecov.

What to benchmark? I think we should first find out what we can actually do with the benchmark lib. I'll prepare a presentation on that in addition to the CMake presentation.

Psirus commented 6 years ago

Go ahead and open a PR now; code is easier and more appropriately discussed there.

I don't think we will be able to run the benchmarks on Travis. I mean, we can run them, but the results will be meaningless, since the runtime already fluctuates highly, probably depending on how much load there is on their servers, which machine you get assigned, and so on. Also, the time a Travis build takes is already quite high; I'd rather see it go down than up :) So automated benchmarking should happen on our own infrastructure. Maybe @joergfunger can tell you more?

joergfunger commented 6 years ago

We have applied for two VMs, one for testing and one for computations. The first one is currently not installed, but if we can specify exactly what we want, this could be set up internally. The only thing I'm not fully sure about is whether we will be able to automatically trigger a benchmark run on every commit via a webhook (though a regular cron job would certainly be sufficient), and, probably more important, whether we are somehow able to feed the benchmark results back to GitHub (due to the firewall settings).