opencog / benchmark

Benchmarking the AtomSpace, the pattern matcher and other OpenCog systems

Use standard C++ benchmarking library for atomspace microbenchmarks #6

Open vsbogd opened 6 years ago

vsbogd commented 6 years ago

To follow up on the @linas comment here, and to not reinvent the wheel, I would propose using an existing C++ benchmarking library.

Requirements for such a library:

Google Benchmark (https://github.com/google/benchmark) seems to be a good candidate. Unfortunately, there are no ready-to-install packages for benchmarking libraries, so it will be an additional manual step in the build procedure.
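
For a rough idea of what an AtomSpace microbenchmark looks like under Google Benchmark (a sketch only; the AtomSpace calls and header paths here are my assumptions, not existing benchmark code):

```cpp
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Measure ConceptNode creation; the library picks the iteration count
// by itself and reports the mean time per operation.
static void BM_AddNode(benchmark::State& state)
{
    AtomSpace as;
    size_t i = 0;
    for (auto _ : state) {
        Handle h = as.add_node(CONCEPT_NODE, "node-" + std::to_string(i++));
        benchmark::DoNotOptimize(h);
    }
}
BENCHMARK(BM_AddNode);

BENCHMARK_MAIN();
```

Once the library is installed (manually, for now), building against it is just a find_package(benchmark) plus target_link_libraries(... benchmark::benchmark) in CMake.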

Other well-known libraries:

Some review and comparison can be found here (full articles are here and here).

linas commented 6 years ago

Note that the current benchmark measures C++, python and scheme performance. The C++ side is straightforward; the python and scheme sides, not so much. For scheme, there are three distinct bottlenecks:

1) How fast can you move from C++ to guile, do nothing (a no-op), and return to C++? (See the sketch after this list.) Last I measured, this was about 15K/sec or 20K/sec for guile, and about 20K/sec to 25K/sec for cython/python.

2) Once inside guile, how fast can you do something, e.g. create atoms in a loop, that loop being written in scheme (or python). Last I measured, this was reasonably fast, no complaints.

3) How does 2) work when the scheme code is interpreted, memoized, or compiled. All three have different performance profiles. When the guile interpreter runs, nothing is computed in advance; it is interpreted "on the fly". When memoization is turned on, guile caches certain intermediate results, for faster re-use. When compiling is turned on, the scheme code is compiled into a byte-code, and then that byte-code is executed.
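
For concreteness, 1) and 2) amount to something like the following (sketched here with Google Benchmark, since that is what is being proposed; the SchemeEval constructor and the module setup are assumptions, not the existing harness):

```cpp
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>
#include <opencog/guile/SchemeEval.h>

using namespace opencog;

// 1) C++ -> guile -> C++ round trip on a trivial expression.
static void BM_GuileRoundTrip(benchmark::State& state)
{
    AtomSpace as;
    SchemeEval eval(&as);
    eval.eval("(use-modules (opencog))");
    for (auto _ : state) {
        std::string r = eval.eval("#t");   // effectively a no-op
        benchmark::DoNotOptimize(r);
    }
}
BENCHMARK(BM_GuileRoundTrip);

// 2) Cross into guile once, then create atoms in a loop written in scheme.
static void BM_GuileAtomLoop(benchmark::State& state)
{
    AtomSpace as;
    SchemeEval eval(&as);
    eval.eval("(use-modules (opencog))");
    for (auto _ : state) {
        eval.eval(
            "(let loop ((i 0))"
            "  (when (< i 1000)"
            "    (ConceptNode (number->string i))"
            "    (loop (+ i 1))))");
    }
    state.SetItemsProcessed(state.iterations() * 1000);
}
BENCHMARK(BM_GuileAtomLoop);

BENCHMARK_MAIN();
```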

Historical experience is that compiling often loses: the amount of time that it takes to compile (ConceptNode "foo") into bytecode far exceeds the savings of a few cycles, compared to the interpreted version (because, hey, both the compiled and the interpreted paths immediately call into C++ code, which is where 98% of the CPU time goes).

Thus, reporting C++ performance is not bad, but the proper use and measurement of the C++/guile and the C++/python interfaces is ... tricky.

linas commented 6 years ago

Note also: it is not entirely obvious that ripping out the existing benchmark code and replacing it with something else results in a win. I mean, starting and stopping a timer, and printing the result, is just ... not that hard.

The biggest problem is that the existing benchmark code is just ... messy. There's a bunch of crap done to set up the atomspace and populate it with atoms. What, exactly, is a "reasonable" or "realistic" set of atoms to stick in there? How does performance vary as a function of atomspace size?
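
Whatever tool ends up being used, the size question means parametrizing the benchmark over the atomspace size and re-running the same measurement at each size. As a sketch (Google Benchmark syntax; the AtomSpace calls are assumptions):

```cpp
#include <string>

#include <benchmark/benchmark.h>
#include <opencog/atomspace/AtomSpace.h>

using namespace opencog;

// Time add_node() against an atomspace pre-populated with state.range(0)
// atoms, so the report shows how the cost scales with atomspace size.
static void BM_AddNodeAtSize(benchmark::State& state)
{
    AtomSpace as;
    for (int64_t i = 0; i < state.range(0); i++)
        as.add_node(CONCEPT_NODE, "seed-" + std::to_string(i));

    size_t n = 0;
    for (auto _ : state) {
        Handle h = as.add_node(CONCEPT_NODE, "probe-" + std::to_string(n++));
        benchmark::DoNotOptimize(h);
    }
}
// Run at 1K, 10K, 100K and 1M pre-existing atoms.
BENCHMARK(BM_AddNodeAtSize)->RangeMultiplier(10)->Range(1000, 1000000);

BENCHMARK_MAIN();
```

Of course, that does not answer the "what is a realistic set of atoms" question; it only makes the size axis explicit.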

Other parts of the messiness have to do with the difficulty of measuring the C++/guile interfaces. It's not at all clear to me that just using a different microbenchmarking tool will solve any of these problems...

That said, I don't really care if or how the benchmarks are redesigned, as long as they work and are accurate (and we get a chance to do before+after measurements, to verify that any new results are in agreement with the old results).

linas commented 6 years ago

Note also: the current benchmark fails to control dynamic range. For example, we can call getArity() more than a million times a second, but we can perform a pattern search only about 100 times a second. The current benchmark wants to time how long it takes to do both, N times. Clearly, just one value of N cannot be used to measure both. This is one of the messy issues.
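
One way to handle that without a fixed N is to loop against a wall-clock budget, so each operation ends up with its own effective iteration count (this is roughly what the standard libraries do internally, too). A minimal sketch, with a hypothetical calls_per_sec() helper standing in for the existing timing code:

```cpp
#include <chrono>
#include <cstdio>
#include <functional>

// Run `op` for roughly one second of wall-clock time, checking the clock
// only once per batch so the timer itself does not dominate very cheap
// operations. Returns the measured calls per second.
static double calls_per_sec(const std::function<void()>& op,
                            size_t batch = 1000)
{
    using clock = std::chrono::steady_clock;
    const auto budget = std::chrono::seconds(1);
    const auto start = clock::now();
    size_t n = 0;
    while (clock::now() - start < budget) {
        for (size_t i = 0; i < batch; i++) op();
        n += batch;
    }
    const std::chrono::duration<double> elapsed = clock::now() - start;
    return n / elapsed.count();
}

int main()
{
    // Stand-in operation; in the real benchmark this would be a
    // getArity() call, a pattern search, etc.
    volatile int sink = 0;
    double rate = calls_per_sec([&] { sink = sink + 1; });
    std::printf("%.0f calls/sec\n", rate);
    return 0;
}
```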

vsbogd commented 6 years ago

I tried to check the python benchmarking, but it seems to be broken; raised a separate issue for it: https://github.com/opencog/benchmark/issues/9