root-project / rootbench

Collection of benchmarks and performance monitoring applications
GNU Lesser General Public License v2.1

Updated Statistical Tests with RooFit #226

Open vincecr0ft opened 3 years ago

vincecr0ft commented 3 years ago

It seems the RooFit test scripts currently included in rootbench don't quite provide the depth and complexity required for benchmarking the performance of experimental updates (such as the multiprocess PR and the upcoming CUDA implementations). Is it possible to keep a separate continuous benchmarking suite with specific tests for high-performance development work?

Currently the 'binned' example creates between one and three HistFactory channels with between one and four bins each. I believe this was tuned to get the running time of the whole test down to under 20 minutes. Is there an option to 'stress test' by returning this to its original 20 x 40 dimensions?
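If the channel and bin counts were exposed as Google Benchmark arguments, the same benchmark body could cover both the current small grid and the original stress point. A minimal sketch, assuming the usual Google Benchmark registration style; `BM_HistFactoryFit` and its argument grid are illustrative stand-ins, not the actual rootbench code:

```cpp
#include "benchmark/benchmark.h"

// Hypothetical stand-in for the real rootbench HistFactory benchmark body.
static void BM_HistFactoryFit(benchmark::State &state) {
   const auto nChannels = state.range(0);
   const auto nBins = state.range(1);
   for (auto _ : state) {
      // ... build and fit a HistFactory model with nChannels channels
      // of nBins bins each ...
      benchmark::DoNotOptimize(nChannels * nBins);
   }
}

// Current small configuration: 1-3 channels with 1-4 bins each.
BENCHMARK(BM_HistFactoryFit)->ArgsProduct({{1, 2, 3}, {1, 2, 3, 4}});
// Single opt-in stress point restoring the original 20 x 40 dimensions.
BENCHMARK(BM_HistFactoryFit)->Args({20, 40})->Unit(benchmark::kSecond);

BENCHMARK_MAIN();
```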

The unbinned example creates a simple B-decay fit indicative of LHCb-style analyses; however, this seems fairly trivial in comparison. Ideas include more complex functions, convolutions, multidimensional fits, ranges, or even a Dalitz fit (which, I think, includes all of the previous suggestions).
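For the convolution idea, a heavier unbinned benchmark could look roughly like the following. This is a sketch, not existing rootbench code; it uses the standard RooFit decay and resolution classes, and the parameter ranges and event count are illustrative:

```cpp
#include "RooDataSet.h"
#include "RooDecay.h"
#include "RooGaussModel.h"
#include "RooRealVar.h"

#include <memory>

// Unbinned fit of an exponential decay convolved with a Gaussian resolution
// model -- roughly the structure of an LHCb-style lifetime fit.
void runConvolutionFit(int nEvents = 100000)
{
   RooRealVar t("t", "decay time", -1, 10);
   RooRealVar tau("tau", "lifetime", 1.5, 0.1, 5.0);
   RooRealVar bias("bias", "resolution bias", 0.0);
   RooRealVar sigma("sigma", "resolution width", 0.1, 0.01, 1.0);

   RooGaussModel resolution("resolution", "Gaussian resolution", t, bias, sigma);
   RooDecay model("model", "decay (x) resolution", t, tau, resolution, RooDecay::SingleSided);

   std::unique_ptr<RooDataSet> data{model.generate(t, nEvents)};
   model.fitTo(*data, RooFit::PrintLevel(-1));
}
```

A Dalitz-style benchmark would add multiple observables and interference terms on top of this skeleton.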

A mixed case, combining a binned and an unbinned fit, might also be desirable.

@lmoneta @oshadura @hageboeck @guitargeek

eguiraud commented 3 years ago

Tangentially related, and I am not familiar with RooFit's benchmarks, but it must surely be possible to benchmark realistic workloads and still keep the benchmarks short, e.g. by fixing the number of iterations? 20 minutes sounds... excessive.

vincecr0ft commented 3 years ago

This message is currently left open on my terminal:

```
All fits done in 1381.61 min (cpu), 1382.38 min (real)
```

Excessive, yes, but we are talking about improvements that bring the running time down from days to minutes.

Also, we've already set the number of iterations to 1...

eguiraud commented 3 years ago

> Also, we've already set the number of iterations to 1...

I guess you are referring to the googlebench iterations; I meant reducing the number of iterations of the fitting algorithm.

I would expect that an O(days) -> O(minutes) improvement can be scaled down to O(minutes) -> O(seconds) without losing the ability to see the gains brought by new developments. But again, this is just a thought in case someone looks into these benchmarks soon; if there is some inherent reason why a benchmark that takes less than 10 minutes makes no sense, then so be it.
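For reference, capping the minimizer rather than shrinking the model is one way to fix the amount of work per repetition. A rough sketch using the public RooMinimizer interface; the caps are arbitrary values chosen only for illustration, and a fit truncated like this will not converge -- the point is to compare the cost of a fixed number of likelihood evaluations across implementations:

```cpp
#include "RooAbsReal.h"
#include "RooMinimizer.h"

// Run a fixed, truncated amount of minimisation on a pre-built negative
// log-likelihood, so one benchmark repetition stays short even for a
// realistic model.
void runCappedMinimisation(RooAbsReal &nll)
{
   RooMinimizer minimizer(nll);
   minimizer.setPrintLevel(-1);
   minimizer.setMaxIterations(20);     // illustrative cap
   minimizer.setMaxFunctionCalls(500); // illustrative cap
   minimizer.minimize("Minuit2", "migrad");
}
```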

vincecr0ft commented 3 years ago

I think this discussion distils my original question. We have short-timescale tests. We'd like to run long-timescale tests, but we don't want them to disrupt the continuous testing. Is this the right place to add such tests? If not here, then where? And if not here, do we want to mirror the additional tests mentioned above at short timescales so that they can be added here?

lmoneta commented 3 years ago

Hi, I think the best solution is to add a parameter to the test command line. In this way the continuous testing uses the short version, and we can use the long version to make realistic benchmarks to study and report on the different cases (e.g. scalar vs. vectorisation vs. GPU).
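One possible shape for this, sketched with a hypothetical ROOTBENCH_LONG environment switch (a command-line flag or CMake option would work equally well); the benchmark body and configurations are illustrative:

```cpp
#include "benchmark/benchmark.h"

#include <cstdlib>

// Hypothetical stand-in for a real rootbench RooFit benchmark body.
static void BM_RooFitFit(benchmark::State &state) {
   const auto nChannels = state.range(0);
   const auto nBins = state.range(1);
   for (auto _ : state) {
      // ... build and fit the model with nChannels x nBins ...
      benchmark::DoNotOptimize(nChannels * nBins);
   }
}

int main(int argc, char **argv)
{
   // Short configuration: always registered, used by continuous testing.
   benchmark::RegisterBenchmark("BM_RooFitFit/short", BM_RooFitFit)->Args({3, 4});

   // Long configuration: opt-in only, for studying e.g. scalar vs
   // vectorisation vs GPU. ROOTBENCH_LONG is a made-up switch.
   const char *longRun = std::getenv("ROOTBENCH_LONG");
   if (longRun && longRun[0] == '1') {
      benchmark::RegisterBenchmark("BM_RooFitFit/stress", BM_RooFitFit)
         ->Args({20, 40})
         ->Unit(benchmark::kSecond);
   }

   benchmark::Initialize(&argc, argv);
   benchmark::RunSpecifiedBenchmarks();
   return 0;
}
```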