openml / automlbenchmark

OpenML AutoML Benchmarking Framework
https://openml.github.io/automlbenchmark
MIT License

Optionally limit inference time measurements by dataset size #538

Closed PGijsbers closed 1 year ago

PGijsbers commented 1 year ago

Measuring inference time on very wide datasets is problematic if the batch size is sufficiently large. This stems from two issues:

This PR adds an option to measure inference time only on batches that do not exceed the size of the initial dataset. With this option enabled, the first issue should be addressed in almost all cases, since the batch size can never exceed the dataset size. The second issue may remain, but becomes less likely because the maximum batch size is now proportional to the training data on which the AutoML framework already evaluated its models.

This seems like a fair compromise: it is reasonable to assume that inference batches in practice will not be (significantly) larger than the training dataset.
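A minimal sketch of the filtering described above, assuming a list of candidate batch sizes and a flag for the new option (the function and parameter names here are hypothetical, not the framework's actual API):

```python
# Hypothetical sketch: only measure inference time for batches that do
# not exceed the training dataset size, when the option is enabled.
def batch_sizes_to_measure(candidate_sizes, n_train_rows, limit_by_dataset_size=True):
    """Return the batch sizes whose inference time should be measured."""
    if not limit_by_dataset_size:
        return list(candidate_sizes)
    # Drop any batch size larger than the dataset itself.
    return [size for size in candidate_sizes if size <= n_train_rows]

# With a 500-row training set, the 1000- and 10000-row batches are skipped.
print(batch_sizes_to_measure([1, 100, 1000, 10000], n_train_rows=500))
# → [1, 100]
```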

It also lowers the default number of repeats from 100 to 10. From my own (admittedly small-scale) experiments, the variance and cold-start effects aren't big enough to require more than 10 measurements to filter out.
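The repeated-measurement idea can be sketched as follows; this is an illustration of why a small number of repeats with a robust summary (here the median) suffices to damp cold-start outliers, not the framework's actual timing code:

```python
import statistics
import time

# Hypothetical sketch: time an inference call a few times and summarize
# with the median, which is insensitive to a slow cold-start first call.
def measure_inference_seconds(predict, batch, repeats=10):
    """Return the median wall-clock time of `predict(batch)` over `repeats` runs."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        predict(batch)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Taking the median rather than the mean means a single slow first measurement (the cold start) cannot skew the result, so 10 repeats is typically enough.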