Closed PGijsbers closed 1 year ago
Moreover, I was wondering whether you could also retrieve the times for 10 and 100 samples? Or is this something that doesn't appear in real life (maybe @Innixma knows)?
@mfeurer It does occur, but it is generally an easier scenario than batch 1 and large-batch.
10 and 100 samples will often have near-identical total latency to 1 sample. It is only once you go beyond 100 (and often 1000) that you start to avoid fixed-cost overheads dominating the runtime.
Given that they are passing 100 samples, they probably didn't receive those 100 samples all at the same time and instead waited to group them together. The fact that they waited to group them rather than sending them in one at a time indicates that the scenario isn't very time-sensitive. This isn't always the case (for example, maybe they have to batch or else it's too slow), but it describes a lot of the production deployments that would send small batches of data.
I'd be ok both with and without those 10 & 100 measurements. I think 1 and 10000+ are the most important as they are the most challenging. I'd lean towards including them for the sake of completeness.
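As a minimal sketch of the kind of measurement being discussed (all names here are hypothetical, not the benchmark's actual API): time a predict function over batches of each candidate size and take the median. For a cheap model, the small batch sizes tend to show similar total latency because per-call overhead dominates.

```python
import time
import random

def measure_inference_times(predict, data, batch_sizes=(1, 10, 100, 1000, 10000), repeats=5):
    """Time `predict` on random batches of each size; return median seconds per call."""
    results = {}
    for size in batch_sizes:
        timings = []
        for _ in range(repeats):
            batch = random.choices(data, k=size)  # sample a batch with replacement
            start = time.perf_counter()
            predict(batch)
            timings.append(time.perf_counter() - start)
        timings.sort()
        results[size] = timings[len(timings) // 2]  # median of the repeats
    return results

# Toy "model": for very small batches the fixed per-call overhead dominates,
# which is why batch sizes 1, 10, and 100 often show near-identical latency.
data = [[random.random() for _ in range(4)] for _ in range(20000)]
times = measure_inference_times(lambda rows: [sum(r) for r in rows], data)
```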
status update (and note to self):
having issues getting the arff split files to work with the H2O integration. It raises a "no columns in common" error even though manual checking suggests the files are fine. The problem seems related to single-row inference measurements (2+ rows work fine).
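The root cause in the H2O integration isn't established above, but a common way single-row inputs break column matching is by losing a level of nesting. A generic illustration (hypothetical names, not the actual integration code):

```python
# Hypothetical sketch of why a single row can break column matching:
# plain indexing drops a level of nesting, so per-column logic no longer
# sees any columns "in common" with the expected header.
header = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
table = [[5.1, 3.5, 1.4, 0.2],
         [4.9, 3.0, 1.4, 0.2]]

def columns(rows):
    """Transpose a list of rows into {column_name: values}."""
    return {name: [row[i] for row in rows] for i, name in enumerate(header)}

ok = columns(table[0:1])   # slicing keeps the nesting: 1 row, 4 columns
flat = table[0]            # plain indexing yields a flat list of values
# columns(flat) would fail: each "row" is now a bare float, not a row.
```

Checking for this kind of shape collapse when exactly one inference row is written out may be a useful debugging direction.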
Once the last two points are resolved, I plan to merge (this and the openml PR) and start running validation tests.
help wanted: I am posting this now since it would be nice to have some feedback. I need to merge this very soon in order to run experiments on time.
This PR introduces improved measurements for inference time. There are two major changes:
A few notes:
`run_in_venv`). It might be nicer to generate and delete them right when doing the inference measurements in the subprocess, but in that case we need to be able to generate the splits without the `OpenMLDataset` object. Future work.

Current implementation's result file after running `python runbenchmark.py FRAMEWORK -t iris -f 0` for `constantpredictor`, `tpot`, and `autogluon`:

Notes: