NicolasHug closed this issue 3 years ago.
ping @janvanrijn @mfeurer ;)
Should we additionally add a wallclock_time_millis_training, which can always be computed?
The reason it is wrong for n_jobs != 1 is that internally it uses process_time, which will not count any of the subprocess time, and it's not using wall-clock time.
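To illustrate the point, here is a minimal sketch (not openml-python's actual measurement code): process_time only counts CPU time spent in the current process, so work delegated to joblib worker processes is essentially invisible, while wall-clock time captures it.

```python
# Minimal sketch of the measurement problem (illustration only):
# time.process_time() counts CPU time of the *current* process, so work done
# in joblib worker processes is not reflected, unlike wall-clock time.
import time
from joblib import Parallel, delayed


def busy(n=5_000_000):
    total = 0
    for i in range(n):
        total += i
    return total


wall_start, cpu_start = time.time(), time.process_time()
Parallel(n_jobs=2)(delayed(busy)() for _ in range(4))  # work runs in child processes
wall = time.time() - wall_start
cpu = time.process_time() - cpu_start

print(f"wall-clock: {wall:.2f}s, process_time: {cpu:.2f}s")
# process_time stays close to zero because the parent process mostly waits.
```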
I'll come back to you after the ICML deadline.
Thanks for raising this issue; it seems that there are indeed one or two problems here.
I believe the reason why the wallclock time is not reported if the number of cores is -1 is that we can't figure out how many cores it was executed on, and the number then only makes limited sense. Currently, this is a very restrictive assumption that can be circumvented in plenty of ways (as you showed). Do you have any suggestions on how to improve on this?
> Should we additionally add a wallclock_time_millis_training, which can always be computed?

That exists and is computed if n_jobs != -1.
In order to get the times of each base run you can check the optimization trace, which should have the time for each model fit. However, we currently don't seem to store the refit time correctly (or at all?), which to me seems like the biggest bug here.
> Do you have any suggestions on how to improve on this?

I think you can use effective_n_jobs from joblib: https://github.com/joblib/joblib/blob/master/joblib/parallel.py#L366
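For reference, effective_n_jobs is part of joblib's public API; a tiny sketch of how it could be used to turn an n_jobs value into an estimated worker count:

```python
# Small sketch: resolve an n_jobs value to the number of workers joblib would
# actually use, which could be stored next to the measured wall-clock time.
from joblib import effective_n_jobs

for n_jobs in (1, 2, -1):
    print(n_jobs, "->", effective_n_jobs(n_jobs))
# effective_n_jobs(-1) typically resolves to the CPU count of the active backend.
```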
Yet another issue we have to think about is the recent use of OpenMP in scikit-learn, which might make it harder for us to get a useful estimate of the time used.
Sorry that this has stalled for so long, but now it's finally time to pick this up and finish it!
I think we basically have the following cases here which we need to consider:
and IIRC we can measure the following things:
That means we can do the following things for cases 1-4:
CPU time can only be measured reliably for n_jobs=1. We can still measure wallclock time as long as n_jobs>=1, as we'd know how many cores are used. In case of n_jobs==-1 we won't know how many cores are being used, but we could use effective_n_jobs to get an estimate (a rough sketch of this follows below). For parallel runs, CPU time is never measurable as we don't have access to the CPU time of the individual jobs. As @NicolasHug pointed out, one can override the behavior via a context manager. Another caveat is that when using a server-worker system such as dask, one does not necessarily get all available CPUs, or the jobs might just sit in the queue, making the wallclock time of the overall run completely useless.
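To make that concrete, here is a rough sketch of the kind of bookkeeping this would amount to, using a scikit-learn estimator directly; this is an illustration, not the actual openml-python implementation:

```python
# Hedged sketch: record the wall-clock time of a fit plus an *estimate* of the
# cores used (via effective_n_jobs), and leave the interpretation to the user.
import time

from joblib import effective_n_jobs
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=100, n_jobs=-1)

start = time.time()
clf.fit(X, y)
wallclock_time_millis_training = (time.time() - start) * 1000
n_cores_estimate = effective_n_jobs(clf.n_jobs)

print(f"wallclock_time_millis_training={wallclock_time_millis_training:.1f} "
      f"(estimated cores used: {n_cores_estimate})")
```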
Therefore, I propose to do the following:
What do you think about this, @NicolasHug @amueller @PGijsbers?
I'd be careful not to spend too much time on this, as it will become a very complicated/impossible project on its own (we're going to have to account for different parallelization strategies/packages, but would also need to start capturing hardware information, etc.). However, making the proposed changes, and then clearly documenting under which conditions what is measured and how to interpret this data, still seems like a worthwhile change to me.
We followed the suggestion of @NicolasHug to just log the CPU and wallclock time and give the user both the possibility and the duty to interpret them. To simplify matters we added a lengthy example.
I'm running a big benchmark suite with RandomizedSearchCV(n_jobs=-1). Unfortunately, computation time is reported only if n_jobs is None or 1. I don't understand the reasoning in https://github.com/openml/openml-python/issues/229. Why isn't the interpretation left up to the user?
As a side note: n_jobs=None can be overridden with a context manager (see the sketch below), which is equivalent to just calling RandomizedSearchCV(n_jobs=-1). With the latter, openml won't report computation time, but as far as I understand, the former will run just fine and report the computation time. So it seems that the check isn't properly enforced anyway.
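The snippet referred to above is not shown in this thread; a hedged reconstruction of what such an override looks like (with an assumed toy estimator and parameter grid) would be:

```python
# Rough reconstruction of the context-manager override described above: with
# n_jobs=None on the search object, joblib's parallel_backend context manager
# decides how many workers are actually used.
from joblib import parallel_backend
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)
search = RandomizedSearchCV(
    RandomForestClassifier(),
    param_distributions={"max_depth": [2, 4, 8, None]},
    n_iter=4,
    n_jobs=None,  # the reported-time check described above only inspects this value
)

with parallel_backend("loky", n_jobs=-1):
    search.fit(X, y)  # effectively the same as passing n_jobs=-1 directly
```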
CC @amueller