time-series-machine-learning / tsml-eval

Evaluation tools for time series machine learning algorithms.
https://tsml-eval.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

Discussion on threading HC2 and components #21

Open TonyBagnall opened 1 year ago

TonyBagnall commented 1 year ago

Triggered by https://github.com/sktime/sktime/issues/3788, it's worth discussing how we thread before proposing improvements.

HC2 seems to fit each component sequentially using all the jobs (n_jobs is converted to _threads_to_use in BaseClassifier), in the order STC, DrCIF, Arsenal, TDE.
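A minimal sketch of the current behaviour as described above, for comparison with the alternatives below. `fit_component` and `fit_sequential` are hypothetical stand-ins, not the sktime estimators: each component runs in turn with all `n_jobs` threads and an even share of the contract time.

```python
# Hypothetical stand-in for fitting one HC2 component; a real component would
# build its ensemble with n_threads workers and stop when its time budget ends.
def fit_component(name, n_threads, time_limit_seconds):
    return {"name": name, "threads": n_threads, "budget": time_limit_seconds}


def fit_sequential(n_jobs, contract_seconds,
                   components=("STC", "DrCIF", "Arsenal", "TDE")):
    """Fit the components one after another, each with all n_jobs threads
    and an even share of the overall contract time."""
    share = contract_seconds / len(components)
    return [fit_component(name, n_jobs, share) for name in components]


# With n_jobs=8 and a 60 s contract, every component gets 8 threads and 15 s.
for result in fit_sequential(n_jobs=8, contract_seconds=60.0):
    print(result)
```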

Initial observations:

  1. I'm not sure how hard it would be to put each component in its own thread with n_jobs/4 threads each, but if n_jobs > 4 it might be preferable. Even better would be to run a thread pool, since some components take longer than others, but it's probably hideous.
  2. #3788 is an interaction between n_jobs and contract time. Contract time is split evenly between the components; if we did (1), we could give them all the same time rather than splitting it.
  3. Parallelism in Python is weird: I'm not sure whether Parallel is actually creating threads or just time slicing by default.
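A rough sketch of options (1) and (2) together, plus the shared thread pool variant, using only the standard library's `concurrent.futures`. All names here (`build_member`, `fit_component`, `fit_concurrent`, `fit_shared_pool`, `n_members`) are hypothetical placeholders, not sktime code; it only shows how the threads and the contract would be divided.

```python
from concurrent.futures import ThreadPoolExecutor

COMPONENTS = ("STC", "DrCIF", "Arsenal", "TDE")


# Hypothetical stand-in for building one ensemble member of a component.
def build_member(component, member_index):
    return f"{component}-{member_index}"


def fit_component(component, n_threads, time_limit_seconds, n_members=4):
    # Inner parallelism: this component's own pool of n_threads workers.
    # time_limit_seconds is only recorded here; a contracted component would
    # stop adding members once the budget is spent.
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        members = list(pool.map(lambda i: build_member(component, i),
                                range(n_members)))
    return {"name": component, "threads": n_threads,
            "budget": time_limit_seconds, "members": members}


def fit_concurrent(n_jobs, contract_seconds):
    """Observation (1): one outer thread per component, n_jobs // 4 inner
    threads each (at least 1), and every component gets the whole contract."""
    per_component = max(1, n_jobs // len(COMPONENTS))
    with ThreadPoolExecutor(max_workers=len(COMPONENTS)) as outer:
        futures = [outer.submit(fit_component, c, per_component, contract_seconds)
                   for c in COMPONENTS]
        return [f.result() for f in futures]


def fit_shared_pool(n_jobs, n_members=4):
    """Thread-pool variant: members of all components share one pool of
    n_jobs workers, so a worker freed by a fast component picks up work
    from a slower one instead of sitting idle."""
    tasks = [(c, i) for c in COMPONENTS for i in range(n_members)]
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        return list(pool.map(lambda task: build_member(*task), tasks))
```

Note the integer division: with n_jobs < 4 this uses more threads than requested, and when n_jobs is not a multiple of 4 some go unused. The shared pool avoids both problems. Either way, threads only run truly in parallel when the heavy work releases the GIL.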

I'm going to poke around a bit with n_jobs to see what speed-up we are actually getting.
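A small standard-library harness for that kind of check, which also bears on observation (3). It compares serial time with thread pool time for two hypothetical workloads: a pure-Python loop that holds the GIL, so threads can only time slice, and a `time.sleep` call that releases the GIL, standing in for compiled code that does the same. The workloads and task counts are illustrative assumptions, not measurements of HC2.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_speedup(task, n_tasks, n_jobs):
    """Return serial_time / threaded_time for running `task` n_tasks times."""
    start = time.perf_counter()
    for _ in range(n_tasks):
        task()
    serial = time.perf_counter() - start

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Consume the iterator so every task has finished before timing stops.
        list(pool.map(lambda _: task(), range(n_tasks)))
    threaded = time.perf_counter() - start
    return serial / threaded


def pure_python_work():
    # Holds the GIL the whole time, so threads can only time slice.
    return sum(i * i for i in range(200_000))


def gil_releasing_work():
    # time.sleep releases the GIL, standing in for code that does the same.
    time.sleep(0.05)


if __name__ == "__main__":
    for n_jobs in (1, 2, 4):
        print(n_jobs,
              round(measure_speedup(pure_python_work, 8, n_jobs), 2),
              round(measure_speedup(gil_releasing_work, 8, n_jobs), 2))
```

If the first column stays near 1 while the second grows with n_jobs, the threads are time slicing rather than running in parallel.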

MatthewMiddlehurst commented 1 year ago

Was this fixed by the swap to prefer="threads"? It may be worth opening a threading issue on the main repo, or closing this one.
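One way to confirm what the swap does, assuming the `Parallel` in question is joblib's (`joblib.Parallel` and `joblib.delayed` with the `prefer` hint). Each task reports the process and thread that ran it. With `prefer="threads"`, every task should report the parent's process id. With joblib's default loky backend, tasks run in separate worker processes instead. `where_am_i` and `worker_summary` are illustrative names only.

```python
import os
import threading

from joblib import Parallel, delayed


def where_am_i(_):
    # Report the process and thread that ran this task.
    return os.getpid(), threading.get_ident()


def worker_summary(n_jobs, prefer):
    """Run a few tasks and collect the distinct process and thread ids."""
    results = Parallel(n_jobs=n_jobs, prefer=prefer)(
        delayed(where_am_i)(i) for i in range(4 * n_jobs)
    )
    pids = {pid for pid, _ in results}
    thread_ids = {tid for _, tid in results}
    return pids, thread_ids


if __name__ == "__main__":
    pids, thread_ids = worker_summary(2, "threads")
    # With the threading backend, all tasks run inside this process.
    print(pids == {os.getpid()}, len(thread_ids))
```

`prefer` is only a soft hint: a backend set explicitly by the caller (for example via joblib's `parallel_backend` context manager) takes precedence.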