mlr-org / mlr3torch

Deep learning framework for the mlr3 ecosystem based on torch
https://mlr3torch.mlr-org.com

CPU Parallelisation #7

Closed jemus42 closed 2 years ago

jemus42 commented 2 years ago

See https://github.com/pytorch/pytorch/wiki/Autograd-and-Fork

I think this disqualifies future::plan("multicore"): when I tried a CV under this plan, I got

Error in (function (self, inputs, gradient, retain_graph, create_graph)  :
Unable to handle autograd's threading in combination with fork-based multiprocessing. 
See https://github.com/pytorch/pytorch/wiki/Autograd-and-Fork

future::plan("multisession"), by contrast, at least runs without error.

At the very least I should keep an eye on this and make sure it's documented (e.g. in a vignette on resampling with {mlr3torch} in general).
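The mechanism behind the difference, as I understand it: plan("multicore") uses fork(), so workers inherit the parent's libtorch/autograd thread state, which pytorch's wiki page flags as unsupported; plan("multisession") starts fresh R sessions instead, much like a PSOCK cluster from base R's parallel package. A minimal base-R sketch of the "fresh process" flavor of parallelism (no torch involved, purely illustrative):

```r
library(parallel)

# PSOCK workers are brand-new R processes, analogous to
# future::plan("multisession") -- nothing is inherited via fork(),
# so each worker would initialize torch's thread pool on its own.
cl <- makeCluster(2L, type = "PSOCK")
res <- parLapply(cl, 1:4, function(x) x^2)
stopCluster(cl)

# type = "FORK" (Unix only) copies the parent process instead, which is
# what plan("multicore") builds on and what triggers the autograd error.
print(unlist(res))
```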

jemus42 commented 2 years ago

Using set.seed() and torch_manual_seed() in combination with num_threads = 1 seems to behave as expected: results (checked via predictions) are identical between two separate runs.

See https://github.com/mlr-org/mlr3torch/blob/main/attic/threading-repro.R

Addendum: It appears torch::torch_manual_seed is not required in this scenario. Neat.
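For the record, the base-R side of that check is just the usual seed-then-run pattern; in the linked repro the same idea (plus restricting torch to a single thread, e.g. via torch::torch_set_num_threads(1)) made predictions match across runs. The helper name run_once below is illustrative, not from the repro script:

```r
# With a fixed seed, two independent runs produce identical draws.
# In the torch setting the analogous check compared model predictions.
run_once <- function() {
  set.seed(42)
  rnorm(3)
}

stopifnot(identical(run_once(), run_once()))
```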

jemus42 commented 2 years ago

This also affects parallelization via {batchtools}, but using the SSH cluster functions with localhost apparently works fine, as this (if I understand correctly) behaves similarly to future::plan("multisession"): workers are fresh R sessions rather than forks.
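A hedged sketch of that {batchtools} setup (registry configuration only; the worker settings here are illustrative assumptions, not taken from the issue):

```r
library(batchtools)

# Temporary registry for experimentation.
reg = makeRegistry(file.dir = NA)

# SSH cluster functions on localhost: each job runs in a fresh R session
# started over SSH, avoiding fork()-based workers entirely.
reg$cluster.functions = makeClusterFunctionsSSH(
  workers = list(Worker$new("localhost", ncpus = 2))
)
```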