Open valtterivalo opened 1 year ago
Strange, but it resembles a case I had in multi GPU. See this, might help...
It was indeed a multi-GPU issue; the lack of notebook support for multi-GPU training had slipped past me in the documentation. One GPU works fine.
Perhaps it's worth adding a pointer in the error message that the user might be trying to run multiple GPUs in a notebook environment?
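Since one GPU works fine, the workaround in a notebook is to pin Lightning to a single device. A minimal sketch, assuming Darts forwards `pl_trainer_kwargs` to the `pytorch_lightning.Trainer` as documented (the model construction is illustrative and commented out):

```python
# Workaround sketch for notebook environments: restrict training to one GPU
# so Lightning does not try to spawn worker processes for multiple devices.
pl_trainer_kwargs = {
    "accelerator": "gpu",
    "devices": 1,  # a single device avoids the multi-process spawn that fails in notebooks
}

# Illustrative usage (hyperparameters are placeholders, not from the original report):
# from darts.models import NBEATSModel
# model = NBEATSModel(input_chunk_length=24, output_chunk_length=12,
#                     pl_trainer_kwargs=pl_trainer_kwargs)
```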
Describe the bug When trying to use gpu as the accelerator on Azure Databricks, Lightning runs into a runtime error:
RuntimeError: Lightning can't create new processes if CUDA is already initialized. Did you manually call torch.cuda.* functions, have moved the model to the device, or allocated memory on the GPU any other way? Please remove any such calls, or change the selected strategy. You will have to restart the Python kernel.
To Reproduce The bug can be reproduced with any example notebook from the Darts documentation by passing pl_trainer_kwargs = {'accelerator': 'gpu'} when the cluster has GPUs available. I'm personally using the N-BEATS example (although in my particular case I'm using N-HiTS; N-BEATS runs into the same error, as the issue is not model-specific), and the following code cell triggers the error:
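For reference, a hedged sketch of the failing configuration: with no explicit device count, Lightning selects all visible GPUs and attempts to spawn worker processes, which raises the RuntimeError above once CUDA has already been initialized in the notebook kernel. (The model call is illustrative and commented out; hyperparameters are placeholders.)

```python
# Failing configuration: accelerator set to GPU with no device count,
# so Lightning picks up every GPU on the cluster.
pl_trainer_kwargs = {"accelerator": "gpu"}

# Illustrative usage that triggers the error on a multi-GPU Databricks cluster:
# from darts.models import NBEATSModel
# model = NBEATSModel(input_chunk_length=24, output_chunk_length=12,
#                     pl_trainer_kwargs=pl_trainer_kwargs)
# model.fit(train_series)  # RuntimeError: Lightning can't create new processes ...
```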
Expected behavior The training process is expected to run like normal.
System (please complete the following information):
Additional context It seems like a Lightning issue to be fair, not necessarily a Darts one.