Closed seanswyi closed 1 year ago
There should be a `run.sh` or `run.ps1` file in `nni-experiments/<experiment-id>/trials/<trial-id>`. Could you try to run the trial with that script?
And by "run the same trials separately", did you run 5 trials concurrently or run them one by one? What's the output of `nvidia-smi`?
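For anyone who wants to capture the `nvidia-smi` numbers programmatically while the trials are running, here is a minimal sketch. It assumes `nvidia-smi` is on the `PATH` and that every GPU reports numeric memory values. The function names `parse_gpu_memory` and `query_gpu_memory` are made up for illustration and are not part of NNI.

```python
import subprocess

# Ask nvidia-smi for one CSV line per GPU: index, used MiB, total MiB.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse lines such as '0, 1234, 16384' into (index, used_mib, total_mib) tuples.

    Assumes every field is numeric; a device that reports '[N/A]' would raise ValueError.
    """
    rows = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used, total = (field.strip() for field in line.split(","))
        rows.append((int(index), int(used), int(total)))
    return rows


def query_gpu_memory():
    """Run nvidia-smi once and return the parsed per-GPU memory usage."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpu_memory()` once before the experiment starts and again while the 5 trials are running should show whether several trials are landing on the same GPU.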
@seanswyi any updates?
I think that there was an error in my script. I deleted the entire thing and tried again and it's working now. Sorry and thanks.
Describe the issue: I was running a script with `trial_gpu_number: 1` and `trial_concurrency: 5`. I noticed that all of my trials were failing due to CUDA out of memory errors. However, when I run the same trials separately (i.e., with the same hyperparameters but simply by running `python ./main.py`), it works fine. Is there something that's using GPU memory that I'm not aware of?
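One way to narrow this down is to log which GPU each trial process can actually see. The sketch below assumes that GPUs are assigned to trials through the `CUDA_VISIBLE_DEVICES` environment variable, which is an assumption here and is not confirmed anywhere in this thread. The helper name `describe_gpu_assignment` is hypothetical.

```python
import os


def describe_gpu_assignment(environ=None):
    """Return a one-line summary of the GPUs visible to this process.

    Reads CUDA_VISIBLE_DEVICES from the given mapping (defaults to os.environ).
    """
    environ = os.environ if environ is None else environ
    visible = environ.get("CUDA_VISIBLE_DEVICES")
    if visible is None:
        # Unset means CUDA applies no restriction to this process.
        return "CUDA_VISIBLE_DEVICES is not set: all GPUs are visible to this process"
    devices = [d.strip() for d in visible.split(",") if d.strip()]
    if not devices:
        return "CUDA_VISIBLE_DEVICES is empty: no GPU is visible to this process"
    return "visible GPU(s): " + ", ".join(devices)
```

Printing `describe_gpu_assignment()` at the top of `main.py` would put the value in each trial's log. If all 5 concurrent trials report the same device, they are sharing one GPU's memory, which could explain out-of-memory errors that do not appear when a single run has the GPU to itself.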