Why can't the experiment fully use the GPU memory?

microsoft / nni

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

MIT License

14k stars, 1.81k forks

Describe the issue:

Hi, I am trying to use NNI to optimize the hyperparameters of a training script. When I run the experiment on my Linux server, the trials it launches do not fully use the GPU memory. If I run the Python file directly, it uses the full GPU memory and training is fast, but when I run the same file through NNI, training is slow. How can I fix this? As the screenshot shows, the trial launched by NNI occupies only 159 MB.

[Screenshot: 2023-05-30 at 14:15:27]
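One thing worth checking is whether the trial process sees the same GPUs as a manual run. CUDA restricts which devices a process can see through the `CUDA_VISIBLE_DEVICES` environment variable: if it is unset, every GPU is visible, and if it is set to an empty string, no GPU is visible. Whether and how NNI sets this variable for a trial depends on the NNI version and the experiment's GPU settings, which this sketch does not assume. The helpers `visible_gpus` and `report` are hypothetical names for illustration only.

```python
import os
from typing import List, Mapping, Optional


def visible_gpus(env: Optional[Mapping[str, str]] = None) -> Optional[List[str]]:
    """Parse CUDA_VISIBLE_DEVICES from the given mapping (default: os.environ).

    Returns None if the variable is unset (all GPUs visible),
    [] if it is set but empty (no GPU visible),
    otherwise the listed device entries in order.
    Note: entries are not validated against the actual hardware.
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Split the comma-separated list and drop blank entries.
    return [tok.strip() for tok in value.split(",") if tok.strip()]


def report(env: Optional[Mapping[str, str]] = None) -> str:
    """Return a one-line, human-readable summary of GPU visibility."""
    gpus = visible_gpus(env)
    if gpus is None:
        return "CUDA_VISIBLE_DEVICES unset: all GPUs visible"
    if not gpus:
        return "CUDA_VISIBLE_DEVICES empty: no GPU visible to this process"
    return f"visible GPUs: {', '.join(gpus)}"


if __name__ == "__main__":
    # Print at the start of the trial script, both under NNI and when run by hand.
    print(report())
```

Printing `report()` at the top of the training script, once under NNI and once when run by hand, shows whether the two runs see the same devices. In a PyTorch script, `torch.cuda.is_available()` and `torch.cuda.device_count()` give the equivalent answer from the framework's side.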

Environment:

NNI version:
Training service (local|remote|pai|aml|etc):
Client OS:
Server OS (for remote mode only):
Python version:
PyTorch/TensorFlow version:
Is conda/virtualenv/venv used?:
Is running in Docker?:

Configuration:

Experiment config (remember to remove secrets!):
Search space:

Log message:

nnimanager.log:
dispatcher.log:
nnictl stdout and stderr:

How to reproduce it?:

Issue #5588