microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Why can't the experiment fully use GPU memory? #5588

Open yangqiao33 opened 1 year ago

yangqiao33 commented 1 year ago

Describe the issue: Hi, I am trying to use NNI to optimize the hyperparameters of a training run. When I run the experiment on my Linux server, the trials it launches do not fully use the GPU memory. If I run the Python file directly, it uses the full memory and training is fast, but if I run the same file through NNI, training is slow. Please tell me how to fix this. As the screenshot shows, the trial launched by NNI occupies only 159MB. [Screenshot 2023-05-30 at 14 15 27]

Environment:

Configuration:

Log message:

How to reproduce it?:

XDZxdz1 commented 1 year ago

Hello, I am also trying to use NNI to optimize training hyperparameters. When I run NNI on an Ubuntu server, the status always shows "failed". I checked the log file, and it says "GPU context requested, but no GPUs found.". How can I solve this problem? I look forward to your reply and would greatly appreciate any help.
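
Both reports above (a trial that occupies only 159MB, and "no GPUs found") are consistent with the trial process not seeing the GPU, although the thread does not confirm a cause. A minimal diagnostic sketch, assuming the standard CUDA behaviour that the `CUDA_VISIBLE_DEVICES` environment variable limits which devices a process can see: call it at the top of the trial script and compare the output with a direct `python` run. The helper name `visible_gpu_ids` is hypothetical and is not part of NNI. If the trial sees no devices, the experiment's GPU settings (for example `trialGpuNumber` and, for the local training service, `useActiveGpu`) would be the first thing to check against the documentation for your NNI version.

```python
import os


def visible_gpu_ids(env=None):
    """Return the device entries exposed via CUDA_VISIBLE_DEVICES.

    Returns None when the variable is unset (CUDA then sees every GPU),
    and an empty list when it is set but exposes no device.
    Simplified parsing: entries after a "-1" are treated as hidden.
    """
    env = os.environ if env is None else env
    raw = env.get("CUDA_VISIBLE_DEVICES")
    if raw is None:
        return None
    ids = []
    for part in raw.split(","):
        part = part.strip()
        if not part:
            continue
        if part == "-1":
            break  # "-1" is an invalid index, so nothing after it is visible
        ids.append(part)
    return ids


if __name__ == "__main__":
    ids = visible_gpu_ids()
    if ids is None:
        print("CUDA_VISIBLE_DEVICES is unset: all GPUs should be visible")
    elif not ids:
        print("CUDA_VISIBLE_DEVICES hides every GPU from this process")
    else:
        print("Visible GPU entries:", ids)
```

If the direct run reports that the variable is unset while the trial reports that every GPU is hidden, the difference lies in the environment the trial is launched with rather than in the training code.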