microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License
14.05k stars, 1.81k forks

Time per epoch is worse when working with nni #5505

Closed: TayyabaZainab0807 closed this issue 1 year ago

TayyabaZainab0807 commented 1 year ago

Describe the issue: I ran a single trial with NNI, and the time estimate for one epoch was 41 hours. I stopped the experiment and, with exactly the same settings, ran the script as a plain Python process; that gave an estimate of 5 hours. I wonder if NNI is doing something unusual in the background.

Environment:

Configuration:

```yaml
searchSpaceFile: search_space.json
trialCommand: python3 model.py  # NOTE: change "python3" to "python" if you are using Windows
trialGpuNumber: 1
trialConcurrency: 1

maxExperimentDuration: 156h
maxTrialNumber: 200
tuner:
  name: TPE
  classArgs:
    optimize_mode: maximize
trainingService:
  platform: local
  useActiveGpu: True
  GpuIndices: 0
```
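The config requests one GPU per trial (`trialGpuNumber: 1`) on GPU 0. As far as we understand NNI's local training service, it exposes the assigned GPU to the trial through the `CUDA_VISIBLE_DEVICES` environment variable; treat that as an assumption and confirm it against the generated trial script. The sketch below is a hypothetical helper (`visible_gpus` is not part of NNI) that a trial could log at start-up to show which GPUs it is allowed to see. If the variable is set but empty, CUDA-based frameworks cannot see any GPU and fall back to the CPU, which would be consistent with a large slowdown.

```python
import os


def visible_gpus(env=None):
    """Return the entries of CUDA_VISIBLE_DEVICES as a list of strings.

    Returns None if the variable is unset (CUDA frameworks then see all GPUs),
    and an empty list if it is set but empty (no GPU is visible at all).
    """
    env = os.environ if env is None else env
    value = env.get("CUDA_VISIBLE_DEVICES")
    if value is None:
        return None
    # Split "0,1" style values and drop blank entries.
    return [item.strip() for item in value.split(",") if item.strip()]


if __name__ == "__main__":
    # Hypothetical placement: log this at the very top of model.py.
    print("CUDA_VISIBLE_DEVICES ->", visible_gpus())
```

Comparing this output between the NNI trial and the standalone run is a quick way to tell whether both processes see the same GPU.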



**How to reproduce it?**:
```text
[2023-04-05 14:16:23] INFO (mnist_example/MainThread) Hyper-parameters: {'en_decoder': 4, 'k1': 5, 'k2': 11, 'k3': 3, 'k4': 9, 'k5': 3, 'k6': 3, 'k7': 9, 'f1': 8, 'f2': 16, 'f3': 8, 'f4': 8, 'f5': 16, 'f6': 16, 'f7': 16, 'res_cnn': 1, 'res_f1': 8, 'res_f2': 16, 'res_f3': 16, 'res_k1': 5, 'res_k2': 3, 'res_k3': 5, 'res_drop1': 0.14709152555112187, 'res_drop2': 0.24284789985804922, 'res_drop3': 0.11787169251225471, 'bilstm': 2, 'u1': 8, 'u2': 16, 'drop': 0.23648714699100076, 'pu': 8, 'su': 16, 'batch_size': 80, 'epochs': 25}
```

With these parameters, NNI gave a 41-hour estimate, but when I trained the same model outside NNI, the estimate was about 5 hours.
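The 41-hour and 5-hour figures are estimates, which may be skewed by start-up cost early in an epoch; their ratio is roughly 41 / 5 ≈ 8.2×. A steadier comparison is to time complete epochs in both runs. The sketch below is a hypothetical, framework-agnostic helper (`EpochTimer` is not from NNI or the reporter's code) that records the wall-clock duration of each epoch; the injectable `clock` exists only so the helper can be tested.

```python
import time
from contextlib import contextmanager


class EpochTimer:
    """Collect wall-clock durations (in seconds) of individual epochs."""

    def __init__(self, clock=time.perf_counter):
        self.clock = clock
        self.durations = []

    @contextmanager
    def epoch(self):
        # Record the elapsed time even if the epoch raises.
        start = self.clock()
        try:
            yield
        finally:
            self.durations.append(self.clock() - start)

    def mean(self):
        # Average epoch time; 0.0 if no epoch has been timed yet.
        if not self.durations:
            return 0.0
        return sum(self.durations) / len(self.durations)
```

Usage would be `with timer.epoch(): train_one_epoch()` inside the epoch loop (`train_one_epoch` is a hypothetical stand-in for the real training step), then printing `timer.mean()` in both the NNI trial and the standalone run.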
liuzhe-lz commented 1 year ago

In `~/nni-experiments/<experiment-id>/trials/<trial-id>` there is a script (`.sh` on POSIX or `.ps1` on Windows) that is used to run the trial. You can check what happens if you run that script manually. It looks to me like the trial is not using the GPU.
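Following that suggestion, the hypothetical helper below (`find_trial_scripts` is not an NNI API) lists the run script(s) in a trial directory. The directory layout is taken from the comment above; the script's exact file name is not assumed, so the helper returns every `.sh` or `.ps1` file it finds there. `experiment_id` and `trial_id` are placeholders for the real IDs.

```python
from pathlib import Path


def find_trial_scripts(experiment_id, trial_id, root=None):
    """Return the .sh / .ps1 files in an NNI trial directory, sorted by path.

    Layout assumed from the maintainer's comment:
    <root>/<experiment-id>/trials/<trial-id>, with root defaulting to
    ~/nni-experiments.
    """
    root = Path.home() / "nni-experiments" if root is None else Path(root)
    trial_dir = root / experiment_id / "trials" / trial_id
    if not trial_dir.is_dir():
        raise FileNotFoundError(f"no trial directory at {trial_dir}")
    # Keep only shell (POSIX) and PowerShell (Windows) scripts.
    return sorted(p for p in trial_dir.iterdir() if p.suffix in (".sh", ".ps1"))
```

Running the returned script by hand while watching `nvidia-smi` in another terminal should show whether the trial process ever appears on the GPU.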

Lijiaoa commented 1 year ago

Any updates? @TayyabaZainab0807 @liuzhe-lz