tuner_params = nni.get_next_parameter() error #4725

Closed: leiqing110 closed this issue 2 years ago

leiqing110 commented 2 years ago

Describe the issue: When I leave out the code in the green box, the trial does not run normally: the web UI reports it as succeeded after about 6 seconds, which cannot be right. After I add the green code, the program runs normally. I suspect the problem is in the code in the red box, which does not seem to get the parameter dictionary. [image]



experimentName: NNI_test  # An optional name to help you distinguish experiments.

# Hyper-parameter search space can either be configured here or in a separate file.
# "config.yml" shows how to specify a separate search space file.
# The common schema of search space is documented here:
searchSpace:
  epochs:
    _type: choice
    _value: [200,450,600]
  lr:
    _type: uniform
    _value: [0.0016,0.001]

trialCommand: CUDA_VISIBLE_DEVICES=3 python3 ../imagenet/ --model efficientnet_b2 -b 256 --sched step --epochs 200 --decay-epochs 2.4 --decay-rate .97 --opt rmsproptf --opt-eps .001 -j 8 --warmup-lr 1e-6 --weight-decay 1e-5 --drop 0.3 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --reprob 0.2 --amp --lr .016  # The command to launch a trial. NOTE: change "python3" to "python" if you are using Windows.
trialCodeDirectory: .  # The path of trial code. By default it's ".", which means the same directory of this config file.
trialGpuNumber: 1  # How many GPUs should each trial use. CUDA is required when it's greater than zero.

trialConcurrency: 1  # Run 4 trials concurrently.
maxTrialNumber: 2  # Generate at most 10 trials.
maxExperimentDuration: 20h  # Stop generating trials after 1 hour.

tuner:  # Configure the tuning algorithm.
  name: TPE  # Supported algorithms: TPE, Random, Anneal, Evolution, GridSearch, GPTuner, PBTTuner, etc.
  # Full list:
  classArgs:  # Algorithm specific arguments. See the tuner's doc for details.
    optimize_mode: maximize  # "minimize" or "maximize"

# Configure the training platform.
# Supported platforms: local, remote, openpai, aml, kubeflow, kubernetes, adl.
trainingService:
  platform: local
  useActiveGpu: false  # NOTE: Use "true" if you are using an OS with graphical interface (e.g. Windows 10, Ubuntu desktop)

Reason and details:

Log message:

How to reproduce it?:

liuzhe-lz commented 2 years ago

Please attach the log files.

leiqing110 commented 2 years ago

liuzhe-lz commented 2 years ago

The search space does not contain batch_size. Hopefully you have already found it.
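For anyone hitting the same thing: the dictionary returned by `nni.get_next_parameter()` only carries the keys declared under `searchSpace` (here `epochs` and `lr`), so reading `batch_size` from it fails. The trial script from the screenshots is not preserved, so the following is only a minimal sketch of one way to guard against that, assuming the script keeps its own defaults. The helper `merge_tuner_params` and the sample values are hypothetical, not taken from the original code.

```python
def merge_tuner_params(defaults, tuner_params):
    """Overlay tuner-supplied values on the script's defaults.

    Hypothetical helper: keys missing from the search space
    (such as batch_size) simply keep their default value.
    """
    merged = dict(defaults)
    merged.update(tuner_params or {})
    return merged


def get_trial_params(defaults):
    """Inside a real trial, fetch the tuner's parameters and merge them."""
    import nni  # imported lazily so the sketch also runs without NNI installed
    return merge_tuner_params(defaults, nni.get_next_parameter())


# Defaults mirror the flags in trialCommand (-b 256, --epochs 200, --lr .016).
defaults = {"epochs": 200, "lr": 0.016, "batch_size": 256}

# Example of what the tuner could send for the search space above
# (illustrative values only; note that there is no batch_size key).
sample = {"epochs": 450, "lr": 0.0012}

params = merge_tuner_params(defaults, sample)
print(params["batch_size"])  # falls back to the default instead of raising KeyError
```

The alternative is to add a `batch_size` entry to `searchSpace` so that the tuner supplies it directly.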