microsoft / nni

An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

NNI error when enabling Assessor CurveFitting #4797

Open Simba98 opened 2 years ago

Simba98 commented 2 years ago

Describe the issue:

The CurveFitting assessor throws an error when I try to run a custom trial.

Environment:

Configuration:

tuner:
  name: MetisTuner
  classArgs:
    optimize_mode: maximize

assessor:
  name: Curvefitting
  classArgs:
    epoch_num: 90
    start_step: 5
    threshold: 0.9
    gap: 1

trainingService:
  platform: remote
  machineList:
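To make the assessor's classArgs concrete, here is a minimal sketch of the early-stopping decision they control, assuming the meanings given in the NNI docs: assess only after start_step intermediate results, then every gap results, and stop a trial whose predicted final value falls below threshold times the best final value seen so far. The function names are hypothetical, and the predictor is a naive linear extrapolation swapped in for NNI's actual learning-curve model ensemble, so this shows the rule, not NNI's implementation.

```python
from typing import List, Optional


def should_assess(num_results: int, start_step: int, gap: int) -> bool:
    """Assess once start_step results exist, then every `gap` results after that."""
    return num_results >= start_step and (num_results - start_step) % gap == 0


def predict_final(history: List[float], epoch_num: int) -> float:
    """Stand-in predictor (NOT NNI's curve models): extend the average
    per-epoch gain linearly to epoch_num, capped at 1.0 for an accuracy."""
    if len(history) < 2:
        return history[-1]
    slope = (history[-1] - history[0]) / (len(history) - 1)
    remaining = epoch_num - len(history)
    return min(1.0, history[-1] + slope * remaining)


def decide(
    history: List[float],
    best_final: Optional[float],
    epoch_num: int = 90,
    start_step: int = 5,
    threshold: float = 0.9,
    gap: int = 1,
) -> Optional[bool]:
    """Return True to stop early, False to keep running,
    or None when no assessment happens at this step."""
    if not history or not should_assess(len(history), start_step, gap):
        return None
    if best_final is None:
        # Nothing to compare against yet, so let the trial continue.
        return False
    return predict_final(history, epoch_num) < threshold * best_final


if __name__ == "__main__":
    # A flat curve predicts 0.5, below 0.9 * 0.95 = 0.855, so it is stopped.
    print(decide([0.5] * 5, best_final=0.95))
```

With the configuration above (start_step: 5, gap: 1), the first judgement happens at the fifth reported result. The rule also shows why negative values break the assessor: comparisons against threshold times a negative best value invert the meaning of "worse".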

Log message:

How to reproduce it?:

liuzhe-lz commented 2 years ago

Curve fitting only accepts accuracy metrics. The result value cannot be negative.
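As a quick way to catch this on the trial side, the values passed to nni.report_intermediate_result could be checked before the Curvefitting assessor sees them. The helper below is hypothetical, not part of NNI. It assumes accuracies are reported as fractions in [0, 1]; the comment above only states that values cannot be negative.

```python
from typing import Iterable, List


def validate_accuracy_metrics(values: Iterable[float]) -> List[float]:
    """Hypothetical pre-check: every intermediate result must be an accuracy in [0, 1]."""
    checked: List[float] = []
    for step, value in enumerate(values):
        v = float(value)
        # NaN fails this chained comparison too, so it is rejected as well.
        if not 0.0 <= v <= 1.0:
            raise ValueError(
                f"intermediate result #{step} = {v!r} is not an accuracy in [0, 1]"
            )
        checked.append(v)
    return checked


if __name__ == "__main__":
    print(validate_accuracy_metrics([0.42, 0.55, 0.61]))
```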

Simba98 commented 2 years ago


The metrics are accuracies. However, perhaps MetisTuner makes them negative? As you can see, the tuner is set to maximize:

tuner:
  name: MetisTuner
  classArgs:
    optimize_mode: maximize

How can I migrate these existing training runs to a new experiment? Some tuners don't accept manual trials. Are manual trials meaningful for MetisTuner?

liuzhe-lz commented 2 years ago

Oh, sorry, the value in the log is inverted by the tuner. Although curve fitting has reported a lot of warnings, the real error is raised by the GPU scheduler. I'll look into that.
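A sketch of why a maximized accuracy can show up negative in a tuner's log, assuming "inverted" means the sign is flipped, as is common for tuners that minimize internally. The functions are a hypothetical illustration, not MetisTuner's code.

```python
def to_internal(value: float, optimize_mode: str) -> float:
    """Hypothetical: map a reported metric onto a 'smaller is better' scale."""
    if optimize_mode == "maximize":
        # Maximizing accuracy is the same as minimizing its negation.
        return -value
    if optimize_mode == "minimize":
        return value
    raise ValueError(f"unknown optimize_mode: {optimize_mode!r}")


def from_internal(value: float, optimize_mode: str) -> float:
    """Recover the reported metric; negation is its own inverse."""
    return to_internal(value, optimize_mode)


if __name__ == "__main__":
    logged = to_internal(0.87, "maximize")
    print(logged, from_internal(logged, "maximize"))
```

Under this assumption, a negative number in the log does not mean the trial reported a negative accuracy.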

liuzhe-lz commented 2 years ago

I'm not yet sure whether we will make a patch release. You can fix it locally by editing /usr/local/lib/python3.8/dist-packages/nni_node/training_service/reusable/gpuScheduler.js at line 31. Add the line:
constraint = constraint ?? {type: 'None', gpus: []}
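The JavaScript `??` operator substitutes the right-hand default only when the left side is `null` or `undefined`, so the patch gives the scheduler a "no GPU constraint" object whenever none was passed in. The same guard written in Python, as a hypothetical stand-alone function rather than NNI code, looks like this:

```python
from typing import Any, Dict, Optional


def resolve_constraint(constraint: Optional[Dict[str, Any]]) -> Dict[str, Any]:
    """Mirror `constraint = constraint ?? {type: 'None', gpus: []}`.

    Only a missing value (None here, null/undefined in JS) is replaced.
    """
    if constraint is None:
        # Build a fresh default each call so callers never share one mutable dict.
        return {"type": "None", "gpus": []}
    return constraint


if __name__ == "__main__":
    print(resolve_constraint(None))
```

Using `constraint or {...}` instead would also replace an empty dict, which `??` does not; the explicit `is None` check keeps the two behaviors aligned.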

Simba98 commented 2 years ago


Thank you for your reply. I will try it and see whether it fixes the issue.