When I submit a trial to OpenPAI using NNI, the trial ID is different from the ID displayed in OpenPAI. Please ignore the trial failures, which are caused by my program.
I noticed the line `assigning environment ysqsa to trial Vj7Kl` in nnimanager.log, where `ysqsa` is the ID shown in OpenPAI and `Vj7Kl` is the ID shown on the NNI web page.
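To make the mismatch concrete, here is a minimal sketch that pairs each trial ID with its environment ID and OpenPAI job name by reading nnimanager.log. It assumes the two message formats that appear in the log in this report ("requested environment ... and job id is ..." and "assigning environment ... to trial ..."). The function name `map_trials_to_jobs` and the regular expressions are my own and are not part of NNI.

```python
import re

# Matches e.g. "requested environment ysqsa and job id is nni_exp_Awj8Wq6i_env_ysqsa."
JOB_RE = re.compile(r"requested environment (\w+) and job id is (\w+)")
# Matches e.g. "assigning environment ysqsa to trial Vj7Kl."
ASSIGN_RE = re.compile(r"assigning environment (\w+) to trial (\w+)")


def map_trials_to_jobs(log_lines):
    """Return {trial_id: (environment_id, job_id)} parsed from nnimanager.log lines.

    Hypothetical helper: it relies only on the message text seen in this log.
    """
    job_by_env = {}
    mapping = {}
    for line in log_lines:
        job = JOB_RE.search(line)
        if job:
            job_by_env[job.group(1)] = job.group(2)
            continue
        assign = ASSIGN_RE.search(line)
        if assign:
            env_id, trial_id = assign.groups()
            # job id is None if the "requested environment" line was not seen
            mapping[trial_id] = (env_id, job_by_env.get(env_id))
    return mapping


# Lines copied from the nnimanager.log in this report.
sample_log = [
    "[2022-04-26 21:41:00] INFO (TrialDispatcher) requested environment ysqsa and job id is nni_exp_Awj8Wq6i_env_ysqsa.",
    "[2022-04-26 21:41:00] INFO (TrialDispatcher) requested environment EAdup and job id is nni_exp_Awj8Wq6i_env_EAdup.",
    "[2022-04-26 21:41:06] INFO (TrialDispatcher) assigning environment EAdup to trial prvak.",
    "[2022-04-26 21:41:11] INFO (TrialDispatcher) assigning environment ysqsa to trial Vj7Kl.",
]

for trial_id, (env_id, job_id) in map_trials_to_jobs(sample_log).items():
    print(trial_id, env_id, job_id)
```

With the real file, the same function could be called as `map_trials_to_jobs(open(path))`, where `path` points at the experiment's nnimanager.log.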
Environment:
NNI version: 2.5
Training service (local|remote|pai|aml|etc): openpai
Client OS:
Server OS (for remote mode only): Ubuntu 20.04.2 LTS (GNU/Linux 5.11.0-43-generic x86_64)
Log message:
- nnimanager.log:
[2022-04-26 21:40:27] INFO (NNIDataStore) Datastore initialization done
[2022-04-26 21:40:27] INFO (RestServer) RestServer start
[2022-04-26 21:40:28] INFO (RestServer) RestServer base port is 4682
[2022-04-26 21:40:28] INFO (main) Rest server listening on: http://0.0.0.0:4682
[2022-04-26 21:40:28] INFO (NNIManager) Starting experiment: Awj8Wq6i
[2022-04-26 21:40:28] INFO (NNIManager) Setup training service...
[2022-04-26 21:40:29] INFO (NNIManager) Setup tuner...
[2022-04-26 21:40:29] INFO (NNIManager) Change NNIManager status from: INITIALIZED to: RUNNING
[2022-04-26 21:40:29] INFO (NNIManager) Add event listeners
[2022-04-26 21:40:29] INFO (TrialDispatcher) TrialDispatcher: started channel: WebCommandChannel
[2022-04-26 21:40:29] INFO (TrialDispatcher) TrialDispatcher: copying code.
[2022-04-26 21:40:54] INFO (NNIManager) NNIManager received command from dispatcher: ID,
[2022-04-26 21:40:54] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"view_n_old": 2}, "parameter_index": 0}
[2022-04-26 21:40:54] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"view_n_old": 3}, "parameter_index": 0}
[2022-04-26 21:40:54] INFO (NNIManager) NNIManager received command from dispatcher: TR, {"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"view_n_old": 4}, "parameter_index": 0}
[2022-04-26 21:40:54] INFO (NNIManager) NNIManager received command from dispatcher: NO, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": "", "parameter_index": 0}
[2022-04-26 21:40:54] INFO (NNIManager) Change NNIManager status from: RUNNING to: TUNER_NO_MORE_TRIAL
[2022-04-26 21:40:55] INFO (TrialDispatcher) Initialize environments total number: 0
[2022-04-26 21:40:55] INFO (TrialDispatcher) TrialDispatcher: run loop started.
[2022-04-26 21:40:59] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 0,
  hyperParameters: {
    value: '{"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"view_n_old": 2}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2022-04-26 21:40:59] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 1,
  hyperParameters: {
    value: '{"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"view_n_old": 3}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2022-04-26 21:40:59] INFO (NNIManager) submitTrialJob: form: {
  sequenceId: 2,
  hyperParameters: {
    value: '{"parameter_id": 2, "parameter_source": "algorithm", "parameters": {"view_n_old": 4}, "parameter_index": 0}',
    index: 0
  },
  placementConstraint: { type: 'None', gpus: [] }
}
[2022-04-26 21:40:59] INFO (TrialDispatcher) Assign environment service pai to environment ysqsa
[2022-04-26 21:41:00] INFO (TrialDispatcher) requested environment ysqsa and job id is nni_exp_Awj8Wq6i_env_ysqsa.
[2022-04-26 21:41:00] INFO (TrialDispatcher) Assign environment service pai to environment EAdup
[2022-04-26 21:41:00] INFO (TrialDispatcher) requested environment EAdup and job id is nni_exp_Awj8Wq6i_env_EAdup.
[2022-04-26 21:41:00] INFO (TrialDispatcher) Assign environment service pai to environment u7YpN
[2022-04-26 21:41:00] INFO (TrialDispatcher) requested environment u7YpN and job id is nni_exp_Awj8Wq6i_env_u7YpN.
[2022-04-26 21:41:00] INFO (TrialDispatcher) requested new environment, live trials: 3, live environments: 0, neededEnvironmentCount: 3, requestedCount: 3
[2022-04-26 21:41:01] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_ysqsa change status from UNKNOWN to WAITING.
[2022-04-26 21:41:01] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_EAdup change status from UNKNOWN to WAITING.
[2022-04-26 21:41:01] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_u7YpN change status from UNKNOWN to WAITING.
[2022-04-26 21:41:04] INFO (TrialDispatcher) TrialDispatcher: env ysqsa received initialized message and runner is ready, env status: WAITING.
[2022-04-26 21:41:04] INFO (TrialDispatcher) TrialDispatcher: Version check in trialKeeper success!
[2022-04-26 21:41:05] INFO (TrialDispatcher) TrialDispatcher: env EAdup received initialized message and runner is ready, env status: WAITING.
[2022-04-26 21:41:06] INFO (TrialDispatcher) TrialDispatcher: Version check in trialKeeper success!
[2022-04-26 21:41:06] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_EAdup change status from WAITING to RUNNING.
[2022-04-26 21:41:06] INFO (TrialDispatcher) assigning environment EAdup to trial prvak.
[2022-04-26 21:41:06] INFO (TrialDispatcher) TrialDispatcher: env u7YpN received initialized message and runner is ready, env status: WAITING.
[2022-04-26 21:41:07] INFO (TrialDispatcher) TrialDispatcher: Version check in trialKeeper success!
[2022-04-26 21:41:09] INFO (NNIManager) Trial job prvak status changed from WAITING to RUNNING
[2022-04-26 21:41:11] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_ysqsa change status from WAITING to RUNNING.
[2022-04-26 21:41:11] INFO (EnvironmentInformation) EnvironmentInformation: nni_exp_Awj8Wq6i_env_u7YpN change status from WAITING to RUNNING.
[2022-04-26 21:41:11] INFO (TrialDispatcher) assigning environment ysqsa to trial Vj7Kl.
[2022-04-26 21:41:11] INFO (TrialDispatcher) assigning environment u7YpN to trial rim7M.
[2022-04-26 21:41:14] INFO (NNIManager) Trial job Vj7Kl status changed from WAITING to RUNNING
[2022-04-26 21:41:14] INFO (NNIManager) Trial job rim7M status changed from WAITING to RUNNING
[2022-04-26 21:41:24] INFO (NNIManager) Trial job prvak status changed from RUNNING to FAILED
[2022-04-26 21:41:24] INFO (NNIManager) NNIManager received command from dispatcher: NO, {"parameter_id": 12, "parameter_source": "algorithm", "parameters": "", "parameter_index": 0}
[2022-04-26 21:41:29] INFO (NNIManager) Trial job Vj7Kl status changed from RUNNING to FAILED
[2022-04-26 21:41:29] INFO (NNIManager) Trial job rim7M status changed from RUNNING to FAILED
[2022-04-26 21:41:29] INFO (NNIManager) NNIManager received command from dispatcher: NO, {"parameter_id": 13, "parameter_source": "algorithm", "parameters": "", "parameter_index": 0}
- dispatcher.log:
[2022-04-26 21:40:29] INFO (nni.runtime.msg_dispatcher_base/MainThread) Dispatcher started
[2022-04-26 21:40:29] ERROR (nni.common.hpo_utils.validation/Thread-1) search space "combine_params" (choice) should only contain numbers or strings : OrderedDict([('_type', 'choice'), ('_value', [OrderedDict([('view_n_old', 2)]), OrderedDict([('view_n_old', 3)]), OrderedDict([('view_n_old', 4)])])])
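The validation error above complains that the `choice` values in `combine_params` are not plain numbers or strings. As a rough illustration of what the message is pointing at, the sketch below flags such values in a search-space dict. It is not NNI's own validation code: the helper `invalid_choice_values` is hypothetical, and the search space is reconstructed from the `OrderedDict` printed in the error.

```python
def invalid_choice_values(search_space):
    """Return {name: [offending values]} for `choice` entries whose values
    are not numbers or strings (hypothetical helper, not part of NNI)."""
    bad = {}
    for name, spec in search_space.items():
        if spec.get("_type") != "choice":
            continue
        offenders = [
            value for value in spec.get("_value", [])
            if not isinstance(value, (int, float, str))
        ]
        if offenders:
            bad[name] = offenders
    return bad


# Search space reconstructed from the dispatcher.log error message.
search_space = {
    "combine_params": {
        "_type": "choice",
        "_value": [{"view_n_old": 2}, {"view_n_old": 3}, {"view_n_old": 4}],
    }
}

print(invalid_choice_values(search_space))
```

Here all three dict-valued entries of `combine_params` are reported, which matches the parameter named in the error.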
- nnictl stdout and stderr:
**How to reproduce it?**: