JianJuly closed this issue 3 years ago.
Hello @JianJuly, will #3495 work as a workaround for your issue? If it does not, please let us know. The view command for experiments launched from Python will be fully supported in NNI v2.2.
I found the metadata for n6XH4aFU in my ~/nni-experiments/.experiment file. It looks like this:
{
  "0QEwk2bg": {
    "id": "0QEwk2bg",
    "port": 8080,
    "startTime": 1618746646923,
    "endTime": "N/A",
    "status": "STOPPED",
    "platform": "local",
    "experimentName": "LN_MS",
    "tag": [],
    "pid": 34344,
    "webuiUrl": [
      "http://127.0.0.1:8080",
      "http://10.2.4.60:8080",
      "http://10.147.20.35:8080"
    ],
    "logDir": "/home/jianjunming/projects/CRLNMS_clf_AutoML/configs/../checkpoints/nni-experiments"
  },
  "n6XH4aFU": {
    "id": "n6XH4aFU",
    "port": 8080,
    "startTime": 1618813694427,
    "endTime": "N/A",
    "status": "INITIALIZED",
    "platform": "local",
    "experimentName": "LN_MS",
    "tag": [],
    "pid": 36252,
    "webuiUrl": [],
    "logDir": "/home/jianjunming/nni-experiments"
  }
}
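A small Python sketch like the following can show which logDir nnictl has recorded for each experiment. It relies only on the dump above and assumes that ~/nni-experiments/.experiment is a JSON object keyed by experiment ID, with "status" and "logDir" fields in each entry. summarize_experiments is a hypothetical helper, not part of NNI.

```python
import json
from pathlib import Path

# Assumed location of nnictl's experiment metadata, taken from this thread.
METADATA_PATH = Path.home() / "nni-experiments" / ".experiment"


def summarize_experiments(path=METADATA_PATH):
    """Return {experiment_id: (status, logDir)} read from the metadata file.

    Assumes the file is a JSON object keyed by experiment ID, where each
    entry holds "status" and "logDir" as in the dump above. Missing fields
    come back as None instead of raising.
    """
    with open(path, encoding="utf-8") as f:
        experiments = json.load(f)
    return {
        exp_id: (meta.get("status"), meta.get("logDir"))
        for exp_id, meta in experiments.items()
    }
```

On the file above, this would list n6XH4aFU as INITIALIZED with logDir /home/jianjunming/nni-experiments. It would list 0QEwk2bg with the project's checkpoint directory.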
Then I set the logDir of n6XH4aFU to the correct value, /home/jianjunming/projects/CRLNMS_clf_AutoML/configs/../checkpoints/nni-experiments, and saved the file.
I ran nnictl view n6XH4aFU again, and the same error occurred :(
After that, I checked ~/nni-experiments/.experiment again and found that the logDir had reverted to /home/jianjunming/nni-experiments :(
So weird.
Here is the terminal output:
INFO: view experiment n6XH4aFU...
INFO: Starting restful server...
ERROR: Restful server start failed!
INFO: Stdout:
-----------------------------------------------------------------------
Experiment start time 2021-04-18 21:32:09
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Experiment start time 2021-04-19 09:50:47
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:18:58
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:20:57
-----------------------------------------------------------------------
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:28:14
-----------------------------------------------------------------------
INFO: Stderr:
-----------------------------------------------------------------------
Experiment start time 2021-04-18 21:32:09
-----------------------------------------------------------------------
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/json_tricks/nonp.py:225: JsonTricksDeprecation: `json_tricks.load(s)` stripped some comments, but `ignore_comments` was not passed; in the next major release, the behaviour when `ignore_comments` is not passed will change; it is recommended to explicitly pass `ignore_comments=True` if you want to strip comments; see https://github.com/mverleg/pyjson_tricks/issues/74
JsonTricksDeprecation)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:242: RuntimeWarning: divide by zero encountered in true_divide
return c - a / np.log(x)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/scipy/optimize/minpack.py:829: OptimizeWarning: Covariance of the parameters could not be estimated
category=OptimizeWarning)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:242: RuntimeWarning: divide by zero encountered in double_scalars
return c - a / np.log(x)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:171: RuntimeWarning: invalid value encountered in power
return c - (a*x+b)**-alpha
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:196: RuntimeWarning: invalid value encountered in power
return alpha - (alpha - beta) / (1. + (kappa * x)**delta)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:124: RuntimeWarning: invalid value encountered in double_scalars
return (theta * x**eta) / (kappa**eta + x**eta)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:267: RuntimeWarning: invalid value encountered in power
return alpha - (alpha - beta) * np.exp(-(kappa * x)**delta)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:291: RuntimeWarning: overflow encountered in exp
return a - (a - beta) * np.exp(-k*x**delta)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/model_factory.py:297: RuntimeWarning: invalid value encountered in true_divide
alpha = np.minimum(1, self.target_distribution(new_values) / self.target_distribution(self.weight_samples))
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:220: RuntimeWarning: overflow encountered in exp
return c - np.exp(-a*(x**alpha)+b)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:171: RuntimeWarning: invalid value encountered in double_scalars
return c - (a*x+b)**-alpha
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:147: RuntimeWarning: overflow encountered in true_divide
return a/(1.+(x/np.exp(b))**c)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:147: RuntimeWarning: overflow encountered in exp
return a/(1.+(x/np.exp(b))**c)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/curvefunctions.py:147: RuntimeWarning: divide by zero encountered in power
return a/(1.+(x/np.exp(b))**c)
/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni/algorithms/hpo/curvefitting_assessor/model_factory.py:297: RuntimeWarning: divide by zero encountered in true_divide
alpha = np.minimum(1, self.target_distribution(new_values) / self.target_distribution(self.weight_samples))
Error: Dispatcher stream error, tuner may have crashed.
at EventEmitter.dispatcher.onError (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/nnimanager.js:550:32)
at EventEmitter.emit (events.js:198:13)
at Socket.IpcInterface.outgoingStream.on (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/ipcInterface.js:42:72)
at Socket.emit (events.js:198:13)
at emitErrorNT (internal/streams/destroy.js:91:8)
at emitErrorAndCloseNT (internal/streams/destroy.js:59:3)
at process._tickCallback (internal/process/next_tick.js:63:19)
-----------------------------------------------------------------------
Experiment start time 2021-04-19 09:50:47
-----------------------------------------------------------------------
Failed to create log dir: AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert(fs.existsSync(dbDir))
at SqlDB.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/sqlDatabase.js:72:9)
at NNIDataStore.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/nniDataStore.js:35:21)
at initContainer (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:87:14)
at utils_1.mkDirP.then (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:146:15)
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:18:58
-----------------------------------------------------------------------
Failed to create log dir: AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert(fs.existsSync(dbDir))
at SqlDB.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/sqlDatabase.js:72:9)
at NNIDataStore.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/nniDataStore.js:35:21)
at initContainer (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:87:14)
at utils_1.mkDirP.then (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:146:15)
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:20:57
-----------------------------------------------------------------------
Failed to create log dir: AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert(fs.existsSync(dbDir))
at SqlDB.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/sqlDatabase.js:72:9)
at NNIDataStore.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/nniDataStore.js:35:21)
at initContainer (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:87:14)
at utils_1.mkDirP.then (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:146:15)
-----------------------------------------------------------------------
Experiment start time 2021-04-19 14:28:14
-----------------------------------------------------------------------
Failed to create log dir: AssertionError [ERR_ASSERTION]: The expression evaluated to a falsy value:
assert(fs.existsSync(dbDir))
at SqlDB.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/sqlDatabase.js:72:9)
at NNIDataStore.init (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/core/nniDataStore.js:35:21)
at initContainer (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:87:14)
at utils_1.mkDirP.then (/home/jianjunming/anaconda3/envs/pytorch/lib/python3.6/site-packages/nni_node/main.js:146:15)
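Every start after the first one fails on assert(fs.existsSync(dbDir)). This fits a REST server that looks for the experiment's database under the default logDir instead of the checkpoint directory. The sketch below checks both candidate directories. It assumes, without confirmation from this thread, that NNI keeps an experiment's database in <logDir>/<experimentId>/db. expected_db_dir and check_log_dirs are hypothetical helpers.

```python
from pathlib import Path


def expected_db_dir(log_dir, experiment_id):
    """Build the directory assumed to hold an experiment's database.

    Assumption (not confirmed in this thread): the layout is
    <logDir>/<experimentId>/db.
    """
    return Path(log_dir).expanduser() / experiment_id / "db"


def check_log_dirs(experiment_id, candidate_log_dirs):
    """Map each candidate logDir to whether the assumed db directory exists."""
    return {
        str(log_dir): expected_db_dir(log_dir, experiment_id).is_dir()
        for log_dir in candidate_log_dirs
    }
```

For example, you could call check_log_dirs("n6XH4aFU", ["/home/jianjunming/nni-experiments", "/home/jianjunming/projects/CRLNMS_clf_AutoML/configs/../checkpoints/nni-experiments"]). If the layout assumption holds, the default directory would be expected to report False.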
Oh yes, this is a bug, and it will be fixed in NNI v2.2 (FYI: #3545). For now, as a workaround, you can add
experiment_config['logDir'] = experiments_dict[args.id]['logDir']
at line 636 of ./site-packages/nni/tools/nnictl/launcher.py.
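The suggested line copies the logDir recorded in the experiment metadata into the config used to relaunch the experiment. Presumably this stops the REST server from falling back to the default directory. Here is that assignment in isolation. restore_log_dir is a hypothetical wrapper for illustration, not a function in launcher.py.

```python
def restore_log_dir(experiment_config, experiments_dict, experiment_id):
    """Copy the stored logDir into the relaunch config (hypothetical wrapper).

    Performs the same assignment as the suggested workaround line:
        experiment_config['logDir'] = experiments_dict[args.id]['logDir']
    Raises KeyError if the ID or its logDir is missing from the metadata,
    just as the one-liner would.
    """
    experiment_config["logDir"] = experiments_dict[experiment_id]["logDir"]
    return experiment_config
```

The config is updated in place and also returned, matching how the one-liner mutates experiment_config directly.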
This works, thank you!!! Looking forward to v2.2.
Environment:
Log message:
What issue did you meet, and what was expected?: When I use the command 'nnictl view n6XH4aFU' to view a stopped experiment, the RESTful server fails to start.
How to reproduce it?: nnictl view n6XH4aFU
Additional information: