Closed anandhperumal closed 5 years ago
OK, I found the problem.
I changed run.ps1
from:
cd D:\Projekty\ML_5DWChallenge\vel5\day2
$env:NNI_PLATFORM="local"
$env:NNI_EXP_ID="f6l0s17p"
$env:NNI_SYS_DIR="C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp"
$env:NNI_TRIAL_JOB_ID="o5Kpp"
$env:NNI_OUTPUT_DIR="C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp"
$env:NNI_TRIAL_SEQ_ID="3"
$env:MULTI_PHASE="false"
$env:CUDA_VISIBLE_DEVICES="-1"
cmd /c python D:\Projekty\ML_5DWChallenge\vel5\day2\nni_day2.py 2>C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp\stderr
$NOW_DATE = [int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
$NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()
Write $LASTEXITCODE " " $NOW_DATE | Out-File C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp\.nni\state -NoNewline -encoding utf8
I changed run.ps1
to:
cd D:\Projekty\ML_5DWChallenge\vel5\day2
$env:NNI_PLATFORM="local"
$env:NNI_EXP_ID="f6l0s17p"
$env:NNI_SYS_DIR="C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp"
$env:NNI_TRIAL_JOB_ID="o5Kpp"
$env:NNI_OUTPUT_DIR="C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp"
$env:NNI_TRIAL_SEQ_ID="3"
$env:MULTI_PHASE="false"
$env:CUDA_VISIBLE_DEVICES="-1"
cmd.exe /c python D:\Projekty\ML_5DWChallenge\vel5\day2\nni_day2.py 2>C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp\stderr
$NOW_DATE = [int64](([datetime]::UtcNow)-(get-date "1/1/1970")).TotalSeconds
$NOW_DATE = "$NOW_DATE" + (Get-Date -Format fff).ToString()
Write $LASTEXITCODE " " $NOW_DATE | Out-File C:\Users\gdworak\nni\experiments\f6l0s17p\trials\o5Kpp\.nni\state -NoNewline -encoding utf8
I don't know why my PowerShell doesn't have the cmd command. I need to use cmd.exe instead.
Does anyone have an idea how to fix it?
Have you ever installed MSYS2? It might be causing a conflict on cmd.
No, I never installed MSYS2
@Grzechu11 It's also possible that there is a file named cmd somewhere under your system path. Please check your environment variables or do a global search for cmd.
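One way to do that search is a small script that walks the directories on PATH and lists every file whose base name is cmd. This is only a sketch: the helper name find_candidates is made up for illustration, and it does not reproduce PowerShell's own command lookup rules. It just shows which files could be competing with cmd.exe.

```python
import os
from pathlib import Path


def find_candidates(name, path_value=None):
    """List files on the search path whose base name (extension ignored) equals `name`.

    `path_value` defaults to the PATH environment variable.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    matches = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty PATH entries
        folder = Path(directory)
        try:
            entries = sorted(folder.iterdir())
        except OSError:
            continue  # missing or unreadable directory
        for entry in entries:
            if entry.is_file() and entry.stem.lower() == name.lower():
                matches.append(entry)
    return matches


if __name__ == "__main__":
    # Anything printed besides the system cmd.exe is worth a closer look.
    for hit in find_candidates("cmd"):
        print(hit)
```

If the output contains an extension-less cmd file, for example inside a Python distribution's directory, renaming it is the workaround discussed in this thread.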
I checked the command path: Anaconda has a file named cmd.
I renamed that cmd file, and now everything works.
Thanks for the help.
Hello all, I have run into the same problem too...
@xinshouke This problem will be fixed in future releases. For now, please change the name of your cmd file or use platforms other than Windows.
@ultmaster I don't have Anaconda; I just installed tensorflow through pip... Since the problem has been fixed, I'd like to upgrade NNI to resolve it. Can I do that? Would running 'python -m pip install --upgrade nni' fix this problem?
@xinshouke Not before NNI 1.1 is released. For now, you can install NNI from source by following the instructions in the README section "Install through source code" for testing.
Hello all, I also get the same problem on a Linux system. I checked my logs but could not find any obvious error. Do you have any idea? Please give me some advice.
@chaos0625 Please elaborate, including nnimanager.log, dispatcher.log, trial logs, stderr output and system configuration.
@ultmaster Thanks for your reply! System configuration:
python: 3.6
tensorflow: 1.14.0
nni: 1.1
system: Ubuntu
nnimanager.log:
[10/25/2019, 3:47:00 PM] INFO [ 'Datastore initialization done' ]
[10/25/2019, 3:47:00 PM] INFO [ 'Rest server listening on: http://0.0.0.0:8080' ]
[10/25/2019, 3:47:00 PM] INFO [ 'RestServer start' ]
[10/25/2019, 3:47:00 PM] INFO [ 'Construct local machine training service.' ]
[10/25/2019, 3:47:00 PM] INFO [ 'RestServer base port is 8080' ]
[10/25/2019, 3:47:02 PM] INFO [ 'NNIManager setClusterMetadata, key: trial_config, value: {"command":"python3 mnist.py","codeDir":"/home/gaochao/program/nni-master/examples/trials/mnist/.","gpuNum":0}' ]
[10/25/2019, 3:47:02 PM] INFO [ 'required GPU number is 0' ]
[10/25/2019, 3:47:02 PM] INFO [ 'Starting experiment: wnc4Z2cq' ]
[10/25/2019, 3:47:02 PM] INFO [ 'Change NNIManager status from: INITIALIZED to: RUNNING' ]
[10/25/2019, 3:47:02 PM] INFO [ 'Add event listeners' ]
[10/25/2019, 3:47:02 PM] INFO [ 'Run local machine training service.' ]
[10/25/2019, 3:47:03 PM] INFO [ 'NNIManager received command from dispatcher: ID, ' ]
[10/25/2019, 3:47:03 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 0, "parameter_source": "algorithm", "parameters": {"dropout_rate": 0.8576007705118804, "conv_size": 2, "hidden_size": 512, "batch_size": 8, "learning_rate": 0.1}, "parameter_index": 0}' ]
[10/25/2019, 3:47:07 PM] INFO [ 'submitTrialJob: form: {"sequenceId":0,"hyperParameters":{"value":"{\"parameter_id\": 0, \"parameter_source\": \"algorithm\", \"parameters\": {\"dropout_rate\": 0.8576007705118804, \"conv_size\": 2, \"hidden_size\": 512, \"batch_size\": 8, \"learning_rate\": 0.1}, \"parameter_index\": 0}","index":0}}' ]
[10/25/2019, 3:47:17 PM] INFO [ 'Trial job OYhUj status changed from WAITING to RUNNING' ]
[10/25/2019, 3:49:28 PM] INFO [ 'Trial job OYhUj status changed from RUNNING to FAILED' ]
[10/25/2019, 3:49:28 PM] INFO [ 'NNIManager received command from dispatcher: TR, {"parameter_id": 1, "parameter_source": "algorithm", "parameters": {"dropout_rate": 0.5817909748038591, "conv_size": 5, "hidden_size": 512, "batch_size": 4, "learning_rate": 0.0001}, "parameter_index": 0}' ]
[10/25/2019, 3:49:33 PM] INFO [ 'submitTrialJob: form: {"sequenceId":1,"hyperParameters":{"value":"{\"parameter_id\": 1, \"parameter_source\": \"algorithm\", \"parameters\": {\"dropout_rate\": 0.5817909748038591, \"conv_size\": 5, \"hidden_size\": 512, \"batch_size\": 4, \"learning_rate\": 0.0001}, \"parameter_index\": 0}","index":0}}' ]
[10/25/2019, 3:49:38 PM] INFO [ 'Trial job lsGqg status changed from WAITING to RUNNING' ]
dispatcher.log:
[10/25/2019, 03:47:03 PM] INFO (nni.msg_dispatcher_base/MainThread) Start dispatcher
[10/25/2019, 03:47:03 PM] INFO (hyperopt.tpe/Thread-1) tpe_transform took 0.002083 seconds
[10/25/2019, 03:47:03 PM] INFO (hyperopt.tpe/Thread-1) TPE using 0 trials
[10/25/2019, 03:49:28 PM] INFO (hyperopt.tpe/Thread-1) tpe_transform took 0.004800 seconds
[10/25/2019, 03:49:28 PM] INFO (hyperopt.tpe/Thread-1) TPE using 0 trials
[10/25/2019, 03:51:53 PM] INFO (hyperopt.tpe/Thread-1) tpe_transform took 0.002802 seconds
[10/25/2019, 03:51:53 PM] INFO (hyperopt.tpe/Thread-1) TPE using 0 trials
trial logs:
[10/25/2019, 03:52:06 PM] WARNING (tensorflow/MainThread) From mnist.py:151: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
[10/25/2019, 03:52:06 PM] WARNING (tensorflow/MainThread) From /home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
[10/25/2019, 03:52:06 PM] WARNING (tensorflow/MainThread) From /home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.
stderr:
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:516: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:517: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:518: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:519: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:520: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/python/framework/dtypes.py:525: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:541: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint8 = np.dtype([("qint8", np.int8, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:542: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint8 = np.dtype([("quint8", np.uint8, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:543: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint16 = np.dtype([("qint16", np.int16, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:544: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_quint16 = np.dtype([("quint16", np.uint16, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:545: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
_np_qint32 = np.dtype([("qint32", np.int32, 1)])
/home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorboard/compat/tensorflow_stub/dtypes.py:550: FutureWarning: Passing (type, 1) or '1type' as a synonym of type is deprecated; in a future version of numpy, it will be understood as (type, (1,)) / '(1,)type'.
np_resource = np.dtype([("resource", np.ubyte, 1)])
WARNING:tensorflow:From mnist.py:151: read_data_sets (from tensorflow.contrib.learn.python.learn.datasets.mnist) is deprecated and will be removed in a future version.
Instructions for updating:
Please use alternatives such as official/mnist/dataset.py from tensorflow/models.
WARNING:tensorflow:From /home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/mnist.py:260: maybe_download (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.
Instructions for updating:
Please write your own downloading logic.
WARNING:tensorflow:From /home/gaochao/anaconda3/envs/gaochao/lib/python3.6/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:252: _internal_retry.
I got the logs above just now, and the trial failed. Sometimes there is also a network error in the log, for example: OSError: [Errno 101] Network is unreachable
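For anyone reading a long nnimanager.log like the one above, the status transitions can be pulled out with a short script. This is a sketch that assumes the "Trial job ... status changed from ... to ..." wording shown in this thread; the function name final_statuses is made up for illustration, and other NNI versions may word the message differently.

```python
import re

# Matches the status-change messages as they appear in the log excerpt above.
STATUS_CHANGE = re.compile(
    r"Trial job (?P<trial>\w+) status changed from (?P<old>\w+) to (?P<new>\w+)"
)


def final_statuses(log_text):
    """Return {trial_id: last status seen} for every trial mentioned in the log text."""
    statuses = {}
    for match in STATUS_CHANGE.finditer(log_text):
        statuses[match.group("trial")] = match.group("new")
    return statuses


# Three entries copied from the nnimanager.log posted in this thread.
SAMPLE = """
[10/25/2019, 3:47:17 PM] INFO [ 'Trial job OYhUj status changed from WAITING to RUNNING' ]
[10/25/2019, 3:49:28 PM] INFO [ 'Trial job OYhUj status changed from RUNNING to FAILED' ]
[10/25/2019, 3:49:38 PM] INFO [ 'Trial job lsGqg status changed from WAITING to RUNNING' ]
"""

if __name__ == "__main__":
    for trial_id, status in final_statuses(SAMPLE).items():
        print(trial_id, status)
```

Here trial OYhUj ends in FAILED about two minutes after it started, while lsGqg is still RUNNING, so OYhUj's stderr is the file to read first.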
@chaos0625 I'm pretty sure this is a new issue. Please open a new one if you need additional help.
Meanwhile, please check whether you can run mnist.py alone successfully. Also, please downgrade your tensorflow to 1.12.0 and try again.
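Running the trial command on its own can be scripted so that the exit code and the end of stderr are easy to see. This is a minimal sketch, not part of NNI: run_standalone is a hypothetical helper, and the command and directory in the usage note are the ones from the trial_config entry in the log above.

```python
import subprocess
import sys


def run_standalone(command, cwd=None, tail_lines=20):
    """Run a trial command outside NNI and return (exit_code, last stderr lines)."""
    completed = subprocess.run(
        command,
        cwd=cwd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    tail = completed.stderr.splitlines()[-tail_lines:]
    return completed.returncode, tail


if __name__ == "__main__":
    # Demonstration with a tiny child process that fails on purpose.
    code, tail = run_standalone(
        [sys.executable, "-c", "import sys; sys.exit('simulated failure')"]
    )
    print("exit code:", code)
    print("\n".join(tail))
```

For the experiment above, the call would be run_standalone(["python3", "mnist.py"], cwd="/home/gaochao/program/nni-master/examples/trials/mnist"). A non-zero exit code together with a traceback in the tail points at the script or its environment rather than at NNI.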
@ultmaster I have opened a new issue for help. I ran mnist.py alone, but I got an error:
Traceback (most recent call last):
File "mnist.py", line 234, in
I'll downgrade my tensorflow and try running "nnictl create --config nni-master/examples/trials/mnist/config.yml" again.
nni Environment:
I'm just starting with NNI and am following the documentation step by step, but all my trials are failing.
The exact command I ran and its output:
I have attached an image of the WebUI, which shows that the trials have failed.
Moreover, I keep getting a pop-up:
How do I disable it?
And where can I see the logs? Why are my trials failing? Please let me know if I'm missing anything. Any leads will be appreciated.
Thanks
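On the question of where the logs are: the run.ps1 quoted in this thread writes each trial's stderr to nni\experiments\<experiment id>\trials\<trial id>\stderr under the user's home directory. Assuming that layout (other NNI versions may store experiments elsewhere), a short script can print the end of every trial's stderr. The function name trial_stderr_tails is made up for illustration.

```python
from pathlib import Path


def trial_stderr_tails(experiment_dir, tail_lines=10):
    """Map each trial ID to the last lines of its stderr file.

    Assumes the <experiment_dir>/trials/<trial_id>/stderr layout seen in
    the run.ps1 from this thread.
    """
    tails = {}
    trials_dir = Path(experiment_dir) / "trials"
    if not trials_dir.is_dir():
        return tails
    for trial_dir in sorted(trials_dir.iterdir()):
        stderr_file = trial_dir / "stderr"
        if stderr_file.is_file():
            text = stderr_file.read_text(encoding="utf-8", errors="replace")
            tails[trial_dir.name] = text.splitlines()[-tail_lines:]
    return tails


if __name__ == "__main__":
    # f6l0s17p is the experiment ID from the run.ps1 above; substitute your own.
    experiment = Path.home() / "nni" / "experiments" / "f6l0s17p"
    for trial_id, lines in trial_stderr_tails(experiment).items():
        print("==", trial_id)
        print("\n".join(lines))
```

An empty result means no stderr files were found at that location, in which case the experiment directory is probably somewhere else on your machine.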