Closed aaronsnoswell closed 7 years ago
Hi @aaronsnoswell! I've tested this code on Ubuntu 16.04 and Mac OS X. Unfortunately we don't have Windows machines in the lab. Please do let me know if you fix the issue, and I'll be infinitely thankful if you can submit a PR for it when that happens.
Thanks for the info Julieta. I was able to figure out what was going on. The full error message I was seeing is below:
(tensorflow35) E:\Aaron Snoswell PhD\Jul 2017 Having A Crack At It Again\human-motion-prediction>python src/translate.py --action walking --seq_length_out 25 --iterations 10000 --test_every 10 --save_every 10
Reading training data (seq_len_in: 50, seq_len_out 25).
Reading subject 1, action walking, subaction 1
Reading subject 1, action walking, subaction 2
Reading subject 6, action walking, subaction 1
Reading subject 6, action walking, subaction 2
Reading subject 7, action walking, subaction 1
Reading subject 7, action walking, subaction 2
Reading subject 8, action walking, subaction 1
Reading subject 8, action walking, subaction 2
Reading subject 9, action walking, subaction 1
Reading subject 9, action walking, subaction 2
Reading subject 11, action walking, subaction 1
Reading subject 11, action walking, subaction 2
Reading subject 5, action walking, subaction 1
Reading subject 5, action walking, subaction 2
done reading data.
2017-08-06 11:22:32.842004: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842120: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842148: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE3 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842169: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842191: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842212: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842233: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-08-06 11:22:32.842255: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\platform\cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
Creating 1 layers of 1024 units.
One hot is True
Input size is 55
rnn_size = 1024
output_size = 55
state_size = 1024
Creating model with fresh parameters.
Model created
step 0000; step_loss: 0.9429
milliseconds | 80 | 160 | 320 | 400 | 560 | 1000 |
walking | 1.678 | 1.657 | 1.630 | 1.625 | 1.630 | 1.603 |
============================
Global step: 10
Learning rate: 0.0050
Step-time (ms): 1678.4439
Train loss avg: 0.9759
--------------------------
Val loss: 1.2118
srnn loss: 1.0169
============================
Saving the model...
2017-08-06 11:23:05.191146: W C:\tf_jenkins\home\workspace\nightly-win\M\windows\PY\35\tensorflow\core\framework\op_kernel.cc:1165] Not found: Failed to create a NewWriteableFile: experiments\walking\out_25\iterations_10000\tied\sampling_based\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel\checkpoint-10.data-00000-of-00001.tempstate3138973305096497355 : The system cannot find the path specified.
Traceback (most recent call last):
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 1267, in _do_call
return fn(*args)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 1248, in _run_fn
status, run_metadata)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\contextlib.py", line 66, in __exit__
next(self.gen)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\framework\errors_impl.py", line 466, in raise_exception_on_not_ok_status
pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile: experiments\walking\out_25\iterations_10000\tied\sampling_based\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel\checkpoint-10.data-00000-of-00001.tempstate3138973305096497355 : The system cannot find the path specified.
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable_1, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/kernel, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/kernel, proj_b_out, proj_w_out)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "src/translate.py", line 700, in <module>
tf.app.run()
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/translate.py", line 697, in main
train()
File "src/translate.py", line 479, in train
model.saver.save(sess, os.path.normpath(os.path.join(train_dir, 'checkpoint')), global_step=current_step )
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 1490, in save
raise exc
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 1474, in save
{self.saver_def.filename_tensor_name: checkpoint_file})
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 896, in run
run_metadata_ptr)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 1108, in _run
feed_dict_tensor, options, run_metadata)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 1261, in _do_run
options, run_metadata)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\client\session.py", line 1280, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.NotFoundError: Failed to create a NewWriteableFile: experiments\walking\out_25\iterations_10000\tied\sampling_based\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel\checkpoint-10.data-00000-of-00001.tempstate3138973305096497355 : The system cannot find the path specified.
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable_1, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/kernel, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/kernel, proj_b_out, proj_w_out)]]
Caused by op 'save/SaveV2', defined at:
File "src/translate.py", line 700, in <module>
tf.app.run()
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\platform\app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "src/translate.py", line 697, in main
train()
File "src/translate.py", line 132, in train
model = create_model( sess, actions )
File "src/translate.py", line 83, in create_model
dtype=tf.float32)
File "E:\Aaron Snoswell PhD\Jul 2017 Having A Crack At It Again\human-motion-prediction\src\seq2seq_model.py", line 381, in __init__
self.saver = tf.train.Saver( tf.global_variables(), max_to_keep=10 )
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 1140, in __init__
self.build()
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 1172, in build
filename=self._filename)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 686, in build
save_tensor = self._AddSaveOps(filename_tensor, saveables)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 276, in _AddSaveOps
save = self.save_op(filename_tensor, saveables)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\training\saver.py", line 219, in save_op
tensors)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\ops\gen_io_ops.py", line 766, in save_v2
tensors=tensors, name=name)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 767, in apply_op
op_def=op_def)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\framework\ops.py", line 2528, in create_op
original_op=self._default_original_op, op_def=op_def)
File "C:\Users\uqasnosw\AppData\Local\Continuum\Miniconda3\envs\tensorflow35\lib\site-packages\tensorflow\python\framework\ops.py", line 1203, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
NotFoundError (see above for traceback): Failed to create a NewWriteableFile: experiments\walking\out_25\iterations_10000\tied\sampling_based\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel\checkpoint-10.data-00000-of-00001.tempstate3138973305096497355 : The system cannot find the path specified.
[[Node: save/SaveV2 = SaveV2[dtypes=[DT_FLOAT, DT_INT32, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _device="/job:localhost/replica:0/task:0/cpu:0"](_arg_save/Const_0_0, save/SaveV2/tensor_names, save/SaveV2/shape_and_slices, Variable, Variable_1, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/candidate/kernel, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/bias, combined_tied_rnn_seq2seq/tied_rnn_seq2seq/gru_cell/gates/kernel, proj_b_out, proj_w_out)]]
As you can see, TensorFlow's Saver.save method was failing to create the checkpoint files. The problem turned out to be the maximum path length supported by the classic Windows API (MAX_PATH, 260 characters including the terminating null), not an NTFS problem. For me, the code generates a path something like E:\Aaron Snoswell PhD\Jul 2017 Having A Crack At It Again\human-motion-prediction\experiments\walking\out_25\iterations_10000\tied\sampling_based\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel\checkpoint-10.data-00000-of-00001.tempstate3138973305096497355, which is around 260 characters. Moving the code to a very shallow project folder (e.g. E:\hmp) makes the problem go away. After applying the changes in my other pull requests, this code runs fine on Windows so far.
There are workarounds in C/C++ to get longer path names on Windows - this is an issue with the TensorFlow core library that I'll raise.
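For anyone who wants to check their own setup before training, here is a rough Python sketch of the length check described above. The helper names (exceeds_max_path, to_extended_length) are made up for illustration and are not part of this repo or of TensorFlow. It uses ntpath so it behaves the same on any OS. The \\?\ extended-length prefix is a documented Windows feature, but whether TensorFlow's file writer accepts such a path is an assumption I have not verified.

```python
import ntpath  # Windows path rules, usable on any OS

# Classic Win32 limit; it counts the terminating null character.
MAX_PATH = 260


def exceeds_max_path(path, max_path=MAX_PATH):
    """Return True if an absolute path is too long for the classic Win32 API."""
    # One slot is reserved for the trailing null, so at most max_path - 1
    # visible characters fit.
    return len(path) > max_path - 1


def to_extended_length(path):
    """Hypothetical helper: add the extended-length prefix to an absolute path."""
    # The prefix is the four characters backslash, backslash, '?', backslash.
    prefix = "\\\\?\\"
    path = ntpath.normpath(path)
    if path.startswith(prefix):
        return path
    return prefix + path


# Directory layout and temporary file name taken from the error message above.
train_dir = (r"experiments\walking\out_25\iterations_10000\tied\sampling_based"
             r"\one_hot\depth_1\size_1024\lr_0.005\not_residual_vel")
temp_name = "checkpoint-10.data-00000-of-00001.tempstate3138973305096497355"

deep_root = (r"E:\Aaron Snoswell PhD\Jul 2017 Having A Crack At It Again"
             r"\human-motion-prediction")
shallow_root = r"E:\hmp"

deep = ntpath.join(deep_root, train_dir, temp_name)
shallow = ntpath.join(shallow_root, train_dir, temp_name)

for label, path in (("deep", deep), ("shallow", shallow)):
    print(label, len(path), "too long" if exceeds_max_path(path) else "ok")
```

Note that the temporary file TensorFlow writes has a long .tempstate suffix, so the check has to be done on that name rather than on the final checkpoint name.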
Just FYI, I faced exactly the same issue. As mentioned above, I was able to solve it by running the code from a shallow folder. Code at: https://gist.github.com/imnishantg/5067dd7c1572e0891595bf05c3d2caf0
System info: Windows 10, 64-bit; TensorFlow version: 1.2.1
Just want to check whether this issue has been resolved in later versions of TF.
Thanks, Nishant
I just downloaded TF and have the same issue.
Hi there,
Quick post below that I'll update later when I have some more time:
Thanks for this paper and for sharing your code. I'm trying to replicate your results on Windows 10, and the TensorFlow Saver class that saves the model as it is training seems to have an issue. Either the path name or the file name of the files is far too long for Windows or NTFS (I haven't determined which yet). To help me debug this, can you let me know what operating system and file system you were running this code on?
Thank you, I'll share more info later.