Open JohnBakery opened 5 years ago
Maybe it's an issue with the versions of the packages. Can you run pip freeze?
bleach==1.5.0
colorama==0.4.1
cycler==0.10.0
decorator==4.3.2
html5lib==0.9999999
ipython==6.0.0
ipython-genutils==0.2.0
jedi==0.13.2
Markdown==3.0.1
matplotlib==2.0.0
numpy==1.12.1
olefile==0.46
parso==0.3.4
pickleshare==0.7.5
Pillow==4.1.0
prompt-toolkit==1.0.15
protobuf==3.6.1
Pygments==2.3.1
pyparsing==2.3.1
python-dateutil==2.8.0
pytz==2018.9
simplegeneric==0.8.1
six==1.12.0
tensorflow==1.3.0
tensorflow-tensorboard==0.1.5
traitlets==4.3.2
wcwidth==0.1.7
Werkzeug==0.14.1
Thanks. I don't see anything wrong. Is it consistently crashing after 17000 training steps?
Yes, exactly at 17000, every single time. If I restart without deleting the tfrecords files, it crashes right away. If I delete them, it runs until 17000. Looking at the files, voc-17000.tfrecords is 65MB, while all the others are ~105MB. Not sure if that matters.
This one is smaller because it is the last one (it contains fewer images). You can try removing it.
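To check whether a smaller file is simply shorter rather than cut off mid-write, one option is to walk its record framing directly. The sketch below assumes the standard TFRecord layout (an 8-byte little-endian length, a 4-byte CRC of the length, the payload, and a 4-byte CRC of the payload) and does not verify the CRCs. The function name count_tfrecords is hypothetical and not part of this project or of TensorFlow.

```python
import os
import struct


def count_tfrecords(path):
    """Count the records in a TFRecord file without TensorFlow.

    Assumes the standard framing: uint64 length, uint32 length CRC,
    payload bytes, uint32 payload CRC. CRCs are skipped, not checked.
    Raises ValueError if the file ends in the middle of a record.
    """
    size = os.path.getsize(path)
    count = 0
    with open(path, "rb") as f:
        while True:
            header = f.read(12)  # 8-byte length + 4-byte CRC of the length
            if not header:
                return count  # clean end of file
            if len(header) < 12:
                raise ValueError("truncated header after %d records" % count)
            (length,) = struct.unpack("<Q", header[:8])
            needed = length + 4  # payload + 4-byte CRC of the payload
            if needed > size - f.tell():
                raise ValueError("truncated payload after %d records" % count)
            f.seek(needed, os.SEEK_CUR)  # skip the payload and its CRC
            count += 1
```

Calling count_tfrecords("voc-17000.tfrecords") and comparing the result with one of the ~105MB files would show whether the last file just holds fewer records or ends in an incomplete one.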
I deleted voc-17000 and restarted training, and it crashes right away with the same message.
It looks like a Windows-specific issue: https://github.com/tensorflow/tensorflow/issues/9672 Try using tensorflow==1.4.0 instead.
I updated to 1.4.0 and it solved the crashing issue. However, it gets stuck after saying Shuffle buffer filled.
WARNING:tensorflow:From G:\Users\user\Desktop\cnn\dataset.py:110: TFRecordDataset.__init__ (from tensorflow.contrib.data.python.ops.readers) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.data.TFRecordDataset`.
2019-02-19 16:30:40.833944: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 159 of 10000
2019-02-19 16:30:50.826511: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 320 of 10000
2019-02-19 16:30:57.231075: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:121] Shuffle buffer filled.
I thought this could've been due to tensorflow-tensorboard being incompatible, since when I switched to 1.4.0 I got the following message:
tensorflow 1.4.0 has requirement tensorflow-tensorboard<0.5.0,>=0.4.0rc1, but you'll have tensorflow-tensorboard 0.1.5 which is incompatible.
So I updated to 0.4.0rc1, but it still hangs at Shuffle buffer filled.
So I let it run, and apparently it is doing something:
2019-02-19 16:51:05.815504: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 147 of 10000
2019-02-19 16:51:15.779815: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 248 of 10000
2019-02-19 16:51:25.787975: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 379 of 10000
2019-02-19 16:51:29.268296: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:121] Shuffle buffer filled.
2019-02-19 18:37:48.078317: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 153 of 10000
2019-02-19 18:37:58.045792: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 307 of 10000
2019-02-19 18:38:05.494099: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:121] Shuffle buffer filled.
2019-02-19 20:24:20.950043: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 157 of 10000
2019-02-19 20:24:30.912401: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 317 of 10000
2019-02-19 20:24:38.561894: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:121] Shuffle buffer filled.
2019-02-19 22:10:38.268626: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 155 of 10000
2019-02-19 22:10:48.274232: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 314 of 10000
2019-02-19 22:10:55.083987: I C:\tf_jenkins\home\workspace\rel-win\M\windows\PY\36\tensorflow\core\kernels\shuffle_dataset_op.cc:121] Shuffle buffer filled.
For how long should I let it run?
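One way to get a feel for the cadence is to measure the time between successive "Shuffle buffer filled." lines. The sketch below is only a log-parsing aid: the helper name filled_gaps is hypothetical, the sample lines reuse the timestamps from the log above with the long source path shortened, and it assumes the timestamp format shown there. If the pipeline refills the buffer once per pass over the data, each gap would approximate one pass, but that is an assumption about dataset.py, which is not shown here.

```python
import re
from datetime import datetime

# Matches the timestamp that starts each TensorFlow log line of interest.
_STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+):")


def filled_gaps(lines):
    """Return the gaps, in seconds, between 'Shuffle buffer filled.' lines."""
    stamps = []
    for line in lines:
        if "Shuffle buffer filled." not in line:
            continue
        match = _STAMP.match(line)
        if match:
            stamps.append(
                datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S.%f")
            )
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# Timestamps copied from the log above; the file path is shortened here.
sample = [
    "2019-02-19 16:51:29.268296: I shuffle_dataset_op.cc:121] Shuffle buffer filled.",
    "2019-02-19 18:37:48.078317: I shuffle_dataset_op.cc:110] Filling up shuffle buffer (this may take a while): 153 of 10000",
    "2019-02-19 18:38:05.494099: I shuffle_dataset_op.cc:121] Shuffle buffer filled.",
    "2019-02-19 20:24:38.561894: I shuffle_dataset_op.cc:121] Shuffle buffer filled.",
    "2019-02-19 22:10:55.083987: I shuffle_dataset_op.cc:121] Shuffle buffer filled.",
]

gaps = filled_gaps(sample)
for gap in gaps:
    print("%.1f minutes" % (gap / 60))
```

On these timestamps every gap comes out at a little over 106 minutes, so the process appears to be cycling at a steady rate rather than hanging.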
I want to chime in, too. I also got to "Shuffle buffer filled", and the last one I got was about an hour ago. Did yours finish in the end?
When I run
G:\Users\user\Desktop\cnn>C:\Users\user\AppData\Local\Programs\Python\Python36\python.exe watermarks.py --logdir=save/
The trainer crashes after exactly 17000 TFRecords with the following message
Am I doing something wrong?