arptejan95 closed this issue 6 years ago.
I'm facing a similar error as well.
Adding the code owner for more input. @pkulzc, feel free to add more people to the thread.
Currently, training on the cloud with TF 1.4+ is not working, as mentioned here: https://github.com/tensorflow/models/issues/3071 https://github.com/tensorflow/models/issues/3788
This is a known issue and we're investigating. We're also doing some consolidation, so this issue will go away anyway once the consolidation is done. If you really need to train on the cloud, you can use an earlier version of my repo.
@pkulzc would this work with ssd_mobilenet_v2_coco?
Hi, I am getting an error when I run the command
python train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/faster_rcnn_inception_v2_pets.config
in build functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000), AttributeError: 'module' object has no attribute 'data'
Then I replaced all tf.data with tf.contrib.data, but then I got the following error:
in read_dataset records_dataset = filename_dataset.apply( AttributeError: 'RepeatDataset' object has no attribute 'apply'
Any help would be appreciated! Thanks!
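The first traceback means the installed TensorFlow has no tf.data module, which only became part of the core API in TensorFlow 1.4; the fix suggested later in this thread is upgrading. A minimal sketch of a fail-fast check, assuming you would rather get a clear message than an AttributeError deep inside the input builder (parse_version and check_tf_version are hypothetical helpers, not part of object_detection):

```python
import re

# Assumption: the object_detection input pipeline needs tf.data, which is
# available in TensorFlow 1.4 and later (the upgrade suggested in this thread).
MIN_TF_VERSION = (1, 4)


def parse_version(version_string):
    """Return the leading (major, minor) integers of e.g. '1.2.1' or '1.4.0-rc1'."""
    match = re.match(r"^(\d+)\.(\d+)", version_string)
    if match is None:
        raise ValueError("Unrecognised version string: %r" % version_string)
    return int(match.group(1)), int(match.group(2))


def check_tf_version(version_string, minimum=MIN_TF_VERSION):
    """Raise RuntimeError if version_string is older than `minimum`; else return True."""
    if parse_version(version_string) < minimum:
        # Tuple concatenation supplies the three format arguments: version, major, minor.
        raise RuntimeError(
            "TensorFlow %s found, but this pipeline needs %d.%d or later "
            "(tf.data is missing in older releases)." % ((version_string,) + tuple(minimum)))
    return True
```

Calling check_tf_version(tf.__version__) near the top of train.py, before the input pipeline is built, would turn the AttributeError into an explicit version message. Only major and minor numbers are compared, so suffixes such as -rc1 are ignored.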
@KingsonSingh Sorry for the late response. Your issue is different from this one; please open a separate issue if the problem still happens and provide more details (following the instructions here).
@KingsonSingh I'm getting a similar error as well. Have you solved it?
@xinyuabcd
Yes! You can solve this error by updating the TensorFlow package.
@KingsonSingh
OK!
Thanks!
@KingsonSingh Besides upgrading TensorFlow to 1.4, is there any other way, for example, adapting the code to TF 1.2? Thank you.
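The thread does not give a TF 1.2 workaround, and the earlier report shows that a blanket tf.data to tf.contrib.data substitution only moves the failure to the missing apply method. If you do try to adapt the code, a feature check like the following can at least report which API is usable before the pipeline is built. It is a hypothetical helper that inspects attributes on whatever module you pass in; it does not make an old release work.

```python
def find_dataset_api(tf_module):
    """Return (name, module) for the first dataset API whose Dataset has `apply`.

    Checks tf.data first, then tf.contrib.data. Returns (None, None) if neither
    exists or neither Dataset class defines `apply` (the situation behind the
    "'RepeatDataset' object has no attribute 'apply'" error in this thread).
    """
    contrib = getattr(tf_module, "contrib", None)
    candidates = [
        ("tf.data", getattr(tf_module, "data", None)),
        ("tf.contrib.data", getattr(contrib, "data", None)),
    ]
    for name, module in candidates:
        # getattr on None with a default simply yields None, so missing modules are skipped.
        dataset_cls = getattr(module, "Dataset", None)
        if dataset_cls is not None and hasattr(dataset_cls, "apply"):
            return name, module
    return None, None
```

Checking for the attribute the code actually calls, rather than a version number, also covers builds whose version string is unusual.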
@pkulzc Any update on support for TensorFlow 1.4+ on CloudML?
How come training on the cloud with TF 1.4+ is not working, when Google Cloud supports different runtime environments, as listed here, including TensorFlow 1.8 and Python 3.5?
Any update concerning this problem?
We have been trying to run the steps mentioned here, the only difference being that we used ssd_mobilenet_v2_coco.
While using runtime-version 1.6, the training stops after some steps with the following error:
The replica master 0 exited with a non-zero status of 1. Termination reason: Error.
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 167, in <module>
    tf.app.run()
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "/root/.local/lib/python2.7/site-packages/object_detection/train.py", line 163, in main
    worker_job_name, is_chief, FLAGS.train_dir)
  File "/root/.local/lib/python2.7/site-packages/object_detection/trainer.py", line 370, in train
    saver=saver)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 768, in train
    sess, train_op, global_step, train_step_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/contrib/slim/python/slim/learning.py", line 487, in train_step
    run_metadata=run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 905, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1137, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1355, in _do_run
    options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1374, in _do_call
    raise type(e)(node_def, op, message)
UnavailableError: OS Error
We also tried runtime-version 1.2, but got the following error:
in build functools.partial(tf.data.TFRecordDataset, buffer_size=8 * 1000 * 1000), AttributeError: 'module' object has no attribute 'data'
Then we replaced all tf.data with tf.contrib.data, but then we got the following error:
in read_dataset records_dataset = filename_dataset.apply( AttributeError: 'RepeatDataset' object has no attribute 'apply'
Any help would be appreciated! Thanks!