Closed: austinmw closed this issue 6 years ago.
Thank you for your post. We noticed you have not filled out the following fields in the issue template. Could you update them if they are relevant in your case, or leave them as N/A? Thanks.
What is the top-level directory of the model you are using
Bazel version
Hi @austinmw, setting the batch size from 1 to 1*num_clones in the training config solved this problem for me. I believe it's due to line 265 in object_detection/trainer.py:
batch_size = train_config.batch_size // num_clones
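In other words, trainer.py divides the configured batch size by the number of clones, so batch_size: 1 with --num_clones=3 rounds each clone's batch down to zero. A minimal sketch of the change in the pipeline .config, assuming 3 clones (the rest of train_config is unchanged and omitted here):

train_config {
  # 1 * num_clones: with --num_clones=3, the integer division above
  # (batch_size // num_clones) still leaves each clone with a batch of 1.
  batch_size: 3
  # ...rest of train_config unchanged
}

In general, keep the configured batch_size at least equal to (ideally a multiple of) num_clones so the per-clone batch never rounds down to zero.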
@ImEric Hi, I can run other models fine, but I hit this error when I run faster_rcnn_nas_coco. Can you help me?
The error message:
File "legacy/train.py", line 184, in <module>
  tf.app.run()
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/platform/app.py", line 126, in run
  _sys.exit(main(argv))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 250, in new_func
  return func(*args, **kwargs)
File "legacy/train.py", line 93, in main
  FLAGS.pipeline_config_path)
File "/home/ubuntu/.local/lib/python3.5/site-packages/tensorflow/models-master/research/object_detection/utils/config_util.py", line 94, in get_configs_from_pipeline_file
  text_format.Merge(proto_str, pipeline_config)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 533, in Merge
  descriptor_pool=descriptor_pool)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 587, in MergeLines
  return parser.MergeLines(lines, message)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 620, in MergeLines
  self._ParseOrMerge(lines, message)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 635, in _ParseOrMerge
  self._MergeField(tokenizer, message)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 735, in _MergeField
  merger(tokenizer, message, field)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 823, in _MergeMessageField
  self._MergeField(tokenizer, sub_message)
File "/home/ubuntu/.local/lib/python3.5/site-packages/google/protobuf/text_format.py", line 703, in _MergeField
  (message_descriptor.full_name, name))
google.protobuf.text_format.ParseError: 143:1 : Message type "object_detection.protos.EvalConfig" has no field named "eval_input_reader".
my protoc --version : 3.5.1
my tensorflow : 1.6.0
my GPU:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.130                Driver Version: 384.130                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:02:00.0  On |                  N/A |
| 51%   84C    P2   156W / 250W | 10539MiB / 11170MiB  |     95%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 108...  Off  | 00000000:03:00.0 Off |                  N/A |
| 46%   68C    P8    22W / 250W | 10786MiB / 11172MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 108...  Off  | 00000000:82:00.0 Off |                  N/A |
| 62%   85C    P2   102W / 250W | 10788MiB / 11172MiB  |     91%      Default |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 108...  Off  | 00000000:83:00.0 Off |                  N/A |
| 30%   49C    P8    18W / 250W | 10592MiB / 11172MiB  |      0%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0      1574    G    /usr/lib/xorg/Xorg                             84MiB |
|    0     28929    C    python                                      10343MiB |
|    0     37191    G    compiz                                         95MiB |
|    0     46611    G    /opt/teamviewer/tv_bin/TeamViewer              11MiB |
|    1     18207    C    python3                                     10775MiB |
|    2     20236    C    python3                                     10775MiB |
|    3     16837    C    python                                      10579MiB |
+-----------------------------------------------------------------------------+
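As an aside, the ParseError above ("EvalConfig has no field named eval_input_reader") usually points at the config file itself rather than the model: eval_input_reader is a top-level block in the pipeline config, a sibling of eval_config rather than a field inside it, so a missing closing brace near line 143 produces exactly this message. A minimal sketch of the expected layout, with placeholder values and paths (yours will differ):

eval_config {
  num_examples: 8000
}   # eval_config must be closed before eval_input_reader starts

eval_input_reader {
  label_map_path: "PATH_TO_BE_CONFIGURED/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "PATH_TO_BE_CONFIGURED/eval.record"
  }
}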
Not sure if this is a bug or not since I can run other models fine.
System information
Describe the problem
I've previously tried two models:
ssd_mobilenet_v1_coco_2017_11_17
and faster_rcnn_inception_resnet_v2_atrous_coco_2018_01_28,
which both run fine set up like this: I have 4 GPUs, so I've been setting the first three to train and the last one to eval. However, for some reason I'm unable to do the same for the model
faster_rcnn_nas_coco_2018_01_28
. When I try to set --num_clones=3,
I get an error. Could anyone please explain why this happens, or how I can fix it so that I can run this model with more than one GPU?
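For reference, a sketch of the train/eval split described above using the legacy binaries; the config path, train_dir, and eval_dir are placeholders for whatever your setup actually uses:

# Train on GPUs 0-2 (3 clones), evaluate on GPU 3 (hypothetical paths).
CUDA_VISIBLE_DEVICES=0,1,2 python legacy/train.py \
    --pipeline_config_path=models/model/faster_rcnn_nas.config \
    --train_dir=models/model/train \
    --num_clones=3 \
    --ps_tasks=1

CUDA_VISIBLE_DEVICES=3 python legacy/eval.py \
    --pipeline_config_path=models/model/faster_rcnn_nas.config \
    --checkpoint_dir=models/model/train \
    --eval_dir=models/model/eval

If you run faster_rcnn_nas this way, remember the batch_size change suggested in the first comment above, since the configured batch size is divided across clones.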