Multi-GPU can't set when use model fasterRcnn_inception_resnet_v2

tensorflow / models

Models and examples built with TensorFlow

Multi-GPU can't set when use model fasterRcnn_inception_resnet_v2 #2407

Closed: chenyuZha closed this issue 7 years ago

chenyuZha commented 7 years ago

When I train the model fasterRcnn_inception_resnet_v2 on my own data, I set --num_clones=2 to use my 2 GPUs, but I get the error below:

  File "/home/zha/Documents/models-master/object_detection/trainer.py", line 117, in _create_losses
    ) = _get_inputs(input_queue, detection_model.num_classes)
ValueError: need more than 0 values to unpack

When I tested with an SSD model, everything worked fine. I am using Python 2.7 on Ubuntu 16.04. Could anyone tell me why I get this error? (I searched Stack Overflow but found no answer.) Thanks a lot!

poxvoculi commented 7 years ago

This question is better asked on StackOverflow since it is not a bug or feature request. There is also a larger community that reads questions there. Thanks!

darraghdog commented 7 years ago

@chenyuZha I had a similar issue. The reason was that I tried to run multi-GPU training while the batch size in my config file was still set to 1. If you run with --num_clones 2, the batch size in the config file must be at least 2, or a larger multiple of 2.
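To illustrate why a batch size of 1 can fail with two clones, here is a minimal sketch. It assumes the trainer splits the configured batch size evenly across clones using integer division, so that 1 // 2 leaves each clone with an empty batch. The helper name `per_clone_batch_size` is hypothetical and not part of the object detection API.

```python
def per_clone_batch_size(batch_size, num_clones):
    """Return the batch size each clone would get under an even split.

    Hypothetical helper: assumes the configured batch size is divided
    across clones with integer division.
    """
    if num_clones < 1:
        raise ValueError("num_clones must be at least 1")
    if batch_size < num_clones or batch_size % num_clones != 0:
        raise ValueError(
            "batch_size={} cannot be split evenly across {} clones; "
            "use a multiple of num_clones".format(batch_size, num_clones)
        )
    return batch_size // num_clones


# With batch_size=1 and num_clones=2, integer division gives 0 examples
# per clone, which would leave nothing to unpack downstream.
empty_split = 1 // 2
```

Running such a check on the config values before launching training would surface the mismatch early, instead of as an unpacking error inside the trainer.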

davidblumntcgeo commented 6 years ago

darraghdog is correct regarding batch size.

Also, if you are still having issues, try adding --ps_tasks=1 to your list of arguments for train.py (putting it right after the num_clones argument should work). This works for me when I run ssd_inception_v2_coco using TF runtime 1.6 with Python 2.7 on Ubuntu 16.04. I haven't tried the particular model you are using.
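As a rough sketch of the argument order described above, the snippet below assembles a train.py command line with --ps_tasks placed right after --num_clones. The function name `build_train_command` and the example paths are hypothetical. The --pipeline_config_path and --train_dir flags do not appear in this thread and are assumed to be how the script receives the config file and output directory.

```python
def build_train_command(pipeline_config_path, train_dir, num_clones, ps_tasks=1):
    """Assemble an argument list for train.py (illustrative only).

    The script path and the config/train-dir flags are assumptions;
    adjust them to match your checkout of the models repository.
    """
    return [
        "python",
        "object_detection/train.py",
        "--logtostderr",
        "--pipeline_config_path={}".format(pipeline_config_path),
        "--train_dir={}".format(train_dir),
        "--num_clones={}".format(num_clones),
        # Placed directly after --num_clones, as suggested above.
        "--ps_tasks={}".format(ps_tasks),
    ]


# Placeholder paths; replace them with your own files.
command = build_train_command("path/to/pipeline.config", "path/to/train_dir", 2)
```

The resulting list can be joined with spaces to paste into a shell, or passed to `subprocess.call` to launch training from Python.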