naisy / realtime_object_detection

Plug and Play Real-Time Object Detection App with Tensorflow and OpenCV. No Bugs No Worries. Enjoy!
MIT License

split_model: True not working on any of the pretrained Faster RCNN models #55

Closed: brb1901 closed this issue 5 years ago

brb1901 commented 5 years ago

Hello @naisy, I am testing this code on a Ubuntu 16.04 desktop PC with CUDA 9, TF 1.8 (compiled from source), and a TITAN X (Pascal) GPU. I am able to run successfully both the pretrained SSD models and Faster RCNN models (from the TF object detection model zoo), and I'm getting high speeds (high fps) -- nice work!

However, for the pretrained Faster RCNN models that you mention, setting split_model: True in the config.yml does not work for me. This issue may be somewhat related to #53, but I think it is different: I'm talking about pretrained models that you mention yourself. E.g. for faster_rcnn_resnet101_coco_2018_01_28 (http://download.tensorflow.org/models/object_detection/faster_rcnn_resnet101_coco_2018_01_28.tar.gz) it gives the error message below. It would appear that your function load_frozen_graph_with_split in load_graph_faster_v2.py expects a different graph than this type of faster_rcnn model has? Or is this a TF version issue? Have you actually tried split_model: True yourself (recently) with any of the Faster RCNN models on a PC? Or do you have a suggestion as to how and where to do the splitting for this particular model? It seems it should be possible, as you indicated yourself in https://github.com/tensorflow/models/issues/3270. And I would prefer to use Faster RCNN, as it gives better detection results for me while not being (much) slower than SSD with this setup.

(BTW this faster_rcnn_resnet101_coco_2018_01_28 model still runs reasonably fast even with split_model: False. Around 12-13 fps on a video file with 3840x2160 resolution, including visualization, on the machine I describe above.)

2018-10-12 14:26:28.621145: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 11372 MB memory) -> physical GPU (device: 0, name: TITAN X (Pascal), pci bus id: 0000:02:00.0, compute capability: 6.1)
Traceback (most recent call last):
  File "/home/bram/cygrepo/coderepo/trunk/projectcode/realtime_object_detection_naisy_cyg/trunk/realtime_object_detection/lib/session_worker.py", line 75, in execution
    results = sess.run(opts, feed_dict=feeds)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1135, in _run
    feed_dict_tensor, options, run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1316, in _do_run
    run_metadata)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1335, in _do_call
    raise type(e)(node_def, op, message)
InvalidArgumentError: You must feed a value for placeholder tensor 'SecondStagePostprocessor/ToFloat_2' with dtype float
     [[Node: SecondStagePostprocessor/ToFloat_2 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]

Caused by op u'SecondStagePostprocessor/ToFloat_2', defined at:
  File "run_video.py", line 147, in <module>
    main()
  File "run_video.py", line 135, in main
    detection.start(cfg)
  File "/home/bram/cygrepo/coderepo/trunk/projectcode/realtime_object_detection_naisy_cyg/trunk/realtime_object_detection/lib/detection_faster_v2.py", line 69, in start
    graph = load_frozen_graph.load_graph()
  File "/home/bram/cygrepo/coderepo/trunk/projectcode/realtime_object_detection_naisy_cyg/trunk/realtime_object_detection/lib/load_graph_faster_v2.py", line 18, in load_graph
    return self.load_frozen_graph_with_split()
  File "/home/bram/cygrepo/coderepo/trunk/projectcode/realtime_object_detection_naisy_cyg/trunk/realtime_object_detection/lib/load_graph_faster_v2.py", line 227, in load_frozen_graph_with_split
    tf.import_graph_def(remove, name='')
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/util/deprecation.py", line 432, in new_func
    return func(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 513, in import_graph_def
    _ProcessNewOps(graph)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/importer.py", line 303, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3540, in _add_new_tf_operations
    for c_op in c_api_util.new_tf_operations(self)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 3428, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1718, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): You must feed a value for placeholder tensor 'SecondStagePostprocessor/ToFloat_2' with dtype float
     [[Node: SecondStagePostprocessor/ToFloat_2 = Placeholder[dtype=DT_FLOAT, shape=<unknown>, _device="/job:localhost/replica:0/task:0/device:CPU:0"]()]]
naisy commented 5 years ago

Hi @brb1901,

Thank you for the detailed report. I tried with TensorFlow r1.6.1 and r1.10.1: r1.6.1 worked, but r1.10.1 hit the same error. I will check this error.

naisy commented 5 years ago

Hi @brb1901,

I changed the split node, and it works now.

Problem node: ToFloat (screenshot: faster_rcnn_split_node_0_before)

Changed to: stack_1 (screenshot: faster_rcnn_split_node_0_after)
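For anyone reproducing this, a quick way to sanity-check a candidate split node is to look for numbered siblings among the graph's node names. This is a sketch, not project code: loading the frozen GraphDef is omitted and the names are supplied as a plain list.

```python
import re

def has_numbered_sibling(node_names, candidate):
    """Return True if some node is named '<candidate>_<n>'.

    Such a sibling makes `candidate` risky as a split point: when the
    split halves are re-imported, a name clash on `candidate` can be
    resolved past the sibling (e.g. 'ToFloat' -> 'ToFloat_2'), and
    feeds keyed on the old name stop matching.
    """
    suffix_re = re.compile(re.escape(candidate) + r"_\d+$")
    return any(suffix_re.match(n) for n in node_names)

# 'ToFloat' has the sibling 'ToFloat_1', so it is a risky split point;
# 'stack_1' has no 'stack_1_<n>' sibling here, so it is safer.
names = ["ToFloat", "ToFloat_1", "stack", "stack_1"]
print(has_numbered_sibling(names, "ToFloat"))   # True
print(has_numbered_sibling(names, "stack_1"))   # False
```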

Thank you.

naisy commented 5 years ago

Hi @brb1901,

I checked the split graph. It seems to be a problem caused by a change in the node-naming rule.

Node renaming when re-importing the split graphs:

tf1.6.1:  ToFloat -> ToFloat_1,  ToFloat_1 -> ToFloat_1_1
tf1.10.1: ToFloat -> ToFloat_2,  ToFloat_1 -> ToFloat_1

In the past, the importer simply added '_1' to an existing node name, but now protecting the existing node names takes priority. So it seems better to choose, for the split node, a name for which no '_1' variant exists in the graph.
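The difference can be sketched in plain Python. This is a toy model of the two renaming rules described above, not actual TensorFlow source:

```python
def rename_old(existing, name):
    # Older behavior (per the table above): on a clash, simply
    # append '_1' to the name, so 'ToFloat_1' becomes 'ToFloat_1_1'.
    return name + "_1" if name in existing else name

def rename_new(existing, name):
    # Newer behavior: existing names are protected, so the importer
    # counts upward to the first free numeric suffix, skipping any
    # '<name>_<n>' that is already taken.
    if name not in existing:
        return name
    i = 1
    while f"{name}_{i}" in existing:
        i += 1
    return f"{name}_{i}"

print(rename_old({"ToFloat"}, "ToFloat"))               # ToFloat_1
print(rename_old({"ToFloat_1"}, "ToFloat_1"))           # ToFloat_1_1
print(rename_new({"ToFloat", "ToFloat_1"}, "ToFloat"))  # ToFloat_2
print(rename_new({"ToFloat_1"}, "ToFloat_1"))           # ToFloat_1_1
```

Under the new rule the hard-coded 'ToFloat' feed no longer matches the imported node (it became 'ToFloat_2'), which is exactly the InvalidArgumentError in the traceback; picking a split node without numbered siblings avoids the divergence.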

brb1901 commented 5 years ago

Hi @naisy, I tried it, and it is indeed working for me too now; thank you very much! So it did in fact have to do with changing TensorFlow versions. I think that leads to problems too often; one would hope that they (at Google) could keep it a bit more stable and robust...

Anyway, not only is it working, but as hoped it leads to a speed improvement too! I tested it again with the faster_rcnn_resnet101_coco_2018_01_28 model. On a video file where split_model: False gives around 10.8 fps (including visualization), split_model: True gives around 13 fps. And with visualize: False and force_gpu_compatible: True, I get around 13.9 fps. On a webcam (i.e. without all the file reading), I now get around 16.7 fps with split_model: True (13.6 fps with split_model: False). (BTW visualization on or off doesn't make much difference for me, because next to the Titan X I have a separate GPU, a GTX 1050, that drives the screen.)