matterport / Mask_RCNN

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Inference Model is not reentrant. #1406

Open castleguarders opened 5 years ago

castleguarders commented 5 years ago

Certain use cases need Python threads to process large amounts of I/O (network and disk) without stalling. Per my tests, several instances of the inference model fit into a single GPU's memory (the GPU has 11 GB), after tweaking Keras/TensorFlow so that each model does not allocate 100% of free memory.

However, as soon as more than one thread starts to use its own model instance, exceptions are raised (see the errors below). My original approach of passing a single model instance to all threads for inference also resulted in errors.

It's fairly clear that there is a design limitation here; I was hoping someone could shed light on how feasible it would be to resolve it. Since parallel-model support already exists for the multi-GPU case, I was hoping it would be fairly quick to adapt it to the case of multiple models on the same GPU.

The alternative of funneling everything through a single thread with a queue would introduce otherwise avoidable bottlenecks due to the GIL.

Error messages for the model-per-thread-on-a-single-GPU approach:

```
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/ops/control_flow_ops.py", line 3575, in _GroupControlDeps
    return no_op(name=name)
  File "/usr/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4442, in device
    new_top_of_stack = self._device_function_stack.peek_objs()[0]
IndexError: list index out of range
```

```
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 548, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.NotFoundError: PruneForTargets: Some target nodes not found: group_deps
```

```
Exception ignored in: <bound method BaseSession._Callable.__del__ of <tensorflow.python.client.session.BaseSession._Callable object at 0x7f9268658be0>>
Traceback (most recent call last):
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1456, in __del__
    self._session._session, self._handle, status)
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 548, in __exit__
    c_api.TF_GetCode(self.status.status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: No such callable handle: 140264961352416
```

castleguarders commented 5 years ago

Errors when passing a model to a child thread:

```
  File "./mrcnn/model.py", line 2532, in detect
    self.keras_model.predict([molded_images, image_metas, anchors], verbose=0)
  File "/home/castleguard/.local/lib/python3.6/site-packages/keras/engine/training.py", line 1164, in predict
    self._make_predict_function()
  File "/home/castleguard/.local/lib/python3.6/site-packages/keras/engine/training.py", line 554, in _make_predict_function
    **kwargs)
  File "/home/castleguard/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2744, in function
    return Function(inputs, outputs, updates=updates, **kwargs)
  File "/home/castleguard/.local/lib/python3.6/site-packages/keras/backend/tensorflow_backend.py", line 2546, in __init__
    with tf.control_dependencies(self.outputs):
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 5290, in control_dependencies
    return get_default_graph().control_dependencies(control_inputs)
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 4743, in control_dependencies
    c = self.as_graph_element(c)
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3682, in as_graph_element
    return self._as_graph_element_locked(obj, allow_tensor, allow_operation)
  File "/home/castleguard/.local/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 3761, in _as_graph_element_locked
    raise ValueError("Tensor %s is not an element of this graph." % obj)
ValueError: Tensor Tensor("mrcnn_detection/Reshape_1:0", shape=(1, 100, 6), dtype=float32) is not an element of this graph.
```
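The final `ValueError` above is the classic Keras/TF1 symptom of calling `predict` from a thread other than the one that built the model: the child thread's default graph is not the model's graph. A workaround commonly used with this era of Keras (an assumption here, not something confirmed in this thread) is to capture the model's graph at load time and re-enter it, behind a lock, around every call. A minimal sketch follows; in real Keras/TF1 code `graph_ctx_factory` would be `tf.get_default_graph().as_default` captured right after model creation, while `nullcontext` and `DummyModel` are stand-ins so the sketch runs without TensorFlow:

```python
import threading
from contextlib import nullcontext

class ThreadSafeDetector:
    """Serialize detect() calls on one model and re-enter its graph.

    graph_ctx_factory: in real Keras/TF1 code, pass the graph context
    captured right after building the model, e.g.
    ``tf.get_default_graph().as_default``; ``nullcontext`` is used here
    only so this sketch is runnable stand-alone.
    """
    def __init__(self, model, graph_ctx_factory=nullcontext):
        self._model = model
        self._ctx = graph_ctx_factory
        self._lock = threading.Lock()    # one predict at a time

    def detect(self, images):
        with self._lock, self._ctx():    # re-enter the model's own graph
            return self._model.detect(images, verbose=0)

class DummyModel:
    """Stand-in for the Mask R-CNN model so the sketch runs anywhere."""
    def detect(self, images, verbose=0):
        return ["detection"] * len(images)

safe = ThreadSafeDetector(DummyModel())
print(safe.detect(["frame"]))   # → ['detection']
```

This does not make the model truly reentrant; it only makes concurrent callers safe by serializing them, which is also what the queue workaround below achieves.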

ygean commented 5 years ago

@castleguarders Have you found a solution?

castleguarders commented 5 years ago

> @castleguarders Have you found a solution?

Yes, a workaround. I run a single instance of the model in a dedicated thread and wrap it behind a queue. All the other threads just send a message (a frame) to this thread for inference, and the results are passed back. I use one reply queue per thread that needs detection results. This also works when I use processes instead of threads.
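The workaround described above can be sketched as a single worker thread that owns the model and drains a request queue, with each client supplying its own reply queue. `FakeModel` is a stand-in for the real Mask R-CNN model (assumed to expose `detect(frames)` as in `mrcnn/model.py`) so the sketch runs anywhere:

```python
import queue
import threading

class FakeModel:
    """Stand-in for the Mask R-CNN model; detect() here just echoes."""
    def detect(self, frames, verbose=0):
        return [{"shape": getattr(f, "shape", None)} for f in frames]

def inference_worker(model, requests):
    # Sole owner of the model instance: drain (frame, reply_queue) messages.
    while True:
        item = requests.get()
        if item is None:                     # sentinel: shut down cleanly
            break
        frame, reply_q = item
        reply_q.put(model.detect([frame]))   # result goes back to the caller

requests = queue.Queue()
worker = threading.Thread(target=inference_worker,
                          args=(FakeModel(), requests), daemon=True)
worker.start()

# Each client thread owns its own reply queue, as described above.
reply = queue.Queue()
requests.put(("frame-0", reply))
print(reply.get())       # blocks until the worker answers
requests.put(None)       # stop the worker
worker.join()
```

Because only the worker thread ever touches the model, the graph/session state is never shared across threads, which sidesteps the reentrancy problem entirely.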

I also found that you can run multiple instances of the model if you run completely separate processes. I am using this in combination with the above workaround to speed up inference. I didn't quite expect that multiple models would increase throughput, but it clearly does.

castleguarders commented 5 years ago

Also, when running multiple instances I had to tweak the Keras code so it does not allocate all free GPU memory at each model instantiation. With this tweak I'm able to run two instances on a 1080 Ti with ~10.5 GB available.
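The tweak amounts to overriding TensorFlow 1.x's default of grabbing all free GPU memory per process. With the TF1/Keras versions this repo targets, the session config looks roughly like the sketch below; the `0.45` fraction is a guess for fitting two instances on an 11 GB card, not a value from this thread:

```python
import tensorflow as tf
import keras.backend as K

# Cap this process at ~45% of GPU memory so two model instances fit.
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.45
# Alternatively, grow allocations on demand instead of reserving up front:
# config.gpu_options.allow_growth = True
K.set_session(tf.Session(config=config))

# Build the MaskRCNN model only after the session is installed, so it
# inherits the capped session.
```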

castleguarders commented 5 years ago

There is fairly heavy CPU-bound processing before work is pipelined to the GPU: image resizing, anchor generation, etc. After moving image resizing out to worker processes (one per CPU core) and running two processes for GPU inference, I'm able to keep ~70% of a single GPU busy. There is clearly more to gain by moving things around further, but I'm CPU-limited at this point.