openvinotoolkit / openvino_tensorflow

OpenVINO™ integration with TensorFlow

[SSD Models] Segmentation Fault #201

Closed elimkwan closed 2 years ago

elimkwan commented 3 years ago

We observed that the program crashes with a segmentation fault whenever the same image is passed to sess.run() more than once. This only happens with the SSD object detection models. For example:

Error occurs here:

OVTF Summary -> 2093 out of 4245 nodes in the graph (49%) are now running with OpenVINO™ backend
Predict Image [10]
Predict Image [89]
Predict Image [10] --> Segmentation fault

Error doesn't occur if we reinitialise the session:

OVTF Summary -> 2093 out of 4245 nodes in the graph (49%) are now running with OpenVINO™ backend
Predict Image [10]
Predict Image [89]
OVTF Summary -> 2093 out of 4245 nodes in the graph (49%) are now running with OpenVINO™ backend
Predict Image [10] 
...
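
The reinitialisation workaround above can be sketched as a small wrapper that rebuilds the backend before every call. This is only a sketch under assumptions: `ReloadingPredictor` and `make_backend` are hypothetical names, with `make_backend` standing in for whatever constructs and loads the backend object. Reloading on every call is slow, so this is a stopgap for confirming the behaviour rather than a fix.

```python
class ReloadingPredictor:
    """Rebuilds the backend before every predict() call (hypothetical workaround)."""

    def __init__(self, make_backend):
        # make_backend: zero-argument callable returning a freshly loaded backend
        # that exposes predict(feed) and, optionally, a closable `sess` attribute.
        self._make_backend = make_backend
        self._backend = None
        self.reloads = 0

    def _close(self):
        # Release the previous session, if the old backend had one.
        sess = getattr(self._backend, "sess", None)
        if sess is not None:
            sess.close()
        self._backend = None

    def predict(self, feed):
        # Drop the old backend and build a new one, so no session ever
        # sees the same input twice.
        self._close()
        self._backend = self._make_backend()
        self.reloads += 1
        return self._backend.predict(feed)
```

In the setup from this issue, `make_backend` would be something like a lambda that constructs the backend and calls its `load(...)` method with the same arguments as before.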

Model: ssd_inception_v2, ssd_mobilenet_v1, ssd_mobilenet_v1_fpn, ssd_mobilenet_v2, ssd_resnet_50_fpn, ssdlite_mobilenet_v2
TensorFlow version: 2.6.0
openvino-tensorflow version: 1.0.0

Here is a code snippet showing where the error occurs:

import os
import tensorflow as tf

class BackendTensorflowOpenvino(...):
  ...
  def load(self, model_path, inputs=None, outputs=None):
      # There is no input/output metadata in the graph, so it needs to come from the config.
      if not inputs:
          raise ValueError("BackendTensorflow needs inputs")
      if not outputs:
          raise ValueError("BackendTensorflow needs outputs")
      self.outputs = outputs
      self.inputs = inputs

      infer_config = tf.compat.v1.ConfigProto()
      infer_config.intra_op_parallelism_threads = int(os.environ['TF_INTRA_OP_PARALLELISM_THREADS']) \
              if 'TF_INTRA_OP_PARALLELISM_THREADS' in os.environ else os.cpu_count()
      infer_config.inter_op_parallelism_threads = int(os.environ['TF_INTER_OP_PARALLELISM_THREADS']) \
              if 'TF_INTER_OP_PARALLELISM_THREADS' in os.environ else os.cpu_count()
      infer_config.use_per_session_threads = 1

      graph_def = tf.compat.v1.GraphDef()
      with tf.compat.v1.gfile.FastGFile(model_path, "rb") as f:
          graph_def.ParseFromString(f.read())
          g = tf.compat.v1.import_graph_def(graph_def, name='')

      self.sess = tf.compat.v1.Session(graph=g, config=infer_config)
      return self

  def predict(self, feed):
      ans = self.sess.run(self.outputs, feed_dict=feed)
      return ans

import openvino_tensorflow as ovtf
ovtf.set_backend('CPU')
model = BackendTensorflowOpenvino(...)
model.predict({"image_tensor:0": img1})
model.predict({"image_tensor:0": img1})

The program fails at the second model.predict() call. Error Message:

./tmp-e9_0_ipe.sh: line 42: 65995 Segmentation fault      (core dumped)
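
Because a segmentation fault kills the whole interpreter, it can help to run the reproduction in a child process and inspect its exit status. The following is a minimal sketch, not part of the original program: `run_isolated` and `died_from_segfault` are hypothetical helper names, and it assumes a POSIX system, where a child killed by a signal reports the negated signal number. The `-X faulthandler` interpreter option makes Python dump its traceback to stderr when the crash happens.

```python
import signal
import subprocess
import sys


def run_isolated(code, timeout=600):
    """Run Python source in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr) so the parent survives a crash in the child.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def died_from_segfault(returncode):
    # On POSIX, subprocess reports death by signal N as returncode == -N.
    return returncode == -signal.SIGSEGV
```

Passing the two model.predict() calls above as `code` would then show whether the second call is the one that crashes, and the stderr output would contain the Python-level traceback at the moment of the fault.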

Also, the above code works with the original TensorFlow backend, and with other models from the TensorFlow Object Detection Model Zoo (e.g. faster_rcnn_inception_v2_coco, faster_rcnn_resnet50_coco) as well as yolo-v3.


To reproduce the error, we can use ck, an automated workflow framework for designing ML systems.

pip install ck

Then, pull the relevant program:

ck pull repo --url=https://github.com/krai/ck-mlperf.git
ck pull repo --url=https://github.com/krai/ck-object-detection.git

Then, follow the build instructions here. Finally, run the following command:

time docker run -it --rm ${CK_IMAGE} \
"ck run program:mlperf-inference-vision --cmd_key=direct --skip_print_timers \
  --env.CK_LOADGEN_SCENARIO=SingleStream \
  --env.CK_LOADGEN_MODE='--accuracy' \
  --env.CK_LOADGEN_EXTRA_PARAMS='--count 50' \
  \
  --env.CK_MODEL_PROFILE=default_tf_object_det_zoo \
  --dep_add_tags.weights=ssd_mobilenet_v1_coco \
  \
  --env.CK_INFERENCE_ENGINE=tensorflow \
  --env.CK_INFERENCE_ENGINE_BACKEND=openvino-cpu \
  --env.CUDA_VISIBLE_DEVICES=-1"

ck-intel commented 2 years ago

@elimkwan Could you please check this issue with our latest release, 2.0.0?

ck-intel commented 2 years ago

Closing this; please reopen it if the issue is not solved with the latest release.