tensorflow / models

Models and examples built with TensorFlow

Converted TFLite FPN-SSD model crashes #6709

Open kjmiller013 opened 5 years ago

kjmiller013 commented 5 years ago

System information

Describe the problem

Summary: When I convert any FPN-SSD model to .tflite, the .tflite model crashes upon calling interpreter.invoke().

Steps to reproduce: I downloaded the ssd_resnet_50_fpn_coco model from the TensorFlow detection model zoo. I then followed the instructions for Running on Mobile with TensorFlow Lite to convert the pretrained checkpoint to a floating-point model. Specifically, I used the commands:

python object_detection/export_tflite_ssd_graph.py \
--pipeline_config_path=$CONFIG_FILE \
--trained_checkpoint_prefix=$CHECKPOINT_PATH \
--output_directory=$OUTPUT_DIR \
--add_postprocessing_op=true
bazel run --config=opt tensorflow/lite/toco:toco -- \
--input_file=$OUTPUT_DIR/tflite_graph.pb \
--output_file=$OUTPUT_DIR/detect.tflite \
--input_shapes=1,640,640,3 \
--input_arrays=normalized_input_image_tensor \
--output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'  \
--inference_type=FLOAT \
--allow_custom_ops
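
(As an aside, an equivalent conversion can be done through the TF 1.x Python API instead of the TOCO binary; a minimal sketch mirroring the flags above, with placeholder paths:)

import tensorflow as tf

# TF 1.x converter API; mirrors the TOCO command above (a sketch, paths are placeholders).
converter = tf.lite.TFLiteConverter.from_frozen_graph(
    graph_def_file="tflite_graph.pb",
    input_arrays=["normalized_input_image_tensor"],
    output_arrays=[
        "TFLite_Detection_PostProcess",
        "TFLite_Detection_PostProcess:1",
        "TFLite_Detection_PostProcess:2",
        "TFLite_Detection_PostProcess:3",
    ],
    input_shapes={"normalized_input_image_tensor": [1, 640, 640, 3]},
)
converter.allow_custom_ops = True  # the detection postprocessing op is a custom op

with open("detect.tflite", "wb") as f:
    f.write(converter.convert())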

I then ran the following python code (taken from here), with TFLITE_FILENAME being defined as $OUTPUT_DIR/detect.tflite:

import numpy as np
import tensorflow as tf

# Load TFLite model and allocate tensors.
interpreter = tf.lite.Interpreter(model_path=TFLITE_FILENAME)
interpreter.allocate_tensors()

# Get input and output tensors.
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Test model on random input data.
input_shape = input_details[0]['shape']
input_data = np.array(np.random.random_sample(input_shape), dtype=np.float32)
interpreter.set_tensor(input_details[0]['index'], input_data)

interpreter.invoke()

The last line results in an Aborted error. I likewise got a SIGABRT error when I tried to use this *.tflite model in the example iOS app from here. I got similar behavior with the ssd_mobilenet_v1_fpn_coco model from the zoo. On the other hand, ssd_mobilenet_v1_0.75_depth_coco and ssd_inception_v2_coco worked just fine. I haven’t yet tested the other models from the zoo.

These problems are happening with pretrained checkpoints straight from the zoo (i.e. I haven’t touched them at all). I’m wondering if I’m doing something wrong with the conversion commands, or if the TFLite converter doesn’t actually support SSD models that use FPN. Can you please advise? Thanks!
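
(If it helps anyone reproduce this: dumping the interpreter's tensor details before invoke() is a quick way to confirm the converter produced the expected float 1x640x640x3 input; a minimal sketch reusing the interpreter from the script above:)

# Sanity-check the converted model before invoking (a sketch).
for d in interpreter.get_input_details():
    print("input:", d["name"], d["shape"], d["dtype"])
for d in interpreter.get_output_details():
    print("output:", d["name"], d["shape"], d["dtype"])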

vignesh8491 commented 5 years ago

I'm getting the same error with the ssd_resnet_50_fpn_coco model, although I trained it on my own dataset. The model works perfectly with the frozen inference graph. Please help get this model working on TFLite.

thusinh1969 commented 5 years ago

iOS TFLite requires a quantized-only model, which is still buggy with FPN-based networks :(

rehmanzafar commented 5 years ago

I am getting the same error with ssd_mobilenet_v1_coco, ssd_mobilenet_v1_fpn_coco and ssd_resnet50_v1_fpn_coco.

ryanjay0 commented 4 years ago

It's nice to see there's progress on this: someone new has been assigned to solve it. Please post something here if you figure it out. It's a shame your FPN checkpoints from the model zoo don't work.

srjoglekar246 commented 4 years ago

Hello folks, sorry about the long wait time on this one :-). Weirdly, ssd_resnet_50_fpn_coco converts and runs fine for me. The only difference I see between what I did and the commands posted above is that I added --input_format=TENSORFLOW_GRAPHDEF --output_format=TFLITE to the TOCO command. Could you retry and check? If problems persist, could you link your converted model so I can take a deeper look?

Also, SSD with a ResNet50 trunk isn't really optimized for on-device inference, so it might make more sense to do this with the MobileNet versions mentioned by @rehmanzafar.

ryanjay0 commented 4 years ago

Thank you so much for looking into this. Unfortunately, we are unable to get this to run properly on Linux or iOS. There is no error during conversion, but it gives the Aborted error on Linux and the SIGABRT in the iOS demo app, exactly as detailed in the original question. It runs correctly on Android only.

We followed the steps from Running on Mobile with TensorFlow Lite.

Here are both conversion steps exactly as we ran them:

python object_detection/export_tflite_ssd_graph.py \
    --pipeline_config_path /path/pipeline.config \
    --trained_checkpoint_prefix /path/model.ckpt \
    --output_directory /path \
    --add_postprocessing_op=true

and

# we also tried toco
tflite_convert \
  --graph_def_file=$OUTPUT_DIR/tflite_graph.pb \
  --output_file=$OUTPUT_DIR/detect.tflite \
  --input_format=TENSORFLOW_GRAPHDEF \
  --input_shapes=1,640,640,3 \
  --input_arrays=normalized_input_image_tensor \
  --output_arrays='TFLite_Detection_PostProcess','TFLite_Detection_PostProcess:1','TFLite_Detection_PostProcess:2','TFLite_Detection_PostProcess:3'  \
  --output_format=TFLITE \
  --inference_type=FLOAT \
  --allow_custom_ops 

We tried toco (the command-line version, not Bazel) and tflite_convert. We tried TF 1.14, 1.15rc2, and 1.15rc3. All versions give the same error. We used various iOS devices, including both new and old tablets/phones. We tried ResNet50 and MobileNet v1/v2 using the model zoo checkpoints. We also tried TF 2.0 and 2.1, but we get:

from tensorflow.tools.graph_transforms import TransformGraph
ImportError: No module named graph_transforms

when trying to run export_tflite_ssd_graph.py.

If you could please provide more information about what exactly runs fine for you and how you got it to work, especially on iOS, that would really be useful.

Here's an example tflite file converted from ssd_mobilenet_v1_fpn_coco:

https://drive.google.com/file/d/1hhqHoLZwMu08OHbD-5A4LjDTBNvFNm7x/view?usp=sharing

jdduke commented 4 years ago

There is no error in conversion but it gives the Aborted error on Linux

How are you testing on Linux?

the SIGABRT using the ios app demo framework exactly as detailed above in the original question.

Is there anything else in the logs? Are you adjusting the app to accommodate the 640x640 input size of your graph (note that this is different from the default image size used in that app, which is 300x300)? You'll want to modify the code accordingly.
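
(For what it's worth, the expected input size can be read from the .tflite file itself rather than hardcoded; in Python, for instance, a small sketch with a placeholder path:)

import tensorflow as tf

# Read the expected input resolution straight from the model file.
interp = tf.lite.Interpreter(model_path="detect.tflite")
_, height, width, channels = interp.get_input_details()[0]["shape"]
print("model expects", height, "x", width, "x", channels, "input")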

ryanjay0 commented 4 years ago

We are testing on Linux using the script posted above in the original question. It works when using models from the zoo that don't have FPN, such as ssd_mobilenet_v1_coco. Here's some example output:

INFO: Initialized TensorFlow Lite runtime.
[[[ 0.99805546  0.63043827  1.0001388   0.7015893 ]
  [ 0.998252    0.803046    1.0003759   0.89067185]
  [ 0.9978381   0.36708647  1.0001876   0.45205355]
  ...
  [ 0.39218345  0.00445052  0.42441067  0.11782499]
  [ 0.5818281  -0.01217696  0.7814739   0.5513865 ]
  [ 0.46115616  0.50543225  0.49874964  0.59435725]]]

Of course, the point of using TFLite here is iOS, so I'm mainly focused on that failure. The specific error output is shown in the attached image.

We have numerous other non-FPN custom models running in TFLite on iOS with 640x640 (or larger) input sizes, so I believe we set the input size correctly in the app. Thanks again.

jdduke commented 4 years ago

I was unable to repro the Linux failure; we'll take a closer look at the iOS side of things.

For the iOS code, were you building from source? Or using one of our released CocoaPods? If the latter, which version?

ryanjay0 commented 4 years ago

We used the released CocoaPods. We tried versions 1.14.0, 1.15.0, and 2.0.0. All give the same error.

yyoon commented 4 years ago

Thanks for the info. I'll try to reproduce this on iOS and keep this thread updated.

yyoon commented 4 years ago

When I converted the model myself following the instructions above, I could successfully run the ssd_mobilenet_v1_fpn_coco model without the crash. I tested with the iOS object_detection example here, using the TFLite 2.0.0 CocoaPods, and could run the converted model by putting it under the ObjectDetection/Model directory. (Inference was very slow on an iPhone XS, but I think that's expected.)

That said, when I downloaded the converted model linked above in https://github.com/tensorflow/models/issues/6709#issuecomment-543432460, I could reproduce the crash. It was also reproducible on my side with the Python script in the OP.

I took a quick look at the two graphs using a graph visualizer (netron, specifically), and they look identical to me. I'll investigate further to see how they differ at the flatbuffer level, and why they might be different.
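
(One way to go beyond the visualizer is to diff the tensor details of the two files programmatically; a rough sketch, with placeholder paths for the working and crashing models:)

import tensorflow as tf

def tensor_summary(path):
    # Collect (name, shape, dtype) for every tensor in the model.
    interp = tf.lite.Interpreter(model_path=path)
    return [(t["name"], tuple(t["shape"]), t["dtype"])
            for t in interp.get_tensor_details()]

for good, bad in zip(tensor_summary("working.tflite"),
                     tensor_summary("crashing.tflite")):
    if good != bad:
        print("mismatch:", good, "vs", bad)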

amalF commented 4 years ago

Hello, any updates on this issue? I have the same issue running the SSD MobileNet/ResNet50 FPN models on a Raspberry Pi 4. Thanks

ZionYuan commented 4 years ago

(Quoting @ryanjay0's earlier comment in full.)

Hi, I want to know how to deploy the .tflite file on Android; could you give me some instructions? I used the object detection demo from the TensorFlow Lite GitHub, but when I import the .tflite file into assets, the app crashes. I'm looking forward to your suggestions about this. :)

srjoglekar246 commented 4 years ago

@yzmean Can you open a new Github issue with the exact error? If you used the code/model from our Object Detection example app, things should ideally work. So it would be good to know what the behavior on your end is :-)

ZionYuan commented 4 years ago

Thanks for your reply :) I have opened a new issue, #7870.

eneshb commented 4 years ago

Hey, did anybody find a workaround to get the SSD FPN models to work on TFLite?

nimishpatel19 commented 3 years ago

(Quoting @ryanjay0's earlier comment about the Linux and iOS tests in full.)

Did you find any solution here?

srjoglekar246 commented 3 years ago

Hey folks, TBH the TF1 detection zoo isn't well maintained anymore. Can you elaborate on why you require the FPN model specifically? Also, which device/hardware are you targeting?

Petros626 commented 2 years ago

The thing is that TF2 does not contain, or allow you to have, this in the config file for training:

graph_rewriter {
  quantization {
    delay: 48000
    weight_bits: 8
    activation_bits: 8
  }
}

There are some repos where people reported that they successfully converted Faster R-CNN or YOLO to .tflite, but it's not officially confirmed by TensorFlow. So if you train these models, I would try one of the custom converters provided by those users. I wonder how Google Coral was able to use the new model family with .tflite and, logically, with the EdgeTPU compiler.
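
(For reference, the TF2-era counterpart to that graph_rewriter block is post-training quantization in the converter; a minimal sketch, assuming a TF2 SavedModel export at a placeholder path:)

import tensorflow as tf

# Post-training dynamic-range quantization (a sketch; not the quantization-aware
# training that the graph_rewriter block above configures).
converter = tf.lite.TFLiteConverter.from_saved_model("exported_model/saved_model")
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.allow_custom_ops = True  # detection postprocess is a custom op
with open("detect_quant.tflite", "wb") as f:
    f.write(converter.convert())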