iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Error in lowering mrcnn model #6893

Closed ArunaKote closed 3 years ago

ArunaKote commented 3 years ago

Added a model signature to the model using the code below:

import tensorflow as tf
import tensorflow_hub as hub

model_url = "https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1"
loaded_model = hub.load(model_url)
call = loaded_model.__call__.get_concrete_function(
tf.TensorSpec(shape=(1, 1024, 1024, 3), dtype=tf.uint8))
signatures = {'predict': call}
tf.saved_model.save(loaded_model, '/data/aruna/maskedrcnn/iree', signatures=signatures)

Then ran:

/data/aruna/iree/integrations/tensorflow/bazel-bin/iree_tf_compiler/iree-import-tf -tf-import-type=savedmodel_v1 -tf-savedmodel-exported-names=predict /data/aruna/maskedrcnn/iree --print-ir-before-all &>2 -o sample.mlir

:0: error: loc(callsite(callsite(callsite("BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Select@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): currently unsupported operand types: 'tensor' and 'tensor<300xf32>'
:0: note: loc("StatefulPartitionedCall"): called from
Running iree-import-tf TF import pass pipeline failed
dinkdeep commented 3 years ago

@hanhanW please see the attached issue as discussed on the IREE Discord. Running iree-import-tf TF import pass pipeline failed (see diagnostics):

:0: error: loc(callsite(callsite(callsite(callsite(callsite("BiasAdd/ReadVariableOp@__inference_conv4_block4_1_conv_layer_call_and_return_conditional_losses_16579" at "conv4_block4_1_conv/StatefulPartitionedCall@__inference_resnet50_layer_call_and_return_conditional_losses_18751") at "StatefulPartitionedCall@__inference_resnet50_layer_call_fn_19401") at "StatefulPartitionedCall@__inference_restored_function_body_27470") at "StatefulPartitionedCall@__inference_signature_wrapper_28117") at "StatefulPartitionedCall")): 'tf.ReadVariableOp' op : unlegalized TensorFlow op still exists
:0: note: loc("StatefulPartitionedCall"): called from
:0: note:
:0: error: The following Tensorflow operations still remain: tf.ReadVariableOp (count: 320) tf.VarHandleOp (count: 320)
hanhanW commented 3 years ago

I'm confused that the error messages are different.

Copied some context from Discord.

ME:

I roughly looked into the log. It looks like the frontend does not handle tf.VarHandleOp lowering. We have support for tf_saved_model::GlobalTensorOp and handle tf.ReadVariableOp ops in https://github.com/google/iree/blob/main/integrations/tensorflow/iree_tf_compiler/TF/LowerGlobalTensors.cpp

We can probably support it in IREE. I don't have much context about those variable ops. I'm wondering whether we can lower them to variable ops like IREE::Util::GlobalOp; Ben probably has more thoughts on it. This is the IR dumped from the log: https://gist.github.com/hanhanW/b3ef394f71fb654c4567a6cbd6343ab2

Ben: Not sure what the handle stuff is; maybe indirect globals? If so, we have those in IREE (util.global.load.indirect/etc), but it may just be a tf->tf conversion to get them to the ones we support already.

%914 = "tf.ReadVariableOp"(%318) {device = ""} : (tensor<!tf_type.resource<tensor<2048xf32>>>) -> tensor<2048xf32>

Of course, tf. That would be util.ptr<tensor<2048xf32>> in IREE land, so we could probably just add the lowerings for these to the util.global.load.indirect/etc.

ME: Yeah, I don't know what they are either. I guess they are variables/pointers stuff.

Ben: The only thing I don't see here is how they are initialized; they are never written to. If this is something with protos storing the values, that will all have to happen in TF land. Maybe the Python is missing that magic tf.initialize_global_variables() or whatever it is: https://www.tensorflow.org/api_docs/python/tf/compat/v1/global_variables_initializer. I think the model needs to call that.

But beyond finding that link, I think Stack Overflow is probably the best way to track this down. This may fall into the "if you can't run this in XLA/TFLite you probably can't run it in IREE" territory; I doubt they know what to do with this :)
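For reference, the TF1-style initialization Ben mentions looks roughly like this (a sketch assuming a v1-style export path; not verified against this model):

import tensorflow as tf

# In graph mode, variables only get values once the initializer op runs;
# exporting a SavedModel before that leaves them uninitialized.
tf.compat.v1.disable_eager_execution()
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    # ... build and export the SavedModel from this session ...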

hanhanW commented 3 years ago

@dinkdeep @ArunaKote Can you check if we can run it in TFLite?

We probably want to know why we want VarHandleOp. If there are no specific reasons, we can maybe try GlobalTensorOp? Another question is: do we want to support VarHandleOp in IREE?
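For context, resource variables are where these ops come from: a tf.Variable captured by a traced function imports as a tf.VarHandleOp plus a tf.ReadVariableOp per read. A minimal sketch that reproduces them (path and shapes are illustrative):

import tensorflow as tf

class VarDemo(tf.Module):
    def __init__(self):
        # Becomes a tf.VarHandleOp in the imported TF dialect IR.
        self.bias = tf.Variable(tf.zeros([2048]), name="bias")

    @tf.function(input_signature=[tf.TensorSpec([2048], tf.float32)])
    def __call__(self, x):
        # Each read of the variable becomes a tf.ReadVariableOp.
        return x + self.bias

tf.saved_model.save(VarDemo(), "/tmp/var_demo")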

dinkdeep commented 3 years ago

Working on TFLite to see if it can be run there. Will update.

ArunaKote commented 3 years ago

Tried to convert the model to a TFLite model using:

import tensorflow as tf
import tensorflow_hub as hub

inputs = tf.keras.Input(shape=(1024, 1024, 3), batch_size=1, dtype=tf.uint8)
m_l = hub.KerasLayer("https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1")
x = m_l(inputs)
model = tf.keras.Model(inputs=inputs, outputs=x, name="mrcnn")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

Got this error:

2021-08-27 12:49:16.478141: F tensorflow/lite/toco/tooling_util.cc:2277] Check failed: array.data_type == array.final_data_type Array "input_1" has mis-matching actual and final data types (data_type=uint8, final_data_type=float).
Fatal Python error: Aborted
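One possible workaround for the uint8 input check (an untested sketch, not verified on this model): give the Keras model a float32 input and cast to uint8 before the hub layer, so the converter boundary sees float while the hub signature still receives uint8.

import tensorflow as tf
import tensorflow_hub as hub

inputs = tf.keras.Input(shape=(1024, 1024, 3), batch_size=1, dtype=tf.float32)
m_l = hub.KerasLayer("https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1")
x = m_l(tf.cast(inputs, tf.uint8))  # cast inside the graph instead of at the input
model = tf.keras.Model(inputs=inputs, outputs=x, name="mrcnn")
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()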

dinkdeep commented 3 years ago

@ArunaKote Can't we run it with tf-opt with some option for TFLite?

ArunaKote commented 3 years ago

When I tried with:

tflite_convert --saved_model_dir=$PWD --output_file=$PWD/mrcnn.tflite

2021-08-27 15:28:09.442601: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 15:28:30.442773: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2021-08-27 15:28:30.442826: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
2021-08-27 15:28:30.442832: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:372] Ignored change_concat_input_ranges.
2021-08-27 15:28:30.443484: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /data/aruna/maskedrcnn/iree
2021-08-27 15:28:30.575848: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2021-08-27 15:28:30.575911: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /data/aruna/maskedrcnn/iree
2021-08-27 15:28:31.236096: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2021-08-27 15:28:32.885071: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /data/aruna/maskedrcnn/iree
2021-08-27 15:28:33.500229: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 3056747 microseconds.
loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): error: 'tf.CropAndResize' op is neither a custom op nor a flex op
loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): error: 'tf.CropAndResize' op is neither a custom op nor a flex op
error: failed while converting: 'main': Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select
TF Select ops: CropAndResize
Details:
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<100xi32>, tensor<2xi32>) -> (tensor<100x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<300xi32>, tensor<2xi32>) -> (tensor<300x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}

Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/bin/tflite_convert", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 697, in main
    app.run(main=run_main, argv=sys.argv[:1])
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 680, in run_main
    _convert_tf2_model(tflite_flags)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/tflite_convert.py", line 295, in _convert_tf2_model
    tflite_model = converter.convert()
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 766, in wrapper
    return self._convert_and_export_metrics(convert_func, *args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 752, in _convert_and_export_metrics
    result = convert_func(self, *args, **kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/lite.py", line 1032, in convert
    result = _convert_saved_model(**converter_kwargs)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/convert_phase.py", line 223, in wrapper
    raise converter_error from None  # Re-throws the exception.
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/convert_phase.py", line 216, in wrapper
    return func(*args, **kwargs)
  File "/home/ubuntu/dkd/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/convert.py", line 852, in convert_saved_model
    enable_mlir_converter=True)
  File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/tensorflow/lite/python/convert.py", line 315, in toco_convert_protos
    raise converter_error
tensorflow.lite.python.convert_phase.ConverterError: :0: error: loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): 'tf.CropAndResize' op is neither a custom op nor a flex op
:0: note: loc("StatefulPartitionedCall"): called from
:0: note: loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): Error code: ERROR_NEEDS_FLEX_OPS
:0: error: loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): 'tf.CropAndResize' op is neither a custom op nor a flex op
:0: note: loc("StatefulPartitionedCall"): called from
:0: note: loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): Error code: ERROR_NEEDS_FLEX_OPS
:0: error: failed while converting: 'main': Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select
TF Select ops: CropAndResize
Details:
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<100xi32>, tensor<2xi32>) -> (tensor<100x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<300xi32>, tensor<2xi32>) -> (tensor<300x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
dinkdeep commented 3 years ago

Confirmed that TFLite is failing with the "tf.CropAndResize" error as Aruna reported:

(base) ddeepani@hb-dt-d-deepani:/data/Work/tf/tensorflow$ bazel-bin/tensorflow/lite/python/tflite_convert --saved_model_dir=/data/Work/dkd/ --output_file=rccnh5.tflite

2021-08-27 15:53:49.949373: I tensorflow/core/platform/cpu_feature_guard.cc:151] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-08-27 15:54:08.252437: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:363] Ignored output_format.
2021-08-27 15:54:08.252466: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:366] Ignored drop_control_dependency.
2021-08-27 15:54:08.252471: W tensorflow/compiler/mlir/lite/python/tf_tfl_flatbuffer_helpers.cc:372] Ignored change_concat_input_ranges.
2021-08-27 15:54:08.253019: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /data/Work/dkd/
2021-08-27 15:54:08.357137: I tensorflow/cc/saved_model/reader.cc:107] Reading meta graph with tags { serve }
2021-08-27 15:54:08.357170: I tensorflow/cc/saved_model/reader.cc:148] Reading SavedModel debug info (if present) from: /data/Work/dkd/
2021-08-27 15:54:08.818435: I tensorflow/cc/saved_model/loader.cc:210] Restoring SavedModel bundle.
2021-08-27 15:54:10.145520: I tensorflow/cc/saved_model/loader.cc:194] Running initialization op on SavedModel bundle at path: /data/Work/dkd/
2021-08-27 15:54:10.608905: I tensorflow/cc/saved_model/loader.cc:283] SavedModel load for tags { serve }; Status: success: OK. Took 2355885 microseconds.
loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): error: 'tf.CropAndResize' op is neither a custom op nor a flex op
loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): error: 'tf.CropAndResize' op is neither a custom op nor a flex op
error: failed while converting: 'main': Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select
TF Select ops: CropAndResize
Details:
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<100xi32>, tensor<2xi32>) -> (tensor<100x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<300xi32>, tensor<2xi32>) -> (tensor<300x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}

Traceback (most recent call last):
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/tflite_convert.py", line 701, in <module>
    main()
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/tflite_convert.py", line 697, in main
    app.run(main=run_main, argv=sys.argv[:1])
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/absl_py/absl/app.py", line 300, in run
    _run_main(main, args)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/absl_py/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/tflite_convert.py", line 680, in run_main
    _convert_tf2_model(tflite_flags)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/tflite_convert.py", line 295, in _convert_tf2_model
    tflite_model = converter.convert()
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/lite.py", line 766, in wrapper
    return self._convert_and_export_metrics(convert_func, *args, **kwargs)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/lite.py", line 752, in _convert_and_export_metrics
    result = convert_func(self, *args, **kwargs)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/lite.py", line 1032, in convert
    result = _convert_saved_model(**converter_kwargs)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/convert_phase.py", line 223, in wrapper
    raise converter_error from None  # Re-throws the exception.
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/convert_phase.py", line 216, in wrapper
    return func(*args, **kwargs)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/convert.py", line 852, in convert_saved_model
    enable_mlir_converter=True)
  File "/data/Work/tf/tensorflow/bazel-bin/tensorflow/lite/python/tflite_convert.runfiles/org_tensorflow/tensorflow/lite/python/convert.py", line 315, in toco_convert_protos
    raise converter_error
tensorflow.lite.python.convert_phase.ConverterError: :0: error: loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): 'tf.CropAndResize' op is neither a custom op nor a flex op
:0: note: loc("StatefulPartitionedCall"): called from
:0: note: loc(callsite(callsite(callsite("CropAndResize/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): Error code: ERROR_NEEDS_FLEX_OPS
:0: error: loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): 'tf.CropAndResize' op is neither a custom op nor a flex op
:0: note: loc("StatefulPartitionedCall"): called from
:0: note: loc(callsite(callsite(callsite("CropAndResize_1/CropAndResize@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): Error code: ERROR_NEEDS_FLEX_OPS
:0: error: failed while converting: 'main': Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: https://www.tensorflow.org/lite/guide/ops_select
TF Select ops: CropAndResize
Details:
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<100xi32>, tensor<2xi32>) -> (tensor<100x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<300xi32>, tensor<2xi32>) -> (tensor<300x17x17x1088xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}

(base) ddeepani@hb-dt-d-deepani:/data/Work/tf/tensorflow$ cksum /data/Work/dkd/saved_model.pb
1389938488 15909338 /data/Work/dkd/saved_model.pb
dinkdeep commented 3 years ago

@hanhanW "I'm confused that the error messages are different." Han indeed there are 2 issues one is generated from a saved_model.pb of 3.6 MB size the one with

(base) ddeepani@hb-dt-d-deepani:~/Work/dkd/misc/mask_rccn/Mask-RCNN-TF2$ ls -lthr /data/Work/dkd/saved_model.pb -rw-rw-r-- 1 ddeepani ddeepani 3.6M Aug 26 13:48 /data/Work/dkd/saved_model.pb

:0: error: The following Tensorflow operations still remain: tf.ReadVariableOp (count: 320) tf.VarHandleOp (count: 320) in the VerifyFullyConvertedPass and other is with the 16 MB file downloaded from the https://tfhub.dev/tensorflow/mask_rcnn/inception_resnet_v2_1024x1024/1 :0: error: loc(callsite(callsite(callsite("BatchMultiClassNonMaxSuppression/MultiClassNonMaxSuppression/Select@__inference___call___42606" at "StatefulPartitionedCall@__inference_restored_function_body_108980") at "StatefulPartitionedCall@__inference_signature_wrapper_110151") at "StatefulPartitionedCall")): currently unsupported operand types: 'tensor' and 'tensor<300xf32>' :0: note: loc("StatefulPartitionedCall"): called from // -----// IR Dump After mlir::iree_integrations::TF::ConvertToMHLOPass Failed //----- // Any how both are legit issues . After fixing the VarHandleOp may have to go for BatchMultiClassNonMaxSuppression .
dinkdeep commented 3 years ago

I also see these six TF ops after the ConvertToMHLOPass, which probably need equivalent HLO conversions (see the sketch after the list):

// -----// IR Dump After mlir::iree_integrations::TF::ConvertToMHLOPass Failed //----- //
tf.CropAndResize
tf.TopKV2
tf.Where
tf.NonMaxSuppressionV5
tf.ResizeBilinear
tf.Tile
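To triage these one at a time, tiny single-op tf.functions are handy; for example (a sketch assuming default attributes; each Python call traces to the listed tf dialect op):

import tensorflow as tf

# tf.math.top_k traces to tf.TopKV2.
@tf.function(input_signature=[tf.TensorSpec([1, 8], tf.float32)])
def topk(x):
    return tf.math.top_k(x, k=3)

# One-argument tf.where traces to tf.Where.
@tf.function(input_signature=[tf.TensorSpec([4, 4], tf.bool)])
def where(cond):
    return tf.where(cond)

# tf.tile traces to tf.Tile.
@tf.function(input_signature=[tf.TensorSpec([2, 3], tf.float32)])
def tile(x):
    return tf.tile(x, [2, 2])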

dinkdeep commented 3 years ago

I am looking at these one by one, but it looks like tf.ResizeBilinear, tf.NonMaxSuppressionV5, and tf.CropAndResize could be the major ones.

I actually started with tf.CropAndResize:

1) Created a Python equivalent that uses the tf.image.crop_and_resize API (tf.image.crop_and_resize | TensorFlow Core v2.6.0) to get the tf dialect operator; a sketch of such a repro follows this list.
2) Created a saved_model.pb.
3) Ran the saved model through tflite_convert.
4) Dumped the tf and tfl dialects using module.dump() and extracted both dialects to a txt file.

tflite_convert failed with the error below while converting tf.CropAndResize to an equivalent tfl operator (run from /data/Work/tf/tensorflow).
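A repro along the lines of step 1 might look like this (a sketch of the approach described, not the exact script used; shapes, crop size, and save path are illustrative):

import tensorflow as tf

class CropModule(tf.Module):
    # tf.image.crop_and_resize traces to the tf.CropAndResize op.
    @tf.function(input_signature=[
        tf.TensorSpec([None, None, None, None], tf.float32),  # image batch
        tf.TensorSpec([None, 4], tf.float32),                 # normalized boxes
        tf.TensorSpec([None], tf.int32),                      # box -> batch index
    ])
    def __call__(self, image, boxes, box_indices):
        return tf.image.crop_and_resize(image, boxes, box_indices, crop_size=[10, 10])

tf.saved_model.save(CropModule(), "/tmp/crop_and_resize")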

time TF_DUMP_GRAPH_PREFIX=crop TF_CPP_MAX_VLOG_LEVEL=4 bazel-bin/tensorflow/lite/python/tflite_convert --saved_model_dir=/data/Work/dkd/tmp42f5h1vo/module_with_signature/ --output_file=crop.tflite &>log.crop

tensorflow/compiler/mlir/lite/flatbuffer_export.cc:406] GetTensorFlowNodeDef %0 = "tf.CropAndResize"(%arg0, %arg1, %arg2, %arg3) {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"} : (tensor<?x?x?x?xf32>, tensor<?x4xf32>, tensor<?xi32>, tensor<2xi32>) -> tensor<?x10x10x?xf32>
loc(callsite(callsite("CropAndResize@__inference_call_43" at "PartitionedCall@__inference_signature_wrapper_50") at "PartitionedCall")): error: 'tf.CropAndResize' op is neither a custom op nor a flex op
2021-08-30 12:42:35.450948: I tensorflow/compiler/mlir/lite/flatbuffer_export.cc:1883] resourceops 0
error: failed while converting: 'main': Some ops are not supported by the native TFLite runtime, you can enable TF kernels fallback using TF Select. See instructions: Select TensorFlow operators | TensorFlow Lite
TF Select ops: CropAndResize, Placeholder
Details:
tf.CropAndResize(tensor<?x?x?x?xf32>, tensor<?x4xf32>, tensor<?xi32>, tensor<2xi32>) -> (tensor<?x10x10x?xf32>) : {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"}
tf.Placeholder() -> (tensor<?x?x4xf32>) : {device = "", shape = #tf_type.shape<?x?x4>}
tf.Placeholder() -> (tensor<?x?x?x?xf32>) : {device = "", shape = #tf_type.shape<?x?x?x?>}

tf-opt failed while legalizing too:

/data/dkd/tf_linalg/tensorflow/bazel-bin/tensorflow/compiler/mlir/tf-opt -tf-executor-island-coarsening -canonicalize --tf-device-decompose-resource-ops-in-cluster --tf-promote-var-handles-to-args --tf-readonly-references-to-resources --tf-resource-device-inference --tf-executor-to-functional-conversion --tf-shape-inference -xla-legalize-tf crop_tf.mlir -print-ir-before-all --mlir-disable-threading -print-ir-after-all &>log.cropResize

crop_tf.mlir:11:3: error: The following operations cannot be legalized: tf.CropAndResize (count: 1); tf.Placeholder (count: 2). These legalization failure(s) may be due to missing TF to HLO lowerings and/or unsupported attributes, etc.
  builtin.func private @__inference_call_430(%arg0: tensor<2x200x200x3xui32> {tf._user_specified_name = "x"}) -> tensor<*xf32> attributes {tf._construction_context = "kEagerRuntime", tf._input_shapes = [#tf_type.shape<2x200x200x3>]} {
  ^
crop_tf.mlir:11:3: error: Emitting more detail about one op that failed to legalize...
  builtin.func private @__inference_call_430(%arg0: tensor<2x200x200x3xui32> {tf._user_specified_name = "x"}) -> tensor<*xf32> attributes {tf._construction_context = "kEagerRuntime", tf._input_shapes = [#tf_type.shape<2x200x200x3>]} {
  ^
crop_tf.mlir:38:11: error: 'tf.CropAndResize' op is not legalizable
  %12 = "tf.CropAndResize"(%0, %4, %11, %cst) {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"} : (tensor<?x?x?x?xf32>, tensor, tensor, tensor<2xi32>) -> tensor
crop_tf.mlir:38:11: note: see current operation: %71 = "tf.CropAndResize"(%9, %18, %70, %8) {T = f32, device = "", extrapolation_value = 0.000000e+00 : f32, method = "bilinear"} : (tensor<?x?x?x?xf32>, tensor<?x4xf32>, tensor<?xi32>, tensor<2xi32>) -> tensor<?x10x10x?xf32>
// -----// IR Dump After LegalizeTF Failed //----- //
(edited)

Will try with iree-import-tf now.

hanhanW commented 3 years ago

It's a bit hard to parse the context. Could you organize the error log better, with a link or a code block?

If there are two models, I think it's better to file a bug for each one. It's easier for us to track the missing work. :)

"Can you check if we can run it in TFLite?" Sorry that I did not point it out directly; I'm not familiar with the TFLite MLIR path. I meant running the TFLite converter and inference (like benchmark_tool?) on the models.

"I also see these six TF ops after the ConvertToMHLOPass, which probably need equivalent HLO conversions"

You can try to add some patterns to https://github.com/tensorflow/tensorflow/blob/800d21c92441c17a9ae521adcff62d2ce0640914/tensorflow/compiler/mlir/xla/transforms/legalize_tf.cc

I remember that we have a TopKV2 lowering, but we could be missing some cases.

dinkdeep commented 3 years ago

We will create separate issues, attach the log files, and paste only the relevant error messages in the console.

Regarding the TFLite converter: we did run the model with it and found that it was not able to lower tf.CropAndResize(tensor<1x64x64x1088xf32>, tensor<?x4xf32>, tensor<100xi32>, tensor<2xi32>) to an equivalent tfl operator. We will reattach the error log for it.

Please suggest the name of the benchmark_tool; we can include it too.

Sure, we will try to see if we can add some patterns for the ops; @benvanik might sign up for tf.NonMaxSuppressionV5 as discussed on the IREE Discord.

dinkdeep commented 3 years ago

@ArunaKote Please attach the error log files and create separate issues.

ArunaKote commented 3 years ago

Ok