iree-org / iree

A retargetable MLIR-based machine learning compiler and runtime toolkit.
http://iree.dev/
Apache License 2.0

Error importing TF T5-base #12259

Open mariecwhite opened 1 year ago

mariecwhite commented 1 year ago

What happened?

Error when importing the TF T5-base model to MLIR using iree-import-tf:

(iree.venv) mariewhite@marie:~/github/SHARK$ iree-import-tf /tmp/t5-base --tf-import-type=savedmodel_v2 --tf-savedmodel-exported-names=forward --tf-savedmodel-tags= -o=/tmp/t5-base_tf.mlir --output-format=mlir-ir --mlir-print-debuginfo
2023-02-16 19:58:34.470834: I external/org_tensorflow/tensorflow/cc/saved_model/bundle_v2.cc:44] Reading SavedModel from: /tmp/t5-base
2023-02-16 19:58:34.653575: I external/org_tensorflow/tensorflow/cc/saved_model/reader.cc:130] Reading SavedModel debug info (if present) from: /tmp/t5-base
2023-02-16 19:58:42.863590: I external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: SSE3 SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py:1407:0: error: could not lower resource op cf.cond_br
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py:1361:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py:1452:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py:775:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py:1416:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py:1363:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py:760:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/tank/model_utils_tf.py:191:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py:1222:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/eager/function.py:3317:0: note: called from
/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py:1407:0: note: see current operation: "cf.cond_br"(%1894, %17, %8, %1890, %1891, %9, %arg1, %24, %771, %789, %805, %811, %902, %920, %936, %942, %995, %1013, %1029, %1035, %1088, %1106, %1122, %1128, %1181, %1199, %1215, %1221, %1274, %1292, %1308, %1314, %1367, %1385, %1401, %1407, %1460, %1478, %1494, %1500, %1553, %1571, %1587, %1593, %1646, %1664, %1680, %1686, %1737, %1752, %866, %872, %1811, %1826, %885, %891, %arg2, %arg101, %arg102, %arg103, %arg104, %arg105, %arg106, %arg107, %arg108, %arg111, %arg112, %arg113, %arg114, %arg115, %arg116, %arg117, %arg118, %arg119, %arg120, %arg121, %arg124, %arg125, %arg126, %arg127, %arg128, %arg129, %arg130, %arg131, %arg132, %arg133, %arg134, %arg137, %arg138, %arg139, %arg140, %arg141, %arg142, %arg143, %arg144, %arg145, %arg146, %arg147, %arg150, %arg151, %arg152, %arg153, %arg154, %arg155, %arg156, %arg157, %arg158, %arg159, %arg160, %arg163, %arg164, %arg165, %arg166, %arg167, %arg168, %arg169, %arg170, %arg171, %arg172, %arg173, %arg176, %arg177, %arg178, %arg179, %arg180, %arg181, %arg182, %arg183, %arg184, %arg185, %arg186, %arg189, %arg190, %arg191, %arg192, %arg193, %arg194, %arg195, %arg196, %arg197, %arg198, %arg199, %arg202, %arg203, %arg204, %arg205, %arg206, %arg207, %arg208, %arg209, %arg210, %arg211, %arg212, %arg215, %arg216, %arg217, %arg218, %arg219, %arg220, %arg221, %arg222, %arg223, %arg224, %arg225, %arg228, %arg229, %arg230, %arg231, %arg232, %arg233, %arg234, %arg235, %arg236, %arg237, %arg238, %arg241, %arg242, %arg243, %arg244, %arg245, %arg246, %arg247, %arg248, %arg249, %arg250, %arg251, %arg254, %arg255, %arg256, %arg257, %arg258, %1890)[^bb1, ^bb3] {operand_segment_sizes = array<i32: 1, 190, 1>} : (i1, tensor<i32>, tensor<i32>, tensor<1x513xi32>, tensor<1xi1>, tensor<i32>, tensor<1x512xi32>, tensor<1x512xi32>, tensor<1x12x511x64xf32>, 
tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x511x64xf32>, tensor<1x12x512x64xf32>, tensor<1x12x512x64xf32>, tensor<!tf_type.resource<tensor<32128x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<32x12xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, 
tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, 
tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, 
tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<!tf_type.resource<tensor<768x3072xf32>>>, tensor<!tf_type.resource<tensor<3072x768xf32>>>, tensor<!tf_type.resource<tensor<768xf32>>>, tensor<1x513xi32>) -> () loc(fused["If:", callsite("cond@__inference_forward_21705"("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py":1407:0) at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py":1361:0 at 
callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py":1452:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py":775:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py":1416:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/autograph/operators/control_flow.py":1363:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/transformers/generation/tf_utils.py":760:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/tank/model_utils_tf.py":191:0 at callsite("/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/framework/func_graph.py":1222:0 at "/usr/local/google/home/mariewhite/github/SHARK/iree.venv/lib/python3.10/site-packages/tensorflow/python/eager/function.py":3317:0)))))))))])

Steps to reproduce your issue

  1. Download the SavedModel at https://storage.googleapis.com/iree-model-artifacts/t5-base-tf-model.tar.gz
  2. Run iree-import-tf /tmp/t5-base --tf-import-type=savedmodel_v2 --tf-savedmodel-exported-names=forward --tf-savedmodel-tags= -o=/tmp/t5-base_tf.mlir --output-format=mlir-ir --mlir-print-debuginfo
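
The two steps above can be sketched as a small shell script. The URL and the iree-import-tf flags are taken from this report; WORK_DIR, the local archive name, the helper names fetch_model and import_cmd, and the assumption that the tarball unpacks to a t5-base directory under WORK_DIR are illustrative guesses, so adjust them if the archive layout differs.

```shell
#!/bin/sh
# Sketch of the reproduction steps. MODEL_URL and the iree-import-tf flags come
# from the report; WORK_DIR, the archive file name, and the unpacked directory
# name (t5-base) are assumptions made for illustration.
MODEL_URL="https://storage.googleapis.com/iree-model-artifacts/t5-base-tf-model.tar.gz"
WORK_DIR="${WORK_DIR:-/tmp}"
MODEL_DIR="${MODEL_DIR:-$WORK_DIR/t5-base}"
OUTPUT_MLIR="${OUTPUT_MLIR:-$WORK_DIR/t5-base_tf.mlir}"

# Step 1: download the SavedModel archive and unpack it into WORK_DIR.
fetch_model() {
  curl -fL -o "$WORK_DIR/t5-base-tf-model.tar.gz" "$MODEL_URL" &&
    tar -xzf "$WORK_DIR/t5-base-tf-model.tar.gz" -C "$WORK_DIR"
}

# Step 2: print the import command so it can be inspected before it is run.
import_cmd() {
  echo "iree-import-tf $MODEL_DIR" \
    "--tf-import-type=savedmodel_v2" \
    "--tf-savedmodel-exported-names=forward" \
    "--tf-savedmodel-tags=" \
    "-o=$OUTPUT_MLIR" \
    "--output-format=mlir-ir" \
    "--mlir-print-debuginfo"
}
```

Nothing runs when the script is sourced. To reproduce, call fetch_model and then run the printed command, for example with eval "$(import_cmd)" (this assumes the paths contain no spaces).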

What component(s) does this issue relate to?

No response

Version information

No response

Additional context

No response

jpienaar commented 1 year ago

There should not be a cf.cond_br at a point where there are still resources. Could you enable reproducers here and add the output to the bug?

mariecwhite commented 1 year ago

Reproducer output: https://storage.googleapis.com/iree-shared-files/tf-base-core-reproducer.mlir

jpienaar commented 1 year ago

Running the reproducer, I get:

error: 'mhlo.convert' op all non-scalar operands/results must have the same shape and base type
...
note: see current operation: %1854 = "mhlo.convert"(%1849) : (tensor<*xi1>) -> tensor<1xi1>

I'll try resyncing to see if there was some difference with head.

jpienaar commented 1 year ago

Same error at head. Could you try creating a local reproducer with --mlir-elide-elementsattrs-if-larger=16 (these constants are just too big, and I don't think they are contributing) and then check with iree-tf-opt whether the above error reproduces? Then we'd have something narrowed down to that pass.