onnx / onnx-tensorflow

Tensorflow Backend for ONNX

ValueError: Cannot take the length of shape with unknown rank. #1034

Open jyh2378 opened 2 years ago

jyh2378 commented 2 years ago

I tried to convert a PyTorch RAFT model to a TensorFlow model via ONNX. However, I got the error message below:

2022-05-26 00:46:27.068698: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.069166: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.124187: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.124606: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.191163: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.191219: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.192358: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.192422: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.496351: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.496785: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.550957: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.551368: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.617441: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.617606: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.618700: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:27.618794: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.186394: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.186838: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.205382: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.205446: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.205483: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.207528: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.207585: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.207618: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.213740: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.213935: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.213973: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.216111: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.216209: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
2022-05-26 00:46:28.216244: W tensorflow/core/framework/op_kernel.cc:1745] OP_REQUIRES failed at functional_ops.cc:373 : INTERNAL: No function library
Traceback (most recent call last):
  File "/home1/irteam/users/yeonghwa.jin/tf-raft/convert_model.py", line 38, in <module>
    convert_onnx_to_tf("../RAFT/models/raft-small-gpu.onnx")
  File "/home1/irteam/users/yeonghwa.jin/tf-raft/convert_model.py", line 33, in convert_onnx_to_tf
    tf_rep.export_graph(os.path.join("models", tf_model_path))
  File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/backend_rep.py", line 143, in export_graph
    signatures=self.tf_module.__call__.get_concrete_function(
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1239, in get_concrete_function
    concrete = self._get_concrete_function_garbage_collected(*args, **kwargs)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 1219, in _get_concrete_function_garbage_collected
    self._initialize(args, kwargs, add_initializers_to=initializers)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 785, in _initialize
    self._stateful_fn._get_concrete_function_internal_garbage_collected(  # pylint: disable=protected-access
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2480, in _get_concrete_function_internal_garbage_collected
    graph_function, _ = self._maybe_define_function(args, kwargs)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2711, in _maybe_define_function
    graph_function = self._create_graph_function(args, kwargs)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 2627, in _create_graph_function
    func_graph_module.func_graph_from_py_func(
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1141, in func_graph_from_py_func
    func_outputs = python_func(*func_args, **func_kwargs)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/def_function.py", line 677, in wrapped_fn
    out = weak_wrapped_fn().__wrapped__(*args, **kwds)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/eager/function.py", line 3251, in bound_method_wrapper
    return wrapped_fn(*args, **kwargs)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1127, in autograph_handler
    raise e.ag_error_metadata.to_exception(e)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/framework/func_graph.py", line 1116, in autograph_handler
    return autograph.converted_call(
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filex09cnksg.py", line 30, in tf____call__
    ag__.for_stmt(ag__.ld(self).graph_def.node, None, loop_body, get_state, set_state, (), {'iterate_names': 'node'})
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 449, in for_stmt
    _py_for_stmt(iter_, extra_test, body, None, None)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 498, in _py_for_stmt
    body(target)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 464, in protected_body
    original_body(protected_iter)
  File "/tmp/__autograph_generated_filex09cnksg.py", line 23, in loop_body
    output_ops = ag__.converted_call(ag__.ld(self).backend._onnx_node_to_tensorflow_op, (ag__.ld(onnx_node), ag__.ld(tensor_dict), ag__.ld(self).handlers), dict(opset=ag__.ld(self).opset, strict=ag__.ld(self).strict), fscope)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_file9e6y7t5f.py", line 62, in tf___onnx_node_to_tensorflow_op
    ag__.if_stmt(ag__.ld(handlers), if_body_1, else_body_1, get_state_1, set_state_1, ('do_return', 'retval_'), 2)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1341, in if_stmt
    _py_if_stmt(cond, body, orelse)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1394, in _py_if_stmt
    return body() if cond else orelse()
  File "/tmp/__autograph_generated_file9e6y7t5f.py", line 56, in if_body_1
    ag__.if_stmt(ag__.ld(handler), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1341, in if_stmt
    _py_if_stmt(cond, body, orelse)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1394, in _py_if_stmt
    return body() if cond else orelse()
  File "/tmp/__autograph_generated_file9e6y7t5f.py", line 48, in if_body
    retval_ = ag__.converted_call(ag__.ld(handler).handle, (ag__.ld(node),), dict(tensor_dict=ag__.ld(tensor_dict), strict=ag__.ld(strict)), fscope)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filegrdvje8p.py", line 41, in tf__handle
    ag__.if_stmt(ag__.ld(ver_handle), if_body, else_body, get_state, set_state, ('do_return', 'retval_'), 2)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1341, in if_stmt
    _py_if_stmt(cond, body, orelse)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/control_flow.py", line 1394, in _py_if_stmt
    return body() if cond else orelse()
  File "/tmp/__autograph_generated_filegrdvje8p.py", line 33, in if_body
    retval_ = ag__.converted_call(ag__.ld(ver_handle), (ag__.ld(node),), dict(**ag__.ld(kwargs)), fscope)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_filek2_ypwux.py", line 12, in tf__version
    retval_ = ag__.converted_call(ag__.ld(cls)._common, (ag__.ld(node),), dict(**ag__.ld(kwargs)), fscope)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 439, in converted_call
    result = converted_f(*effective_args, **kwargs)
  File "/tmp/__autograph_generated_fileumembp0q.py", line 15, in tf___common
    axis = ag__.if_exp(ag__.ld(axis) >= 0, lambda : ag__.ld(axis), lambda : ag__.converted_call(ag__.ld(len), (ag__.converted_call(ag__.ld(x).get_shape, (), None, fscope),), None, fscope) + ag__.ld(axis), 'axis >= 0')
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/conditional_expressions.py", line 27, in if_exp
    return _py_if_exp(cond, if_true, if_false)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/conditional_expressions.py", line 52, in _py_if_exp
    return if_true() if cond else if_false()
  File "/tmp/__autograph_generated_fileumembp0q.py", line 15, in <lambda>
    axis = ag__.if_exp(ag__.ld(axis) >= 0, lambda : ag__.ld(axis), lambda : ag__.converted_call(ag__.ld(len), (ag__.converted_call(ag__.ld(x).get_shape, (), None, fscope),), None, fscope) + ag__.ld(axis), 'axis >= 0')
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/impl/api.py", line 371, in converted_call
    return py_builtins.overload_of(f)(*args)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/py_builtins.py", line 242, in len_
    return _py_len(s)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/autograph/operators/py_builtins.py", line 307, in _py_len
    return len(s)
  File "/home1/irteam/.pyenv/versions/tf2.9.0_cuda11.2_cudnn8.1.1/lib/python3.9/site-packages/tensorflow/python/framework/tensor_shape.py", line 868, in __len__
    raise ValueError("Cannot take the length of shape with unknown rank.")
ValueError: in user code:

    File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/backend_tf_module.py", line 99, in __call__  *
        output_ops = self.backend._onnx_node_to_tensorflow_op(onnx_node,
    File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/backend.py", line 347, in _onnx_node_to_tensorflow_op  *
        return handler.handle(node, tensor_dict=tensor_dict, strict=strict)
    File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/handlers/handler.py", line 59, in handle  *
        return ver_handle(node, **kwargs)
    File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/handlers/backend/split.py", line 66, in version_13  *
        return cls._common(node, **kwargs)
    File "/home1/irteam/users/yeonghwa.jin/tf-raft/onnx-tensorflow/onnx_tf/handlers/backend/split.py", line 34, in _common  *
        axis = axis if axis >= 0 else len(x.get_shape()) + axis

    ValueError: Cannot take the length of shape with unknown rank.
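
The failing line in split.py, `axis = axis if axis >= 0 else len(x.get_shape()) + axis`, only needs the tensor's rank when the Split axis is negative, so the error appears to require both a negative axis and an input whose rank is unknown while the graph is being traced. Below is a minimal pure-Python sketch of that logic, not the onnx-tf implementation: `normalize_axis` is a hypothetical helper, and the rank is modeled as an `Optional[int]` in which `None` stands in for "unknown rank".

```python
from typing import Optional


def normalize_axis(axis: int, rank: Optional[int]) -> int:
    """Map a possibly negative axis to its non-negative equivalent.

    `rank` is None when the tensor's rank is unknown (an assumption used
    here to model a shape with unknown rank).
    """
    if axis >= 0:
        # A non-negative axis is returned as-is; the rank is never consulted.
        return axis
    if rank is None:
        # This models the situation reported in the traceback above.
        raise ValueError("a negative axis needs a known rank")
    if axis < -rank:
        raise ValueError(f"axis {axis} is out of range for rank {rank}")
    return rank + axis


print(normalize_axis(1, None))  # non-negative axis works without a rank
print(normalize_axis(-1, 4))    # negative axis resolved against rank 4
```

Under these assumptions, two possible workarounds follow: export the ONNX model with a non-negative Split axis, or make sure the input to the Split node has a known rank before conversion.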

To Reproduce

For the grid sampler function, I added a custom symbolic function and exported the RAFT model to ONNX:

# export pytorch to onnx

import sys
import os
import argparse
from collections import OrderedDict

import torch
from torch.onnx import register_custom_op_symbolic
import torch.onnx.symbolic_helper as sym_help

sys.path.append('core')
from jit_raft import RAFT
from utils.utils import InputPadder

def grid_sampler(g, input, grid, mode, padding_mode, align_corners):
    mode = sym_help._maybe_get_const(mode, "i")
    padding_mode = sym_help._maybe_get_const(padding_mode, "i")
    mode_str = ['bilinear', 'nearest', 'bicubic'][mode]
    padding_mode_str = ['zeros', 'border', 'reflection'][padding_mode]
    align_corners = int(sym_help._maybe_get_const(align_corners, "b"))

    return g.op("com.microsoft::GridSample", input, grid,
                mode_s=mode_str,
                padding_mode_s=padding_mode_str,
                align_corners_i=align_corners)

def modify_state_dict(state_dict):
    if list(state_dict.keys())[0].startswith("module"):
        start_idx = 1
    else:
        start_idx = 0
    new_state_dict = OrderedDict()
    for k, v in state_dict.items():
        name = ".".join(k.split(".")[start_idx:])
        new_state_dict[name] = v
    return new_state_dict

def export_model(args):
    save_dir = "./models"
    model_name, ext = os.path.splitext(os.path.basename(args.model))

    device = torch.device("cuda") if args.use_gpu else torch.device("cpu")

    image1 = torch.randn((1, 3, int(args.h), int(args.w)), device=device)
    image2 = torch.randn((1, 3, int(args.h), int(args.w)), device=device)
    padder = InputPadder(image1.shape)
    image1, image2 = padder.pad(image1, image2)

    model = RAFT(args)
    state_dict = modify_state_dict(torch.load(args.model, map_location=device))
    model.load_state_dict(state_dict)
    model.to(device)
    model.eval()

    if args.use_onnx:
        register_custom_op_symbolic('::grid_sampler', grid_sampler, 1)
        input_names = ["input_0", "input_1"]
        output_names = ["output_0"]
        if args.use_gpu:
            torch.onnx.export(model, (image1, image2), f=os.path.join(save_dir, f"{model_name}-gpu.onnx"), input_names=input_names, output_names=output_names, opset_version=15)
        else:
            torch.onnx.export(model, (image1, image2), f=os.path.join(save_dir, f"{model_name}-cpu.onnx"), input_names=input_names, output_names=output_names, opset_version=15)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', help="restore checkpoint")
    parser.add_argument('--w', help="input image width")
    parser.add_argument('--h', help="input image height")
    parser.add_argument('--small', action='store_true', help='use small model')
    parser.add_argument('--use_gpu', action='store_true', help='use gpu device for model')
    parser.add_argument('--use_onnx', action='store_true', help='use onnx format for exporting format')

    args = parser.parse_args(["--model", "models/raft-small.pth", "--w", "640", "--h", "480", "--small", "--use_onnx", "--use_gpu"])
    export_model(args)

# convert onnx to tensorflow
import os

import onnx
from onnx_tf.backend import prepare

def convert_onnx_to_tf(onnx_model_path):
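    # Take the file name without its extension; os.path.split would return the directory first.
    model_name, _ = os.path.splitext(os.path.basename(onnx_model_path))
    tf_model_path = f"{model_name}.pb"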

    onnx_model = onnx.load(onnx_model_path)
    tf_rep = prepare(onnx_model)
    tf_rep.export_graph(os.path.join("models", tf_model_path))

if __name__ == "__main__":
    convert_onnx_to_tf("../RAFT/models/raft-small-gpu.onnx")

ONNX model file

Python, ONNX, ONNX-TF, Tensorflow version

sdkdzq1 commented 2 years ago

I got the same error message. It seems to be caused by dynamic input.

Rechargeablezz commented 2 years ago

I got the same error message. It seems to be caused by dynamic input.

Have you solved this problem?

magicshuang commented 1 year ago

I also ran into a similar problem when using RAFT (an unsupported-operator problem).

This is how I solved it: https://github.com/onnx/onnx-tensorflow/issues/1031#issuecomment-1695125314

Rechargeablezz commented 1 year ago

Thanks!

