microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

[ErrorCode:Fail] Load model from [...]\latin_ipa_forward.onnx failed:invalid vector subscript #15495

Open ADD-eNavarro opened 1 year ago

ADD-eNavarro commented 1 year ago

Describe the issue

I am trying to use DeepPhonemizer (in Python) from C#. To achieve that, I've converted the PyTorch model file (latin_ipa_forward.pt) to ONNX, with two custom opset operations: aten::unflatten and aten::scaled_dot_product_attention.

Here's the resulting conversion code (attached as ToOnnx.txt, with the extension changed from .py to .txt):

import onnxscript
import torch

# Assuming you use opset18
from onnxscript.onnx_opset import opset18 as op

custom_opset = onnxscript.values.Opset(domain="torch.onnx", version=18)

# Registering custom operation for scaled dot product attention
@onnxscript.script(custom_opset)
def ScaledDotProductAttention(
    query,
    key,
    value,
    dropout_p,
):
    # Swap the last two axes of key
    key_shape = op.Shape(key)
    key_last_dim = key_shape[-1:]
    key_second_last_dim = key_shape[-2:-1]
    key_first_dims = key_shape[:-2]
    # Contract the dimensions that are not the last two so we can transpose
    # with a static permutation.
    key_squeezed_shape = op.Concat(
        op.Constant(value_ints=[-1]), key_second_last_dim, key_last_dim, axis=0
    )
    key_squeezed = op.Reshape(key, key_squeezed_shape)
    key_squeezed_transposed = op.Transpose(key_squeezed, perm=[0, 2, 1])
    key_transposed_shape = op.Concat(key_first_dims, key_last_dim, key_second_last_dim, axis=0)
    key_transposed = op.Reshape(key_squeezed_transposed, key_transposed_shape)

    embedding_size = op.CastLike(op.Shape(query)[-1], query)
    scale = op.Div(1.0, op.Sqrt(embedding_size))

    # Scale q, k before matmul for stability; see https://tinyurl.com/sudb9s96 for the math
    query_scaled = op.Mul(query, op.Sqrt(scale))
    key_transposed_scaled = op.Mul(key_transposed, op.Sqrt(scale))
    attn_weight = op.Softmax(
        op.MatMul(query_scaled, key_transposed_scaled),
        axis=-1,
    )
    attn_weight, _ = op.Dropout(attn_weight, dropout_p)
    return op.MatMul(attn_weight, value)

def custom_scaled_dot_product_attention(g, query, key, value, attn_mask, dropout, is_causal, scale=None):
    return g.onnxscript_op(ScaledDotProductAttention, query, key, value, dropout).setType(query.type())

torch.onnx.register_custom_op_symbolic(
    symbolic_name="aten::scaled_dot_product_attention",
    symbolic_fn=custom_scaled_dot_product_attention,
    opset_version=18,
)

# Registering custom operation for unflatten
@onnxscript.script(custom_opset)
def aten_unflatten(self, dim, sizes):
    """unflatten(Tensor(a) self, int dim, SymInt[] sizes) -> Tensor(a)"""

    self_size = op.Shape(self)

    if dim < 0:
        # PyTorch accepts negative dim as reversed counting
        self_rank = op.Size(self_size)
        dim = self_rank + dim

    head_start_idx = op.Constant(value_ints=[0])
    head_end_idx = op.Reshape(dim, op.Constant(value_ints=[1]))
    head_part_rank = op.Slice(self_size, head_start_idx, head_end_idx)

    tail_start_idx = op.Reshape(dim + 1, op.Constant(value_ints=[1]))
    # tail_end_idx = op.Constant(value_ints=[_INT64_MAX])
    tail_end_idx = op.Constant(value_ints=[9223372036854775807])  # INT64_MAX = 2^63 - 1 (sys.maxsize on 64-bit Python)
    tail_part_rank = op.Slice(self_size, tail_start_idx, tail_end_idx)

    final_shape = op.Concat(head_part_rank, sizes, tail_part_rank, axis=0)

    return op.Reshape(self, final_shape)

def custom_unflatten(g, self, dim, shape):
    return g.onnxscript_op(aten_unflatten, self, dim, shape).setType(self.type().with_sizes([32, 32, 1536]))   

torch.onnx.register_custom_op_symbolic(
    symbolic_name="aten::unflatten",
    symbolic_fn=custom_unflatten,
    opset_version=18,
)

########## Custom ops ready, time to convert the model to onnx

from dp.model.model import load_checkpoint

model, checkpoint = load_checkpoint('latin_ipa_forward.pt')

dummy_input = {"batch": {"text":torch.rand((32,32)).long()}}

torch.onnx.export(
    model,                      # PyTorch Model
    dummy_input,                # Input tensor
    "latin_ipa_forward.onnx",   # Output file (eg. 'output_model.onnx')
    custom_opsets = {"torch.onnx": 18},           
    opset_version=18,           # Operator support version
    input_names=['embedding'],  # Input tensor name (arbitrary) -> (embedding): Embedding(84, 512)
    output_names=['fc_out']     # Output tensor name (arbitrary) -> (fc_out): Linear(in_features=512, out_features=82, bias=True)
)

The conversion shrinks the model from 70 MB to 50 MB; still, I won't attach it unless required.
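
As a side note, a minimal Python-side sanity check of the exported file might look like the sketch below (it assumes the onnx and onnxruntime Python packages are installed, and is just a quick way to see whether the load failure is specific to the C# bindings):

import onnx
import onnxruntime as ort

# Structurally validate the exported graph.
onnx_model = onnx.load("latin_ipa_forward.onnx")
onnx.checker.check_model(onnx_model)

# Creating a session here should reproduce any model-load error outside C#.
session = ort.InferenceSession("latin_ipa_forward.onnx", providers=["CPUExecutionProvider"])
print([i.name for i in session.get_inputs()])
print([o.name for o in session.get_outputs()])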

When I try to create an inference session I get this error message:

Exception thrown: 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' in Microsoft.ML.OnnxRuntime.dll
An unhandled exception of type 'Microsoft.ML.OnnxRuntime.OnnxRuntimeException' occurred in Microsoft.ML.OnnxRuntime.dll
[ErrorCode:Fail] Load model from [path\to]\latin_ipa_forward.onnx failed:invalid vector subscript

To reproduce

Try to create an inference session with the model:

using InferenceSession session = new InferenceSession(@"path\to\latin_ipa_forward.onnx");

Urgency

No response

Platform

Windows

OS Version

10 Pro

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.14.1 NuGet Package

ONNX Runtime API

C#

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

wschin commented 1 year ago

What is dp in from dp.model.model import load_checkpoint? I can't run the code because of that missing package.

[Update] I got it; dp is [DeepPhonemizer](https://github.com/as-ideas/DeepPhonemizer). However, the test script is still not runnable because of [Errno 2] No such file or directory: 'latin_ipa_forward.pt'. Please provide a complete repro: a script I can just run and debug. Thanks.

ADD-eNavarro commented 1 year ago

Hi! Yes, dp is the DeepPhonemizer library. Here's the missing model file: https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/DeepPhonemizer/latin_ipa_forward.pt
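
If it helps make the repro self-contained, the checkpoint can also be fetched by a small script before the conversion runs; a minimal sketch using only the standard library:

import urllib.request

# Download the DeepPhonemizer checkpoint next to the conversion script.
CHECKPOINT_URL = "https://public-asai-dl-models.s3.eu-central-1.amazonaws.com/DeepPhonemizer/latin_ipa_forward.pt"
urllib.request.urlretrieve(CHECKPOINT_URL, "latin_ipa_forward.pt")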

wschin commented 1 year ago

Hit another error:

  File "<@beartype(torch.onnx.utils.export) at 0x7f191adcbe50>", line 63, in export
beartype.roar.BeartypeCallHintParamViolation: @beartyped torch.onnx.utils.export() parameter args={'text': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0... violates type hint typing.Union[typing.Tuple[typing.Any, ...], torch.Tensor], as dict {'text': tensor([[0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0,  ..., 0, 0, 0],
        [0, 0, 0... not <protocol "torch.Tensor"> or tuple.

In the latest PyTorch master branch, all inputs must be represented as a tuple of tensors. I am trying to work around it, but if you have a better idea for modifying the model, please let me know.
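
One possible workaround, sketched below, is to export through a small wrapper module whose forward takes a plain tensor and rebuilds the batch dict; the wrapper name is illustrative, and the sketch assumes the DeepPhonemizer model only needs the 'text' entry (not verified against the library) and that the custom symbolic registrations from the original script are already in place:

import torch

class TextOnlyWrapper(torch.nn.Module):
    # Hypothetical wrapper: accepts a plain tensor and rebuilds the dict the model expects.
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, text):
        return self.model({"text": text})

wrapped = TextOnlyWrapper(model)
dummy_text = torch.zeros((32, 32), dtype=torch.long)

torch.onnx.export(
    wrapped,
    (dummy_text,),               # inputs passed as a tuple of tensors
    "latin_ipa_forward.onnx",
    custom_opsets={"torch.onnx": 18},
    opset_version=18,
    input_names=['embedding'],
    output_names=['fc_out'],
)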

wschin commented 1 year ago

I changed the input from a dictionary to a tensor and adjusted the corresponding forward function. Now I reach a similar error:

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Load model from latin_ipa_forward.onnx failed:vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)

I saw the error coming from

#4  0x00007fff486 in onnxruntime::function_utils::CreateSchema(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, onnxruntime::InlinedHashMap<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, onnx::FunctionProto const*, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, onnx::FunctionProto const*> > > const&, std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, int> > > const&, onnxruntime::SchemaRegistryManager const&, onnxruntime::logging::Logger const&, bool) ()

This means your exporter's output could be wrong, because ORT doesn't receive a correct schema.

justinchuby commented 1 year ago

Could this be related? https://github.com/microsoft/onnxruntime/issues/15404

wschin commented 1 year ago

@justinchuby, probably, but I am using today's master branch. The overall problem is that ORT doesn't generate a meaningful error message, so users and devs must dive very deep to understand the situation. Basically, the entire function_utils.cc can have similar problems. I will try to improve the calls around at(...). In the meantime, we've learned that at(...) (and C++'s built-in error-checking mechanisms in general) doesn't produce actionable error messages, because the meaning of an exception depends on its context and C++ classes don't have that context.

justinchuby commented 1 year ago

I totally agree. Also, if it's today's main branch, it may be a different issue.

wschin commented 1 year ago

Looks like there was a cache problem: after a clean build, I don't hit this error anymore. Thanks @justinchuby.

@ADD-eNavarro, could you please use the latest main branch? As @justinchuby mentioned, there was a bug fixed last week.

justinchuby commented 1 year ago

@ADD-eNavarro you may consider the nightly builds if that makes it easier: https://onnxruntime.ai/docs/install/#inference-install-table-for-all-languages

ADD-eNavarro commented 1 year ago

I have installed the nightly dev version of ORT 1.15.0-dev-20230319-0802-e42f7487df, and this time the error message is just slightly different:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException: '[ErrorCode:Fail] Load model from [path\to]\latin_ipa_forward.onnx failed:invalid vector subscript'

Details follow:

Microsoft.ML.OnnxRuntime.OnnxRuntimeException
  HResult=0x80131500
  Message=[ErrorCode:Fail] Load model from [path\to]\latin_ipa_forward.onnx failed:invalid vector subscript
  Source=Microsoft.ML.OnnxRuntime
  StackTrace:
   at Microsoft.ML.OnnxRuntime.NativeApiStatus.VerifySuccess(IntPtr nativeStatus)
   at Microsoft.ML.OnnxRuntime.InferenceSession.Init(String modelPath, SessionOptions options, PrePackedWeightsContainer prepackedWeightsContainer)
   at Microsoft.ML.OnnxRuntime.InferenceSession..ctor(String modelPath)
   at Program.<Main>$(String[] args) in [path\to]\Program.cs:line 44

Line 44 is the new InferenceSession call.

Finally, the call stack: (screenshot attached)

justinchuby commented 1 year ago

You probably need a version newer than Apr 12 (i.e., >dev-20230412) for it to include the fix.

justinchuby commented 1 year ago

Let me check with the team for the build

ADD-eNavarro commented 1 year ago

@justinchuby Any news on this?

BTW, version dev-20230319 is the latest available as a NuGet package. How do I install >dev-20230412?

justinchuby commented 1 year ago

I didn’t get any updates. An alternative route is to compile from source. You should be able to find instructions here: https://onnxruntime.ai/docs/build/inferencing.html#build-nuget-packages

@xadupre do you have more info on the build pipeline?

ADD-eNavarro commented 1 year ago

Unfortunately, compiling from source is not a viable option: installing the tools mentioned in your link on my workstation would likely take longer than waiting for you to find a solution. IT people are really overwhelmed, bosses take time to grant permission... you know the drill. So I'll wait.