microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Unable to load onnx quantized model (fails during model checking) #18748

Open dhruvbird opened 9 months ago

dhruvbird commented 9 months ago

Describe the issue

I have an ONNX model that runs fine. After dynamic quantization, however, the quantized model fails onnx.checker.check_model(). What should I do?

The model is a standard LSTM encoder + LSTM decoder with attention. There's a data-dependent while loop in the model's forward() method that might be causing this issue, since it is exported as an ONNX Loop op. I'm copying the code for forward() below.

    def forward(
        self,
        x: torch.Tensor,
        steps: torch.Tensor,
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        B, L, _ = x.shape
        assert B == 1, f"Batch size: {B}. Expected 1"
        print("Hello")
        lengths = torch.tensor([[L]], dtype=torch.long, device=x.device)

        encoder_output, _ = self.encoder(x, lengths)
        decoder_input = torch.full(
            # B
            (1, 1),
            self.tok_input_bos,
            dtype=torch.long,
            device=x.device,
        )
        hc = (
            # The 2nd dimension is 1 because B == 1.
            #                              B
            self.decoder_hidden.expand(-1, 1, -1).contiguous(),
            #                            B
            self.decoder_cell.expand(-1, 1, -1).contiguous(),
        )
        tokens_predicted = []
        logits_predicted = []

        while steps.item() > 0:
            # for _i in range(int(steps.item())):
            y, hc = self.decoder(decoder_input, encoder_output, lengths, hc)
            #                  B
            assert y.shape == (1, 1, 32)
            logits_predicted.append(y)
            y_argmax = y.argmax(dim=-1)
            tokens_predicted.append(y_argmax)
            decoder_input = y_argmax

            steps.sub_(1)
        # end while

        tp = torch.cat(tokens_predicted, dim=-1)
        lp = torch.cat(logits_predicted, dim=1)
        return tp, lp
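
The Loop op in the error below suggests the export went through torch.jit.script (a traced export would have unrolled the loop). A rough sketch of such an export; the module construction, input shapes, and file name here are approximations, not my exact code:

import torch

# `model` is an instance of the encoder/decoder module whose forward()
# is shown above (its construction is omitted here).
# Scripting (rather than tracing) is what captures the data-dependent
# while loop as an ONNX Loop op instead of unrolling it.
scripted = torch.jit.script(model)

x = torch.randn(1, 20, 80)                  # (B, L, feat) -- example shapes
steps = torch.tensor(10, dtype=torch.long)  # number of decoder steps

torch.onnx.export(
    scripted,
    (x, steps),
    "inference.onnx",
    input_names=["x", "steps"],
    output_names=["tokens", "logits"],
    dynamic_axes={"x": {1: "L"}},
)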

Here's the error I received:

{
    "name": "ValidationError",
    "message": "Nodes in a graph must be topologically sorted, however input '/Add_output_0_quantized' of node: 
name: /MatMul_2_quant OpType: MatMulInteger
 is not output of any previous nodes.

==> Context: Bad node spec for node. Name: /Loop OpType: Loop",
    "stack": "---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)

def get_onnx_session(model_path):
          onnx_model = onnx.load(model_path)
---->     onnx.checker.check_model(onnx_model)
          ort_session = ort.InferenceSession(model_path)
          return ort_session

onnx/checker.py:148, in check_model(model, full_check, skip_opset_compatibility_check)
    144 if sys.getsizeof(protobuf_string) > MAXIMUM_PROTOBUF:
    145     raise ValueError(
    146         \"This protobuf of onnx model is too large (>2GB). Call check_model with model path instead.\"
    147     )
--> 148 C.check_model(protobuf_string, full_check, skip_opset_compatibility_check)

ValidationError: Nodes in a graph must be topologically sorted, however input '/Add_output_0_quantized' of node: 
name: /MatMul_2_quant OpType: MatMulInteger
 is not output of any previous nodes.

==> Context: Bad node spec for node. Name: /Loop OpType: Loop"
}

I've attached a .zip archive of the quantized model file.

inference_q.onnx.zip
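
To localize the failure, the Loop body can be walked directly. This throwaway script (mine, not part of onnx or onnxruntime) mimics the checker's topological-sort rule and prints any value that is consumed before it is produced:

import onnx

def find_topo_violations(graph, outer_scope):
    # Values visible at this point: outer-scope values, initializers,
    # and the graph's own inputs (for the Loop body: iteration count,
    # condition, and loop-carried dependencies).
    produced = set(outer_scope)
    produced.update(i.name for i in graph.initializer)
    produced.update(i.name for i in graph.input)
    for node in graph.node:
        for name in node.input:
            if name and name not in produced:
                print(f"{node.op_type} '{node.name}' reads '{name}' before it is produced")
        produced.update(node.output)
        # Recurse into subgraph attributes such as the Loop body.
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                find_topo_violations(attr.g, produced)

find_topo_violations(onnx.load("inference_q.onnx").graph, set())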

Additionally, I received these messages when I quantized the model:

...
2023-12-07 11:28:19.693258 [W:onnxruntime:, graph.cc:3553 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Concat_10_output_0'. It is not used by any node and should be removed from the model.
2023-12-07 11:28:19.693261 [W:onnxruntime:, graph.cc:3553 CleanUnusedInitializersAndNodeArgs] Removing initializer '/Constant_119_output_0'. It is not used by any node and should be removed from the model.
[2023-12-07 11:28:20,175] root        : INFO     Quantization parameters for tensor:"/Transpose_5_output_0" not specified
[2023-12-07 11:28:20,190] root        : INFO     Quantization parameters for tensor:"/Add_output_0" not specified
Ignore MatMul due to non constant B: //Loop:body/[/MatMul_4]
Ignore MatMul due to non constant B: //Loop:body/[/MatMul_5]
[2023-12-07 11:28:20,457] root        : INFO     Quantization parameters for tensor:"/Transpose_8_output_0" not specified
Original model size: 30091122, Pre-processed model size: 30087581, Quantized model size: 7611790
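
For reference, the "Ignore MatMul due to non constant B" lines mean the B input of those MatMuls is computed at runtime rather than stored as an initializer, so dynamic quantization skips them. A quick way to see which MatMuls have constant weights (my script; it only walks the top-level graph, not the Loop body):

import onnx

model = onnx.load("/tmp/inference_pp.onnx")
inits = {i.name for i in model.graph.initializer}
for node in model.graph.node:
    if node.op_type == "MatMul":
        kind = "constant" if node.input[1] in inits else "dynamic"
        print(f"{node.name}: B is {kind}")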

The code to quantize the model:

import os

from onnxruntime.quantization import QuantType, quantize_dynamic
from onnxruntime.quantization.shape_inference import quant_pre_process

# INFERENCE_MODEL_PATH points at the exported (float) model; defined elsewhere.

def quantize_onnx_model():
    quant_pre_process(
        INFERENCE_MODEL_PATH, "/tmp/inference_pp.onnx", skip_symbolic_shape=False
    )
    quantize_dynamic(
        "/tmp/inference_pp.onnx",
        "/tmp/inference_q.onnx",
        weight_type=QuantType.QUInt8,
        extra_options={"EnableSubgraph": True},
    )
    file_size_orig = os.stat(INFERENCE_MODEL_PATH).st_size
    file_size_pp = os.stat("/tmp/inference_pp.onnx").st_size
    file_size_quant = os.stat("/tmp/inference_q.onnx").st_size

    print(
        f"Original model size: {file_size_orig}, Pre-processed model size: {file_size_pp}, Quantized model size: {file_size_quant}"
    )
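
One workaround worth trying (untested; based only on the error naming /MatMul_2_quant inside the Loop): quantize_dynamic accepts nodes_to_exclude, so the MatMul feeding the bad '/Add_output_0_quantized' edge can be left in float:

from onnxruntime.quantization import QuantType, quantize_dynamic

# '/MatMul_2' is inferred from the '/MatMul_2_quant' node named in the
# ValidationError; adjust if the pre-quantization name differs.
quantize_dynamic(
    "/tmp/inference_pp.onnx",
    "/tmp/inference_q.onnx",
    weight_type=QuantType.QUInt8,
    nodes_to_exclude=["/MatMul_2"],
    extra_options={"EnableSubgraph": True},
)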

To reproduce

Run this code:

import onnx
import onnxruntime as ort

def get_onnx_session(model_path):
    onnx_model = onnx.load(model_path)
    onnx.checker.check_model(onnx_model)
    ort_session = ort.InferenceSession(model_path)
    return ort_session

get_onnx_session("inference_q.onnx")

Urgency

Everything is always urgent - I'm just trying to run a quantized model that I want to use in a product feature.

Platform

Mac

OS Version

n/a

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

version = "1.15.0" git_version = "b86cc54efce19530fb953e4b21f57e6b3888534c"

ONNX Runtime API

Python

Architecture

ARM64

Execution Provider

Default CPU

Execution Provider Library Version

No response

yufenglee commented 9 months ago

@yihonglyu, please help take a look.

dhruvbird commented 9 months ago

@zhijxu-MS Wanted to check if you had a chance to look into this! Thanks!

dhruvbird commented 5 months ago

@zhijxu-MS gentle ping.