microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Intel oneDNN #20208

Open ste-q opened 5 months ago

ste-q commented 5 months ago

Describe the issue

I quantized the sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 model using the script below.

import torch
import torch.nn as nn
import transformers
from onnxruntime.quantization import quantize_dynamic
from sentence_transformers import SentenceTransformer


class PytorchModel(nn.Module):
    def __init__(self, model_path: str):
        super().__init__()
        self.model = SentenceTransformer(model_path)

    def forward(
        self, input_ids: torch.Tensor, attention_mask: torch.Tensor
    ) -> torch.Tensor:
        features = {}
        features["input_ids"] = input_ids
        features["attention_mask"] = attention_mask
        output = self.model(features)
        return output["sentence_embedding"]


py_model = PytorchModel(model_path)
py_model.eval()
tokenizer = transformers.AutoTokenizer.from_pretrained(tokenizer_path)
inputs = tokenizer.prepare_for_model(
    tokenizer.convert_tokens_to_ids(tokenizer.tokenize("i am fine"))
)
attention_mask = torch.tensor([inputs["attention_mask"]])
input_ids = torch.tensor([inputs["input_ids"]])

# Export to ONNX with dynamic batch and sequence-length dimensions.
torch.onnx.export(
    py_model,
    (input_ids, attention_mask),
    "bert.onnx",
    opset_version=13,
    do_constant_folding=True,
    input_names=["input_ids", "attention_mask"],
    output_names=["output"],
    dynamic_axes={
        "input_ids": {0: "batch_size", 1: "sentence_length"},
        "attention_mask": {0: "batch_size", 1: "sentence_length"},
    },
)

# Dynamically quantize the exported model.
onnx_model_path = "bert.onnx"
quantized_model_path = "bert_quantized.onnx"
quantize_dynamic(
    model_input=onnx_model_path,
    model_output=quantized_model_path,
)

I built ONNX Runtime from source using the command below.

./build.sh --config RelWithDebInfo --build_shared_lib --parallel --enable_training --skip_tests  --build_java --use_dnnl

Whenever I run inference on the quantized model in Java with the oneDNN execution provider, I get the error below.

2024-04-05 12:37:23.831441312 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running DNNL_9692988425953928956_1 node. Name:'DnnlExecutionProvider_DNNL_9692988425953928956_1_1' Status Message: /onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_dequantizelinear.cc:191 void onnxruntime::ort_dnnl::DnnlDequantizeLinear::ValidateDims(onnxruntime::ort_dnnl::DnnlSubgraphPrimitive&, onnxruntime::ort_dnnl::DnnlNode&) x_scale and x_zero_point dimensions does not match

Please note that when I remove options.addDnnl(true); from the session options, the same model and script work fine. Running the unquantized ONNX model also works fine.

The issue only occurs when I run inference with different inputs across calls. For example, if I send the input "test" on the first inference, I receive the corresponding embedding. However, on the second call, when I pass any input other than "test", the error above is raised.

To reproduce

Please find the models and Jar here
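
In case it helps, here is a minimal sketch of the Java inference path that triggers the failure. The token-ID arrays and file name are placeholder values and the class is illustrative only; the actual reproduction uses the tokenizer and model from the attachment above.

import ai.onnxruntime.*;
import java.util.Arrays;
import java.util.Map;

public class DnnlRepro {
    public static void main(String[] args) throws OrtException {
        OrtEnvironment env = OrtEnvironment.getEnvironment();
        OrtSession.SessionOptions options = new OrtSession.SessionOptions();
        options.addDnnl(true);  // removing this line makes both calls succeed
        try (OrtSession session = env.createSession("bert_quantized.onnx", options)) {
            // First call: returns the expected embedding.
            run(env, session, new long[]{101, 3231, 102});             // placeholder token IDs for "test"
            // Second call with a different input / sequence length: fails with
            // "x_scale and x_zero_point dimensions does not match".
            run(env, session, new long[]{101, 1045, 2572, 2986, 102}); // placeholder token IDs for "i am fine"
        }
    }

    static void run(OrtEnvironment env, OrtSession session, long[] ids) throws OrtException {
        long[] mask = new long[ids.length];
        Arrays.fill(mask, 1L);
        try (OnnxTensor inputIds = OnnxTensor.createTensor(env, new long[][]{ids});
             OnnxTensor attentionMask = OnnxTensor.createTensor(env, new long[][]{mask});
             OrtSession.Result result = session.run(
                     Map.of("input_ids", inputIds, "attention_mask", attentionMask))) {
            float[][] embedding = (float[][]) result.get(0).getValue();
            System.out.println("embedding length: " + embedding[0].length);
        }
    }
}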

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04.3

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.0

ONNX Runtime API

Java

Architecture

X64

Execution Provider

oneDNN

Execution Provider Library Version

No response

github-actions[bot] commented 4 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.