openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Does OpenVINO support INT8 Matmul? #24812

Open Septend-fun opened 1 month ago

Septend-fun commented 1 month ago

OpenVINO Version

2024.1.0

Operating System

Windows System

Device used for inference

NPU

Framework

None

Model used

Matmul

Issue description

I'd like to benchmark a MatMul op using the sync_benchmark sample (https://github.com/openvinotoolkit/openvino/tree/releases/2024/1/samples/cpp/benchmark/sync_benchmark) with an IR XML file. Running sync_benchmark.exe matmul.xml CPU produces normal results, but running sync_benchmark.exe matmul.xml NPU fails with the error shown below.

Screenshot 2024-06-03 120021
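To narrow down whether the failure comes from the NPU plugin rejecting the INT8 MatMul rather than from the sample itself, a quick check with the OpenVINO Python API can be sketched as below. This is only a sketch: it assumes the `openvino` Python package is installed and that `matmul.xml` (the IR from the reproduction steps) is in the working directory; whether "NPU" appears at all depends on the installed driver.

```python
def try_compile(xml_path, device):
    """Attempt to compile an IR on the given device; return (ok, message)."""
    try:
        import openvino as ov  # assumes `pip install openvino`
        core = ov.Core()
        model = core.read_model(xml_path)
        core.compile_model(model, device)
        return True, "compiled OK on " + device
    except Exception as e:
        # The exception text identifies the failing plugin/step.
        return False, str(e)

if __name__ == "__main__":
    for device in ("CPU", "NPU"):
        ok, msg = try_compile("matmul.xml", device)
        print(device, "->", msg)
```

If CPU compiles but NPU raises, the exception message from `compile_model` should name the unsupported layer or precision directly.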

Step-by-step reproduction

  1. Build and get sync_benchmark.exe [https://github.com/openvinotoolkit/openvino/tree/releases/2024/1/samples/cpp/benchmark/sync_benchmark]
  2. Run sync_benchmark.exe matmul.xml NPU, where matmul.xml is:
    <?xml version="1.0"?>
    <net name="main_graph" version="11">
    <layers>
        <layer id="1" name="input1" type="Parameter" version="opset1">
            <data shape="1,1,1024" element_type="i8" />
            <output>
                <port id="0" precision="I8" names="input1">
                    <dim>1</dim>
                    <dim>1</dim>
                    <dim>1024</dim>
                </port>
            </output>
        </layer>
        <layer id="0" name="input2" type="Parameter" version="opset1">
            <data shape="1,1024,1024" element_type="i8" />
            <output>
                <port id="0" precision="I8" names="input2">
                    <dim>1</dim>
                    <dim>1024</dim>
                    <dim>1024</dim>
                </port>
            </output>
        </layer>
        <layer id="2" name="output" type="MatMul" version="opset1">
            <data transpose_a="false" transpose_b="true" />
            <input>
                <port id="0" precision="I8">
                    <dim>1</dim>
                    <dim>1</dim>
                    <dim>1024</dim>
                </port>
                <port id="1" precision="I8">
                    <dim>1</dim>
                    <dim>1024</dim>
                    <dim>1024</dim>
                </port>
            </input>
            <output>
                <port id="2" precision="I32" names="output">
                    <dim>1</dim>
                    <dim>1</dim>
                    <dim>1024</dim>
                </port>
            </output>
        </layer>
        <layer id="3" name="output/sink_port_0" type="Result" version="opset1">
            <input>
                <port id="0" precision="I32">
                    <dim>1</dim>
                    <dim>1</dim>
                    <dim>1024</dim>
                </port>
            </input>
        </layer>
    </layers>
    <edges>
        <edge from-layer="0" from-port="0" to-layer="2" to-port="1" />
        <edge from-layer="1" from-port="0" to-layer="2" to-port="0" />
        <edge from-layer="2" from-port="2" to-layer="3" to-port="0" />
    </edges>
    <rt_info>
        <MO_version value="2024.1.0-15008-f4afc983258-releases/2024/1" />
        <Runtime_version value="2024.1.0-15008-f4afc983258-releases/2024/1" />
        <conversion_parameters>
            <input_model value="DIR\matmul.onnx" />
            <is_python_api_used value="False" />
        </conversion_parameters>
        <legacy_frontend value="False" />
    </rt_info>
    </net>
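For reference, the computation the IR above describes (i8 inputs, transpose_b="true", i32 output) can be sketched in NumPy; the shapes follow the dim entries in matmul.xml, and the i32 widening mirrors the I32 output port in the IR:

```python
import numpy as np

# Reference semantics of the INT8 MatMul in matmul.xml: both inputs are i8,
# transpose_b="true" transposes the last two axes of B, and the result is
# accumulated in i32 (matching the I32 output port in the IR).
rng = np.random.default_rng(0)
a = rng.integers(-128, 128, size=(1, 1, 1024), dtype=np.int8)     # input1
b = rng.integers(-128, 128, size=(1, 1024, 1024), dtype=np.int8)  # input2

# Widen to i32 before the multiply so accumulation does not overflow i8.
out = np.matmul(a.astype(np.int32), b.transpose(0, 2, 1).astype(np.int32))
print(out.shape, out.dtype)  # (1, 1, 1024) int32
```

This is what the CPU plugin computes successfully; the question is whether the NPU plugin supports the same i8 x i8 -> i32 MatMul natively.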

Relevant log output

No response

Issue submission checklist

avitial commented 1 week ago

Ref. 146007