microsoft / onnxruntime

[CUDAExecutionProvider] Regression from ORT 1.15.0 onwards: Compute MatMul dimension mismatch #18692

Closed fxmarty closed 5 months ago

fxmarty commented 9 months ago

Describe the issue

Hi, I noticed a regression in onnxruntime-gpu==1.15.1 and onnxruntime-gpu==1.16.3 (there is no problem with onnxruntime-gpu==1.14.1).

The following code runs fine on CPUExecutionProvider for all three ORT versions, but fails on CUDAExecutionProvider for 1.15.1 and 1.16.3.

import onnxruntime
import requests
from PIL import Image
from transformers import DetrImageProcessor

# Fetch a sample COCO image.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

model = onnxruntime.InferenceSession("/path/to/model.onnx", providers=["CUDAExecutionProvider"])

processor = DetrImageProcessor.from_pretrained("facebook/detr-resnet-50", revision="no_timm")

# Preprocess to NumPy arrays and keep only the pixel_values input.
inputs = processor(images=image, return_tensors="np")
inputs = {"pixel_values": inputs["pixel_values"]}

outputs = model.run(None, inputs)

It fails with the error:

Traceback (most recent call last):
  File "<tmp 1>", line 22, in <module>
    outputs = model.run(None, inputs)
  File "/home/fxmarty/anaconda3/envs/hf-inf/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running MatMul node. Name:'/model/decoder/layers.0/self_attn/out_proj/MatMul' Status Message: matmul_helper.h:59 Compute MatMul dimension mismatch
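
For reference, the failure is provider-specific: the same model and inputs run cleanly on CPU. A minimal cross-check sketch (it reuses the inputs dict from the snippet above; the model path is a placeholder):

import onnxruntime

# Sketch: the identical model and inputs succeed on CPUExecutionProvider
# across all three ORT versions, so the mismatch is CUDA-specific.
cpu_session = onnxruntime.InferenceSession(
    "/path/to/model.onnx", providers=["CPUExecutionProvider"]
)
cpu_outputs = cpu_session.run(None, inputs)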

To reproduce

As above. The issue can be reproduced with the model at https://huggingface.co/fxmarty/bugged-detr-ort-cuda/tree/main

I am using CUDA 11.7, which should be compatible according to https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html
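
To rule out an environment problem, the installed build can be inspected with standard onnxruntime introspection calls (a small sanity-check sketch):

import onnxruntime

print(onnxruntime.__version__)                # e.g. 1.16.3
print(onnxruntime.get_device())               # "GPU" for onnxruntime-gpu builds
print(onnxruntime.get_available_providers())  # should list CUDAExecutionProvider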

Urgency

medium

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1 and 1.16.3 (no issue with 1.14.1), as above

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

CUDA 11.7

fxmarty commented 9 months ago

cc @tianleiwu @yufenglee

I guess one may compile with ORT_DEBUG_NODE_IO_DUMP_SHAPE_DATA=1 to see which node the issue comes from.
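
A lighter-weight first step than a custom build is to raise the session log verbosity, which often prints enough detail to localize the failing node (a sketch using the standard SessionOptions API; the model path is a placeholder):

import onnxruntime

# Sketch: severity 0 = VERBOSE; the session log usually names the failing
# node and the graph transformers applied before the error.
sess_options = onnxruntime.SessionOptions()
sess_options.log_severity_level = 0
session = onnxruntime.InferenceSession(
    "/path/to/model.onnx", sess_options, providers=["CUDAExecutionProvider"]
)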

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

fxmarty commented 5 months ago

Hi @yufenglee @tianleiwu, this issue is not stale; it has also been reported by a user for another architecture (table-transformer), with onnxruntime-gpu==1.17.1: https://github.com/huggingface/optimum/issues/1774

tianleiwu commented 5 months ago

The issue is resolved in the main branch.

I did reproduce it in 1.17.1.

So the issue is caused by some basic-level graph optimization. If there is time, some debugging (disabling the basic-level graph optimizations one by one) could identify which optimizer is the cause.
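
For anyone blocked on a released package, a coarse bisection/workaround sketch (assuming the model path from the original report; lowering the optimization level sidesteps the broken transformer at some performance cost):

import onnxruntime

# Sketch: if the model runs with ORT_DISABLE_ALL but fails with
# ORT_ENABLE_BASIC, a basic-level graph optimizer is implicated.
# Per-optimizer disabling requires building ORT from source.
sess_options = onnxruntime.SessionOptions()
sess_options.graph_optimization_level = (
    onnxruntime.GraphOptimizationLevel.ORT_DISABLE_ALL
)
session = onnxruntime.InferenceSession(
    "/path/to/model.onnx", sess_options, providers=["CUDAExecutionProvider"]
)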

fxmarty commented 5 months ago

Thanks a lot @tianleiwu

thisisd3 commented 5 months ago

Hi all, is this resolved in 1.17.3, released 2 days ago? @tianleiwu