onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented

microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

https://onnxruntime.ai

MIT License


onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented #21690

Open DefTruth opened 2 months ago

DefTruth commented 2 months ago

Describe the issue

onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented

To reproduce

Build from source with CUDA 12.3.

Urgency

No response

Platform

Linux

OS Version

onnxruntime-gpu-1.20.0

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-gpu-1.20.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

DefTruth commented 2 months ago

2024-08-09 18:31:40.646326213 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 680 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message
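The warning above suggests setting `session_options.log_severity_level=1` to get more detail on the inserted Memcpy nodes. A minimal sketch of how that might look with the Python API follows. The helper name `make_session`, the `LOG_SEVERITY` table, and the provider list are illustrative assumptions, not taken from the reporter's script.

```python
# ONNX Runtime log severity levels (0 is the most verbose).
LOG_SEVERITY = {"VERBOSE": 0, "INFO": 1, "WARNING": 2, "ERROR": 3, "FATAL": 4}


def make_session(model_path, severity="INFO"):
    """Create a session that prefers CUDA and logs at the given severity.

    `model_path` is a placeholder for the reporter's model file (hypothetical).
    """
    # Imported lazily so this helper can be defined without ORT installed.
    import onnxruntime as ort

    so = ort.SessionOptions()
    # 1 == INFO, the level the warning message asks for.
    so.log_severity_level = LOG_SEVERITY[severity]

    return ort.InferenceSession(
        model_path,
        sess_options=so,
        # Fall back to CPU for any node the CUDA provider cannot run.
        providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
    )
```

Calling `make_session("model.onnx")` (path assumed) should then emit the per-node log lines before the Memcpy warning, which may help identify which operators are being placed on the CPU.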

mszhanyi commented 2 months ago

Are you using transformers/optimum? If so, could you try these versions:

 transformers>=4.24.0,<=4.42.4
 optimum<=1.21.2
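One way to confirm that the installed packages fall inside the suggested ranges is sketched below. It uses only the standard library; the helper names (`parse_version`, `in_range`, `check_installed`) are hypothetical, and the version parser is deliberately simplified (it ignores pre-release and dev suffixes).

```python
import re
from importlib import metadata

# Ranges suggested in this thread: (lowest allowed, highest allowed).
SUGGESTED = {
    "transformers": ("4.24.0", "4.42.4"),
    "optimum": (None, "1.21.2"),
}


def parse_version(text):
    """Turn '4.42.4' into (4, 42, 4). Suffixes such as '.dev0' are ignored."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if m is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(g or 0) for g in m.groups())


def in_range(version, low=None, high=None):
    """Return True if low <= version <= high (either bound may be None)."""
    v = parse_version(version)
    if low is not None and v < parse_version(low):
        return False
    if high is not None and v > parse_version(high):
        return False
    return True


def check_installed():
    """Map each package to (installed_version, ok), or None if not installed."""
    report = {}
    for name, (low, high) in SUGGESTED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
            continue
        report[name] = (installed, in_range(installed, low, high))
    return report


if __name__ == "__main__":
    for name, result in check_installed().items():
        print(name, result)
```

For stricter comparisons (pre-releases, local versions), a dedicated version-parsing library would be a safer choice than this simplified parser.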

DefTruth commented 2 months ago

I'm using transformers==4.42.4 and optimum==1.21.2.

github-actions[bot] commented 1 month ago

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.