DefTruth opened this issue 2 months ago (status: Open)
2024-08-09 18:31:40.646326213 [W:onnxruntime:, transformer_memcpy.cc:74 ApplyImpl] 680 Memcpy nodes are added to the graph main_graph for CUDAExecutionProvider. It might have negative impact on performance (including unable to run CUDA graph). Set session_options.log_severity_level=1 to see the detail logs before this message
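For anyone following the hint in the warning above, here is a minimal sketch of raising the session log level from Python so that the per-node Memcpy details are printed. The helper name `open_session_with_info_logs` and the path `model.onnx` are placeholders I made up for illustration, and the provider list is an assumption based on the CUDA setup described in this issue. It is not a fix for the error, only a way to get more diagnostic output.

```python
# Hypothetical helper: the name and default provider order are illustrative.
DEFAULT_PROVIDERS = ["CUDAExecutionProvider", "CPUExecutionProvider"]


def open_session_with_info_logs(model_path, providers=None):
    """Create an InferenceSession with log_severity_level=1 (INFO)."""
    # Imported lazily so this file can be loaded without onnxruntime installed.
    import onnxruntime as ort

    options = ort.SessionOptions()
    # Severity levels: 0=VERBOSE, 1=INFO, 2=WARNING, 3=ERROR, 4=FATAL.
    options.log_severity_level = 1
    return ort.InferenceSession(
        model_path,
        sess_options=options,
        providers=providers or DEFAULT_PROVIDERS,
    )


if __name__ == "__main__":
    # "model.onnx" is a placeholder path, not a file from this issue.
    session = open_session_with_info_logs("model.onnx")
    print(session.get_providers())
```

Running this against the exported model should show which nodes are placed on CPU versus CUDA, which may help narrow down where the 680 Memcpy nodes come from.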
Did you use transformers/optimum? If so, could you try these versions:
transformers>=4.24.0,<= 4.42.4
optimum<=1.21.2
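A quick way to check whether an environment already falls inside the suggested ranges is sketched below. It uses only the standard library; the helper names (`parse_version`, `in_range`, `check_installed`) are made up for this example, and the version parsing is deliberately simplified (it reads leading numeric components only), so treat it as a rough check rather than a full version-specifier implementation.

```python
from importlib import metadata


def parse_version(text):
    """Turn '4.42.4' into (4, 42, 4); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def in_range(version, minimum=None, maximum=None):
    """Return True if minimum <= version <= maximum (None means unbounded)."""
    v = parse_version(version)
    if minimum is not None and v < parse_version(minimum):
        return False
    if maximum is not None and v > parse_version(maximum):
        return False
    return True


# Bounds copied from the suggestion above: (minimum, maximum).
SUGGESTED = {
    "transformers": ("4.24.0", "4.42.4"),
    "optimum": (None, "1.21.2"),
}


def check_installed():
    """Map each package to (installed_version_or_None, within_range)."""
    report = {}
    for name, (low, high) in SUGGESTED.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, in_range(installed, low, high))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_installed().items():
        status = "OK" if ok else "not installed or outside suggested range"
        print(f"{name}: {installed} ({status})")
```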
I use transformers==4.42.4 and optimum==1.21.2
This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.
Describe the issue
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : CopyTensorAsync is not implemented
To reproduce
Build from source with CUDA 12.3.
Urgency
No response
Platform
Linux
OS Version
onnxruntime-gpu-1.20.0
ONNX Runtime Installation
Built from Source
ONNX Runtime Version or Commit ID
onnxruntime-gpu-1.20.0
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
No response