Open · cooper-a opened 2 weeks ago
Describe the issue

We are seeing an issue with a Transformer model that was exported using torch.onnx.export and then optimized with optimum's ORTOptimizer: inference appears to run only on the CPU, not on the GPU.

The model was exported on a CPU machine using ONNX 1.16.0. We see the following logs when starting the inference session.
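For reference, a minimal sketch of the kind of export-and-optimize flow described above (the checkpoint name, paths, and dummy inputs are hypothetical placeholders, not our actual script):

```python
import os

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from optimum.onnxruntime import ORTModelForSequenceClassification, ORTOptimizer
from optimum.onnxruntime.configuration import OptimizationConfig

# Hypothetical checkpoint standing in for the actual model.
model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

# Dummy inputs used to trace the graph during export.
sample = tokenizer("sample text", return_tensors="pt")

# Step 1: export on a CPU machine with torch.onnx.export.
os.makedirs("onnx_model", exist_ok=True)
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "onnx_model/model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
    },
    opset_version=17,
)
# The ORTModel loader expects the transformers config next to the ONNX file.
model.config.save_pretrained("onnx_model")

# Step 2: optimize the exported model with optimum's ORTOptimizer.
ort_model = ORTModelForSequenceClassification.from_pretrained("onnx_model")
optimizer = ORTOptimizer.from_pretrained(ort_model)
optimizer.optimize(
    save_dir="onnx_model_optimized",
    optimization_config=OptimizationConfig(optimization_level=2),
)
```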
Reply:

Memcpy nodes copy data between devices (for example, between GPU and CPU), so a session containing them is using both the GPU and the CPU.

Could you share some information to reproduce this (transformers/optimum/pytorch versions, and the Python script or command line used for the ONNX export and optimization)?
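As a quick check (a minimal sketch, assuming the onnxruntime-gpu package is installed; the model path is a placeholder), you can confirm which execution providers the session actually registered:

```python
import onnxruntime as ort

sess = ort.InferenceSession(
    "onnx_model_optimized/model.onnx",  # placeholder path
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

# If only CPUExecutionProvider is listed, CUDA was never registered
# (e.g. the CPU-only onnxruntime package is installed instead of
# onnxruntime-gpu) and the model really does run entirely on CPU.
# If both are listed, Memcpy nodes in the logs only mean that some
# nodes were placed on CPU and data is copied between devices.
print(sess.get_providers())
```

Setting `log_severity_level = 0` on an `ort.SessionOptions` object before creating the session should also print verbose logs showing which provider each node was assigned to.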