darintay opened 1 year ago
Some of that memory usage can be attributed to torch itself, but not all of it.
I updated the script to print memory usage just before starting the ONNX session:
# Without torch
Before: 50.22M
After: 932.09M

# With torch
Before: 1025.58M
After: 5109.57M
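For reference, those Before/After lines come from printing the process resident set size (RSS); the original script isn't shown, so here is a minimal sketch of such a helper (the name print_mem is hypothetical):

import os
import psutil

def print_mem(label):
    # Hypothetical helper: print the current process's resident set size in MiB
    rss = psutil.Process(os.getpid()).memory_info().rss
    print("%s: %.2fM" % (label, rss / 2**20))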
@darintay, try appending the following code to inspect memory:
import os
import psutil
from psutil._common import bytes2human
from psutil._compat import get_terminal_size

def safe_print(s):
    # Truncate to the terminal width and drop characters the console can't encode
    s = s[:get_terminal_size()[0]]
    try:
        print(s)
    except UnicodeEncodeError:
        print(s.encode('ascii', 'ignore').decode())

p = psutil.Process(os.getpid())
templ = "%-20s %10s %-7s %s"
print(templ % ("Address", "RSS", "Mode", "Mapping"))
total_rss = 0
# One row per memory mapping: start address, resident size, permissions, backing file
for m in p.memory_maps(grouped=False):
    total_rss += m.rss
    safe_print(templ % (
        m.addr.split('-')[0].zfill(16),
        bytes2human(m.rss),
        m.perms,
        m.path))
print("-" * 31)
print(templ % ("Total", bytes2human(total_rss), '', ''))
On my machine (ORT 1.13.1 with CUDA 11.7, torch 1.12.1+cu116), the results look like the following:
Without torch: CPU memory usage: before=66.6 MB, peak=1089.3 MB; GPU memory usage: before=396.6 MB, peak=778.3 MB
With torch: CPU memory usage: before=231.6 MB, peak=2371.0 MB; GPU memory usage: before=396.6 MB, peak=1208.3 MB
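The GPU numbers can be read out with NVML; a minimal sketch, assuming the nvidia-ml-py (pynvml) package, which is not necessarily what was used above:

import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)  # GPU 0
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("GPU used: %.1f MB" % (info.used / 2**20))
pynvml.nvmlShutdown()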
Using the above method, I see many memory blocks mapped after libcudnn_ops_infer.so.8.
Maybe it is related to cuDNN workspace allocation. However, changing the cuDNN settings in ORT and torch does not seem to help.
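For reference, these are the kinds of settings in question (a sketch; the thread doesn't list exactly which ones were changed, and the model path is an assumption):

import onnxruntime as ort
import torch

# ORT side: cap the cuDNN convolution workspace via a CUDA EP provider option
providers = [("CUDAExecutionProvider", {"cudnn_conv_use_max_workspace": "0"}),
             "CPUExecutionProvider"]
sess = ort.InferenceSession("mnist-8.onnx", providers=providers)

# Torch side: disable the cuDNN autotuner, or cuDNN entirely
torch.backends.cudnn.benchmark = False
torch.backends.cudnn.enabled = False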
Thanks for looking into this!
I've used your code with my example to generate 3 files.
Unfortunately it looks like most of the memory is just in 'heap', so I'm not sure how helpful this will be.
mem_notorch_after_onnx_session.txt
mem_torch_after_onnx_session.txt
mem_torch_before_onnx_session.txt
It does definitely seem to be environment-related.
Using nvcr.io/nvidia/pytorch:22.10-py3 and installing onnxruntime-gpu in there, I get much more reasonable memory numbers running this script. I should have tried that earlier!
I'll see if I can narrow it down and/or work around it with different package versions.
It seems to depend entirely on the PyTorch version:
torch-1.9.0+cu111 (installed via "pip install torch==1.9.0+cu111 --find-links https://download.pytorch.org/whl/torch_stable.html")
Before ONNX: 1027.98M
After ONNX: 5124.55M
torch-1.10.1+cu113
Before ONNX: 330.80M
After ONNX: 4711.17M
torch-1.11.0+cu115
Before ONNX: 253.25M
After ONNX: 3090.81M
torch-1.13.0+cu116
Before ONNX: 251.27M
After ONNX: 2345.90M
The pytorch:22.10-py3 NGC image reproduces the ~2000M memory usage with torch imported vs ~1000M without, which still seems high if there's any interest in investigating.
It still seems strange to me that having torch imported or not has this huge impact on the memory usage of my ONNX session, but at least I can get to more reasonable numbers with a pytorch upgrade.
@darintay, in the mem_torch_before_onnx_session.txt (https://github.com/microsoft/onnxruntime/files/10026594/mem_torch_before_onnx_session.txt) you attached:
00007efecea60000 632.9M r-xp /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cuda_cpp.so
00007eff4d5ef000 196.3M rw-p /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cuda_cpp.so
In mem_torch_after_onnx_session.txt:
00007efecea60000 913.0M r-xp /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cuda_cpp.so
00007eff4d5ef000 196.3M rw-p /usr/local/lib/python3.7/dist-packages/torch/lib/libtorch_cuda_cpp.so
That's about 1G in total.
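That growth can be quantified directly from the maps, e.g. (a sketch reusing the psutil approach from earlier in the thread):

import os
import psutil
from psutil._common import bytes2human

p = psutil.Process(os.getpid())
# Sum resident memory across all mappings backed by torch's shared libraries
torch_rss = sum(m.rss for m in p.memory_maps(grouped=False)
                if "/torch/lib/" in m.path)
print("torch libs RSS:", bytes2human(torch_rss))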
Describe the issue
My ONNX session was using far more system memory than expected; I narrowed it down to only occurring when torch is imported.
Loading the same model with and without torch imported goes from 5G to 1G of system memory.
To reproduce
Run this script with and without the torch import line. I'm using https://github.com/onnx/models/blob/main/vision/classification/mnist/model/mnist-8.onnx, though the behavior seems to be the same for all models I've tried.
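The script itself isn't inlined above; a minimal reconstruction, assuming mnist-8.onnx sits in the working directory and the CUDA execution provider is used:

# import torch  # <- toggle this line to compare memory usage

import os
import psutil
import numpy as np
import onnxruntime as ort

# mnist-8.onnx from the ONNX model zoo; file location is an assumption
sess = ort.InferenceSession("mnist-8.onnx", providers=["CUDAExecutionProvider"])
x = np.zeros((1, 1, 28, 28), dtype=np.float32)  # mnist-8 takes a 1x1x28x28 float input
sess.run(None, {sess.get_inputs()[0].name: x})

rss = psutil.Process(os.getpid()).memory_info().rss
print("Using %.2fM" % (rss / 2**20))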
Prints 942.93M if import torch is commented out. Prints 5114.8M if import torch is left in. (1G still seems like a lot for a dinky mnist model, but much better than 5G!)
Urgency
No response
Platform
Linux
OS Version
Ubuntu 20.04
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
1.13.1
ONNX Runtime API
Python
Architecture
X64
Execution Provider
CUDA
Execution Provider Library Version
CUDA 11.4 and CUDA 11.6
Model File
No response
Is this a quantized model?
No