microsoft / onnxruntime

Create session has different gpu memory (EP: CUDA) #9138

Open mifikri opened 3 years ago

mifikri commented 3 years ago

Describe the bug:

Create session has different gpu memory (EP: CUDA)

System information:

Discussed in https://github.com/microsoft/onnxruntime/discussions/9137

Originally posted by **mifikri** September 21, 2021

Hi, I'm trying to create the session below using C++, configured with CMake.

Model: [yolov4.onnx](https://github.com/onnx/models/tree/master/vision/object_detection_segmentation/yolov4/model)

Dependencies:
- Ort: 1.8.2
- Torch: 1.9.0

CMakeLists.txt

```cmake
cmake_minimum_required(VERSION 3.14)
project(main)

add_executable(${PROJECT_NAME} main.cpp)
target_include_directories(${PROJECT_NAME} PRIVATE
  /usr/local/include/onnxruntime
)
# find_package(Torch REQUIRED)
target_link_libraries(${PROJECT_NAME}
  onnxruntime
  # ${TORCH_LIBRARIES}
)
```

main.cpp

```cpp
// Note: the include targets were stripped by the page's renderer;
// these four headers are the likely originals for this code.
#include <onnxruntime_cxx_api.h>
#include <cuda_provider_factory.h>
#include <iostream>
#include <unistd.h>

void initialize_session(std::string model_path, bool buffered) {
  static Ort::Env env(ORT_LOGGING_LEVEL_INFO, "");
  Ort::SessionOptions session_options;
  Ort::ThrowOnError(OrtSessionOptionsAppendExecutionProvider_CUDA(session_options, 0));
  session_options.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_BASIC);
  session_options.SetExecutionMode(ExecutionMode::ORT_SEQUENTIAL);
  if (buffered) {
    auto session = Ort::Session(env, model_path.c_str(), model_path.size(), session_options);
  } else {
    auto session = Ort::Session(env, model_path.c_str(), session_options);
  }
  std::cout << "model loaded\n";
}

int main() {
  std::string model_path = "/workspaces/vortex-runtime/data/yolov4.onnx";
  bool buffered = false;
  initialize_session(model_path, buffered);
  while (true) {
    sleep(1);
    std::cout << "infer\n";
  }
}
```

Scenario: CMake is configured with an additional linked library (torch).

GPU Memory:

| model       | linking Ort 1.8.2 | linking Ort 1.8.2 + Torch 1.9.0 |
|-------------|-------------------|---------------------------------|
| yolov4.onnx | 207 MiB           | 825 MiB                         |

linking onnxruntime

![Screenshot from 2021-09-21 14-39-20](https://user-images.githubusercontent.com/31136309/134131475-aa98d756-c017-4c5b-a776-0db75079cac3.png)

The profiling indicates that there is no memory deallocation (cudaFree) during session creation when linking onnxruntime + torch.

linking onnxruntime + torch

![Screenshot from 2021-09-21 14-39-27](https://user-images.githubusercontent.com/31136309/134131473-62eb29e9-2559-45b4-a464-d29717688e18.png)

Has anyone ever run into this kind of scenario?
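A useful cross-check when chasing numbers like these is to cap the CUDA EP's BFC arena explicitly, so that arena growth cannot account for the difference and whatever remains is context or library overhead. A minimal Python sketch, assuming ORT 1.10+ (where per-provider options can be passed as `(name, options)` tuples); the model path is a placeholder:

```python
import onnxruntime as ort

# Cap the CUDA EP's BFC arena so arena growth can't explain the gap;
# anything nvidia-smi reports beyond the cap is context/library overhead.
cuda_options = {
    "device_id": 0,
    "arena_extend_strategy": "kSameAsRequested",  # grow only by what's requested
    "gpu_mem_limit": 256 * 1024 * 1024,           # 256 MiB arena ceiling
}
sess = ort.InferenceSession(
    "yolov4.onnx",  # placeholder path
    providers=[("CUDAExecutionProvider", cuda_options)],
)
print(sess.get_providers())
```

`kSameAsRequested` keeps the arena from over-allocating in large chunks, which makes nvidia-smi readings easier to interpret.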
yufenglee commented 3 years ago

Could you say more about onnxruntime+torch? Do you load both onnxruntime and torch? And what are your build options?

mifikri commented 3 years ago

I just create the session using onnxruntime; the torch library is never invoked.

The point is: if the build is linked against onnxruntime only, the GPU memory allocated is 207 MiB,

and if both onnxruntime and torch are linked, the GPU memory allocated is 825 MiB.
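One plausible explanation for a constant, model-independent offset like this is the CUDA context itself: loading torch's CUDA libraries pulls in large kernel images (including its bundled cuDNN), which enlarge the per-process context. A diagnostic sketch of that hypothesis, assuming the `nvidia-ml-py` (pynvml) package is installed; this snippet is not from the original thread:

```python
import os

import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

def process_gpu_mib() -> int:
    # Sum the GPU memory NVML attributes to this process on device 0.
    # (usedGpuMemory can be None on some driver/container setups.)
    procs = pynvml.nvmlDeviceGetComputeRunningProcesses(handle)
    return sum((p.usedGpuMemory or 0) for p in procs if p.pid == os.getpid()) // (1024 * 1024)

torch.cuda.init()         # create only the CUDA context torch would create anyway
torch.cuda.synchronize()  # make sure initialization has completed
print(f"bare torch CUDA context: {process_gpu_mib()} MiB")
```

If this prints several hundred MiB before any onnxruntime session exists, the overhead comes from the context, not from session creation.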

yufenglee commented 3 years ago

Since you built from source, I asked for the build options to find out whether you built with training enabled. Our training build has a dependency on torch in certain cases.

mifikri commented 3 years ago

I was using the build script from https://github.com/microsoft/onnxruntime/blob/v1.8.2/dockerfiles/Dockerfile.tensorrt#L21:

```
...
'-Donnxruntime_ENABLE_TRAINING=OFF',
'-Donnxruntime_ENABLE_TRAINING_OPS=OFF'
```

Does that have any effect on session creation?


FYI, I just tested onnxruntime v1.4.0 with torch 1.6.0; it has the same effect. The build command is the same too: https://github.com/microsoft/onnxruntime/blob/v1.4.0/dockerfiles/Dockerfile.tensorrt#L21

| model  | linking Ort 1.4.0 | linking Ort 1.4.0 + Torch 1.6.0 |
|--------|-------------------|---------------------------------|
| yolov4 | 311 MiB           | 611 MiB                         |

The pattern is much the same as with Ort 1.8.2.

linking ort 1.4.0

Screenshot from 2021-09-22 11-05-28

linking ort 1.4.0 + torch 1.6.0

Screenshot from 2021-09-22 11-05-23

triwahyuu commented 3 years ago

Looks like a similar problem occurs in Python, as described in #8823.

hariharans29 commented 2 years ago

I will do some digging into this

hariharans29 commented 2 years ago

I couldn't repro this on Windows (K620) or Ubuntu 20 (V100). Any chance you could try with ORT 1.10 and check if the issue persists?

triwahyuu commented 2 years ago

Hi @hariharans29, I could replicate this in Python by importing or not importing torch in the same script:

```python
import onnxruntime as ort

sess = ort.InferenceSession("resnet18-v2-7.onnx", providers=['CUDAExecutionProvider'])
assert 'CUDAExecutionProvider' in sess.get_providers()
while True:
    pass
```

This gets 345 MB of memory:

Screenshot from 2021-12-19 12-01-26

while when adding `import torch`:

```python
import onnxruntime as ort
import torch

sess = ort.InferenceSession("resnet18-v2-7.onnx", providers=['CUDAExecutionProvider'])
assert 'CUDAExecutionProvider' in sess.get_providers()
while True:
    pass
```

I get 1031 MB:

image
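A related check is whether the extra ~700 MB corresponds to live torch allocations at all. Under the same setup (a sketch, not from the thread), torch's caching allocator should report zero while nvidia-smi still shows the inflated figure, which would point at context/library overhead rather than tensors:

```python
import torch
import onnxruntime as ort

sess = ort.InferenceSession("resnet18-v2-7.onnx",
                            providers=["CUDAExecutionProvider"])

# torch's caching allocator tracks memory torch itself has allocated/reserved.
# If both counters read 0 while nvidia-smi attributes ~1 GB to this process,
# the difference is CUDA context / kernel-image overhead from loading torch's
# CUDA libraries, not live tensors.
print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())
```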

I'm using:

I get around the same memory usage with onnxruntime 1.10. I'm using the ResNet model from the onnx/models repo: https://github.com/onnx/models/blob/master/vision/classification/resnet/model/resnet18-v2-7.onnx
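The environment list in the comment above did not survive the page export; a small snippet along these lines (hypothetical, not from the original comment) captures the versions that matter for this comparison:

```python
import onnxruntime as ort
import torch

# Print the versions and CUDA/cuDNN details relevant to this comparison.
print("onnxruntime:", ort.__version__, ort.get_device(), ort.get_available_providers())
print("torch:", torch.__version__, "cuda:", torch.version.cuda,
      "cudnn:", torch.backends.cudnn.version())
```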

stale[bot] commented 2 years ago

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.