microsoft / onnxruntime

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
https://onnxruntime.ai
MIT License

Onnx create session takes a long time. #13240

Open xyh523078979 opened 2 years ago

xyh523078979 commented 2 years ago

Describe the issue

Onnx create session takes a long time.

Urgency

No response

Target platform

c++

Build script

The model is 174 MB. We first load the model file into a buffer, then create an ORT session from the model byte array, using this constructor: Session(Env& env, const void* model_data, size_t model_data_length, const SessionOptions& options);

 gettimeofday(&startTime, NULL);
 session = new Ort::Experimental::Session(env, content, length, session_options);
 gettimeofday(&endTime, NULL);
 printf("create session  takes:%fs\n", TIMEDIFF(startTime, endTime));

Error / output

Output: "create session takes:0.399875s". Can the session creation time be reduced?

Visual Studio Version

No response

GCC / Compiler Version

No response

mszhanyi commented 2 years ago

Hi, did you run it on a GPU machine? Is there a big difference between the first call and the second?

 // first call
 gettimeofday(&startTime, NULL);
 session = new Ort::Experimental::Session(env, content, length, session_options);
 gettimeofday(&endTime, NULL);
 printf("create session  takes:%fs\n", TIMEDIFF(startTime, endTime));
 // second call
 gettimeofday(&startTime, NULL);
 session = new Ort::Experimental::Session(env, content, length, session_options);
 gettimeofday(&endTime, NULL);
 printf("create session  takes:%fs\n", TIMEDIFF(startTime, endTime));

Hope this post helps. https://forums.developer.nvidia.com/t/cuda-initialization-takes-too-much-time/52913/2

xyh523078979 commented 2 years ago

We run it on CPU. The difference between the first and second call is small: first call: create session takes:0.376377s; second call: create session takes:0.356377s.
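When comparing repeated calls like this, a monotonic clock can be a safer basis than gettimeofday, which follows the wall clock and may jump if the system time is adjusted. The following is only a minimal sketch: `time_seconds` is a hypothetical helper name, not part of ONNX Runtime, and it assumes the code being timed can be wrapped in a callable.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper (not an ONNX Runtime API): runs `fn` exactly once and
// returns the elapsed time in seconds, measured on a monotonic clock.
template <typename Fn>
double time_seconds(Fn&& fn) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Fn>(fn)();
    const auto stop = std::chrono::steady_clock::now();
    // duration<double> defaults to seconds as its period.
    return std::chrono::duration<double>(stop - start).count();
}
```

To use it, wrap the session construction in a lambda, for example `time_seconds([&] { session = new Ort::Experimental::Session(env, content, length, session_options); })`, and print the returned value for each call.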

cloudhan commented 2 years ago

If you only want to reduce session creation time and only run the model with the CPU EP, here is a rough idea: ORT performs graph optimizations at session creation, and it can also dump the optimized graph to a new .onnx file. You can then initialize the session from the new .onnx file with all optimizations disabled. This might improve the init time. But there are so many APIs that I cannot remember which ones to call...

pranavsharma commented 2 years ago

The session needs to be created only once in the lifecycle of the process. Assuming graph optimizations are causing the delay, as @cloudhan suggested, you can call SetOptimizedModelFilePath to serialize the optimized model to another ONNX file, and from then on create the session from that serialized file after calling SetSessionGraphOptimizationLevel with graph_optimization_level = ORT_DISABLE_ALL.
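The two steps above could look roughly like the sketch below. It makes several assumptions. It uses the regular `Ort::Session` C++ wrapper rather than `Ort::Experimental::Session`. It loads from file paths instead of a memory buffer. The file names `model.onnx` and `model.optimized.onnx` are placeholders. In the C++ wrapper the optimization level is set through `SetGraphOptimizationLevel`, which corresponds to the C API's `SetSessionGraphOptimizationLevel`. Whether this actually shortens startup depends on how much of the 0.4 s is spent in graph optimization, so it is worth measuring both variants.

```cpp
#include <onnxruntime_cxx_api.h>

#include <chrono>
#include <cstdio>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "session-init");

    // Step 1 (one-time, offline): create a session with optimizations enabled
    // and ask ORT to write the optimized graph to a new file.
    {
        Ort::SessionOptions save_opts;
        save_opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_ENABLE_ALL);
        save_opts.SetOptimizedModelFilePath(ORT_TSTR("model.optimized.onnx"));  // placeholder path
        Ort::Session optimizer(env, ORT_TSTR("model.onnx"), save_opts);         // placeholder path
    }

    // Step 2 (every later start): load the pre-optimized file with all
    // graph optimizations disabled, and time the session creation.
    Ort::SessionOptions load_opts;
    load_opts.SetGraphOptimizationLevel(GraphOptimizationLevel::ORT_DISABLE_ALL);

    const auto start = std::chrono::steady_clock::now();
    Ort::Session session(env, ORT_TSTR("model.optimized.onnx"), load_opts);
    const auto stop = std::chrono::steady_clock::now();

    std::printf("create session takes:%fs\n",
                std::chrono::duration<double>(stop - start).count());
    return 0;
}
```

In practice step 1 would run once, for instance at build or deployment time, and only step 2 would remain in the application. The optimized file is probably best generated with the same execution provider and on hardware similar to the target, since some optimizations can be specific to both.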