openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0
7.11k stars 2.23k forks

[Bug] Model compilation takes too much memory in cpp #13601

Closed yszhou2019 closed 1 year ago

yszhou2019 commented 1 year ago
System information
Detailed description

I often follow the procedure below to load an ONNX model and run inference with a fixed-size tensor. But today, when I ran a benchmark (Google Benchmark) to test the performance of our model, I found that model compilation with shape {1, 500, 80} (compiledModel_ = core_.compile_model(model);) consumed so much memory that it led to an OOM, and the whole program was killed.

// load and compile the model
    core_.set_property("AUTO", ov::log::level(ov::log::Level::WARNING));
    model_ = core_.read_model(model_path);
    ov::Shape static_shape = {1, static_cast<unsigned long>(mellen), 80};
    model_->reshape(static_shape);
    compiledModel_ = core_.compile_model(model_); // leads to OOM
    inferRequest_ = compiledModel_.create_infer_request();
// infer
    ov::Tensor input_tensor(ov::element::f32, static_shape, input.data());
    inferRequest_.set_input_tensor(input_tensor);
    inferRequest_.infer();
    auto wav_out = inferRequest_.get_output_tensor(0);

Actually, I don't know how to deal with this problem. Maybe I shouldn't load the ONNX model and compile it to IR in C++?

The mel length usually varies from 100 to 1300. OOM happens with mel length >= 500.
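For scale, the input tensor itself is tiny. A back-of-the-envelope sketch (plain C++, no OpenVINO needed; the function name is mine, not from the code above) shows the f32 input for shape {1, mellen, 80} is well under a megabyte, so the gigabytes consumed must come from buffers the plugin allocates during compilation, not from the input:

```cpp
#include <cstddef>

// Bytes occupied by an f32 tensor of shape {1, mellen, 80}.
// 4 bytes per element; a lower bound on the inference-time input memory.
std::size_t input_tensor_bytes(std::size_t mellen) {
    const std::size_t elems = 1 * mellen * 80;
    return elems * sizeof(float);  // f32 = 4 bytes
}
```

For mellen = 500 this is 160,000 bytes (~0.16 MB), versus the gigabytes observed during compilation.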

ilyachur commented 1 year ago

Hi, @yszhou2019 Could you provide the size of the model you tried to load?

yszhou2019 commented 1 year ago

@ilyachur

It's about 54MB

-rw-rw-r--. 1 centos centos  54M Aug  3 07:13 ljspeech_hifigan_finetuned.onnx
ilyachur commented 1 year ago

Could you try to load it on CPU? By default we use the AUTO plugin; can you add the device as a parameter of compile_model()?

yszhou2019 commented 1 year ago

An ov::Exception is thrown if I load the model with "CPU", like:

[OpenVINO] Compiling Mellen 1200
terminate called after throwing an instance of 'ov::Exception'
  what():  Please, check environment due to no supported devices can be used
    ov::Core core;
    core.set_property("CPU", ov::log::level(ov::log::Level::WARNING));

I can send you this model and cpp test file. Could you leave your email?

yszhou2019 commented 1 year ago

It's a little weird. If I set the core property with "AUTO" but compile the model with "CPU", it seems OOM doesn't happen. I will confirm this.

ilyachur commented 1 year ago

An ov::Exception is thrown if I load the model with "CPU", like:

[OpenVINO] Compiling Mellen 1200
terminate called after throwing an instance of 'ov::Exception'
  what():  Please, check environment due to no supported devices can be used
    ov::Core core;
    core.set_property("CPU", ov::log::level(ov::log::Level::WARNING));

I can send you this model and cpp test file. Could you leave your email?

I meant, could you use ov::CompiledModel model1 = core.compile_model(model, "CPU");? Does it work?

yszhou2019 commented 1 year ago

Yes, it seems to work. I'm confirming this.

ilyachur commented 1 year ago

It looks like we have a bug inside the AUTO plugin. @wangleis Can you take a look?

yszhou2019 commented 1 year ago

It works. I applied the following patch to my source code and OOM no longer happens.

compiledModel_ = core_.compile_model(model, "CPU");
ilyachur commented 1 year ago

@wangleis @chenhu-wang Can we take a look at this issue from the AUTO plugin perspective?

wangleis commented 1 year ago

@yszhou2019 Could you share the following information:

  1. Total memory of the system
  2. Memory usage when the app works with compiledModel_ = core_.compile_model(model, "CPU");
  3. The log with core.set_property("AUTO", ov::log::level(ov::log::Level::WARNING)) when the app works with compiledModel_ = core_.compile_model(model, "AUTO");
yszhou2019 commented 1 year ago

@wangleis Here I set mellen to 500. That is, the model will deal with input of shape {1, 500, 80}.

  1. Total memory: 15.3 GB
  2. Before model compilation, memory usage is about 1.45G; with "CPU", it is about 3.42G once compilation is done; without "CPU", memory usage continuously increases until the program is killed; with "AUTO", the program is also killed due to OOM.
  3. I set the log level to "INFO" and compiled with "AUTO". The program was also killed. Here is the log.
    root@f93eccdcf762:/home/centos/repos/tts-quant/tts-deploy/cpp/build# ./test/hifigan_ov_ort_range1200 
    compilation begins
    [06:44:17.2749]I[plugin.cpp:390][AUTO] load with CNN network
    [06:44:17.3196]I[plugin.cpp:417][AUTO] device:CPU, config:PERFORMANCE_HINT=THROUGHPUT
    [06:44:17.3196]I[plugin.cpp:426][AUTO] device:CPU, priority:0
    [06:44:17.3239]I[auto_schedule.cpp:103][AUTO] ExecutableNetwork start
    [06:44:17.3245]I[auto_schedule.cpp:176][AUTO] select device:CPU
    Killed
songbell commented 1 year ago

@yszhou2019 can you have another try with compiledModel = core.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)); and share the memory usage?

yszhou2019 commented 1 year ago

@yszhou2019 can you have another try with compiledModel = core.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)); and share the memory usage?

@songbell Of course. With mellen = 500, memory usage is 1.32G at first and ends up with 14.9 GB when compilation is done. With mellen 550, OOM happens.

songbell commented 1 year ago

@yszhou2019 can you have another try with compiledModel = core.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)); and share the memory usage?

@songbell Of course. With mellen = 500, memory usage is 1.32G at first and ends up with 14.9 GB when compilation is done. With mellen 550, OOM happens.

@ilyachur, it seems compiling on CPU in throughput mode causes the memory starvation. In AUTO, we default to throughput mode if the user does not set a performance hint.

wangleis commented 1 year ago

@yszhou2019 Thanks for the information. Could you share the log when the app runs with "CPU" w/ and w/o ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)?

yszhou2019 commented 1 year ago

@yszhou2019 Thanks for your information. Could you share the log when app run with "CPU" w/ and w/o ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)?

@wangleis What do you mean by "CPU" "w/" and "w/o"?

wangleis commented 1 year ago

@yszhou2019 Please share the log for below two commands:

  1. compiledModel_ = core_.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));
  2. compiledModel_ = core_.compile_model(model);
yszhou2019 commented 1 year ago

@wangleis OOM happens in both cases. With this command compiledModel_ = core_.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT)); , the log is

root@f93eccdcf762:/home/centos/repos/tts-quant/tts-deploy/cpp/build# ./test/hifigan_ov_ort_range1200 
compilation begins
Killed

With this command compiledModel_ = core_.compile_model(model);, the log is

root@f93eccdcf762:/home/centos/repos/tts-quant/tts-deploy/cpp/build# ./test/hifigan_ov_ort_range1200 
compilation begins
[07:51:50.9471]I[plugin.cpp:390][AUTO] load with CNN network
[07:51:50.9960]I[plugin.cpp:417][AUTO] device:CPU, config:PERFORMANCE_HINT=THROUGHPUT
[07:51:50.9960]I[plugin.cpp:426][AUTO] device:CPU, priority:0
[07:51:50.9982]I[auto_schedule.cpp:103][AUTO] ExecutableNetwork start
[07:51:50.9989]I[auto_schedule.cpp:176][AUTO] select device:CPU
Killed
wangleis commented 1 year ago

@yszhou2019 Sorry for the typo. I meant the log for the command compiledModel_ = core_.compile_model(model, "CPU");, which works well.

yszhou2019 commented 1 year ago

@yszhou2019 Sorry for typo, I mean log for command compiledModel_ = core_.compile_model(model, "CPU"); which can work well.

@wangleis It's OK. With the command compiledModel_ = core_.compile_model(model, "CPU");, no logs are printed once compilation is done, even if I set the log level to ov::log::Level::TRACE with core_.set_property("AUTO", ov::log::level(ov::log::Level::TRACE));

peterchen-intel commented 1 year ago

@yszhou2019 Per my understanding, you need to set core_.set_property("CPU", ov::log::level(ov::log::Level::TRACE)); and then compiledModel_ = core_.compile_model(model, "CPU", ov::hint::performance_mode(ov::hint::PerformanceMode::THROUGHPUT));

yszhou2019 commented 1 year ago

@peterchen-intel OK. Here is the log of this code.

root@f93eccdcf762:/home/centos/repos/tts-quant/tts-deploy/cpp/build# ./test/hifigan_ov_ort_range1200 
compilation begins
terminate called after throwing an instance of 'ov::Exception'
  what():  Failed to create plugin /opt/intel/openvino_2022.2.0.7713/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so for device CPU
Please, check your environment
[ NOT_FOUND ] Unsupported property LOG_LEVEL by CPU plugin

Aborted (core dumped)
yszhou2019 commented 1 year ago

Files in runtime/lib/intel64

root@f93eccdcf762:/opt/intel/openvino_2022.2.0.7713/runtime/lib/intel64# ls -l
total 110204
-rw-r--r--. 1 root root  8872422 Sep 12 14:19 cache.json
lrwxrwxrwx. 1 root root       20 Sep 15 18:00 libgna.so.2 -> libgna.so.3.0.0.1455
-rw-r--r--. 1 root root  3120536 Jan 10  2022 libgna.so.3.0.0.1455
-rw-r--r--. 1 root root 13908632 Sep 12 15:31 libopenvino.so
-rw-r--r--. 1 root root   293072 Sep 12 15:31 libopenvino_auto_batch_plugin.so
-rw-r--r--. 1 root root   473456 Sep 12 15:31 libopenvino_auto_plugin.so
-rw-r--r--. 1 root root   203008 Sep 12 15:31 libopenvino_c.so
-rw-r--r--. 1 root root  1259672 Sep 12 15:31 libopenvino_gapi_preproc.so
-rw-r--r--. 1 root root   358608 Sep 12 15:31 libopenvino_hetero_plugin.so
-rw-r--r--. 1 root root 36327336 Sep 12 15:31 libopenvino_intel_cpu_plugin.so
-rw-r--r--. 1 root root  4240392 Sep 12 15:31 libopenvino_intel_gna_plugin.so
-rw-r--r--. 1 root root 19331728 Sep 12 15:31 libopenvino_intel_gpu_plugin.so
-rw-r--r--. 1 root root  5865656 Sep 12 15:31 libopenvino_intel_hddl_plugin.so
-rw-r--r--. 1 root root  6111624 Sep 12 15:31 libopenvino_intel_myriad_plugin.so
-rw-r--r--. 1 root root   342536 Sep 12 15:31 libopenvino_ir_frontend.so
-rw-r--r--. 1 root root  3933320 Sep 12 15:31 libopenvino_onnx_frontend.so
-rw-r--r--. 1 root root  1064696 Sep 12 15:31 libopenvino_paddle_frontend.so
-rw-r--r--. 1 root root  2695192 Sep 12 15:31 libopenvino_tensorflow_fe.so
-rw-r--r--. 1 root root  2100096 Mar  7  2022 pcie-ma2x8x.mvcmd
-rw-r--r--. 1 root root      935 Sep 12 15:22 plugins.xml
-rw-r--r--. 1 root root  2298808 Mar  7  2022 usb-ma2x8x.mvcmd
drwxr-xr-x. 2 root root     4096 Sep 12 15:31 vpu_custom_kernels
yuxu42 commented 1 year ago

@yszhou2019 which CPU are you using?

yszhou2019 commented 1 year ago

@yuxu42

root@f93eccdcf762:/# lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   46 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           106
Model name:                      Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
Stepping:                        6
CPU MHz:                         2900.008
BogoMIPS:                        5800.01
Hypervisor vendor:               KVM
Virtualization type:             full
L1d cache:                       192 KiB
L1i cache:                       128 KiB
L2 cache:                        5 MiB
L3 cache:                        54 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Mmio stale data:   Vulnerable: Clear CPU buffers attempted, no microcode; SMT Host state unknown
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; Load fences, usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid 
                                 sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq 
                                 rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq md_clear spec_ctrl intel_sti
                                 bp flush_l1d arch_capabilities
yszhou2019 commented 1 year ago

@yszhou2019 which CPU are you using?

It's quite weird. I have no idea why an exception was thrown when using "CPU" instead of "AUTO".

liubo-intel commented 1 year ago

Hi, @yszhou2019: we are currently enabling a patch to optimize the CPU plugin's temporary memory usage, but we are not sure whether it would help solve your problem. If possible, could you please provide the model you are using? We can try it with our patch.

yszhou2019 commented 1 year ago

@liubo-intel OK. I've already randomized all the parameters of this model. I just checked, and OOM still happens. Please check your email for the model. If anyone in this issue needs the model, just mention me.

liubo-intel commented 1 year ago

Hi, @yszhou2019: we have tried this 'HiFiGAN' model on our side, and it shows a phenomenon similar to the one you described. It looks like the instruction set architecture of the Ice Lake machine (the Intel(R) Xeon(R) Platinum 8375C CPU you are using) consumes very large amounts of memory when processing the HiFiGAN model. You will also find that an older-generation CPU platform (e.g. a Cascade Lake machine) consumes much less memory processing this model (with the same software environment and a similar CPU core count), but its processing performance also drops a lot.

So this looks less like a bug and more like the additional cost, in some cases, of using this newer, more powerful instruction set architecture. Of course, thanks for pointing out the problem; we will keep looking at whether it can be optimized further.

For your current problem, I think there are two options to consider: 1) Increase your physical memory; then you can fully enjoy the processing performance of this new platform :). 2) If increasing physical memory is not an option, you can decrease memory consumption by using fewer 'streams' during OpenVINO core processing: ov::Core core; core.set_property("CPU", ov::num_streams(2)); By default, 4 streams are used on your platform (Intel(R) Xeon(R) Platinum 8375C); you can set it to 1/2/3 to see whether memory consumption stays under your physical memory size. But, of course, decreasing the number of streams reduces processing performance.
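Putting the stream suggestion together with the reshape flow discussed earlier, a minimal sketch might look like this (assuming OpenVINO 2022.2 headers; the model path is hypothetical):

```cpp
#include <memory>
#include <openvino/openvino.hpp>

int main() {
    ov::Core core;
    // Fewer streams -> fewer per-stream scratch buffers -> lower peak memory
    // during compilation and inference, at the cost of throughput. Try 1/2/3.
    core.set_property("CPU", ov::num_streams(2));

    // Hypothetical model path; reshape to the static input discussed above.
    std::shared_ptr<ov::Model> model = core.read_model("hifigan.onnx");
    model->reshape(ov::Shape{1, 500, 80});
    ov::CompiledModel compiled = core.compile_model(model, "CPU");
    ov::InferRequest request = compiled.create_infer_request();
}
```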

yszhou2019 commented 1 year ago

@liubo-intel Actually, the latter approach (ov::Core core; core.set_property("CPU", ov::num_streams(2));) doesn't solve the OOM. I notice OOM happens when compiling with a static shape, but whether it happens seems to depend on the hardware.

When I run this program with openvino 2022.2.0 on AWS (Intel(R) Xeon(R) Platinum 8375C CPU), OOM happens. It looks like each static model takes up nearly 1-2 GB of memory. When compiling the 7th model, OOM happens; it takes more than 13 GB of memory. (screenshot attached)

[2022-11-07 09:14:44.355][ INFO] [TTS-Core]: Compiled 1 th model
[2022-11-07 09:14:54.902][ INFO] [TTS-Core]: Compiled 2 th model
[2022-11-07 09:15:08.149][ INFO] [TTS-Core]: Compiled 3 th model
[2022-11-07 09:15:23.587][ INFO] [TTS-Core]: Compiled 4 th model
[2022-11-07 09:15:41.168][ INFO] [TTS-Core]: Compiled 5 th model
[2022-11-07 09:16:00.732][ INFO] [TTS-Core]: Compiled 6 th model
[2022-11-07 09:16:22.821][ INFO] [TTS-Core]: Compiled 7 th model
Killed

But on our local devices with an Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz or an Intel(R) Xeon(R) Gold 6342 CPU @ 2.80GHz, this no longer happens. The static models only take about 8 GB of memory when running this program. The picture below shows memory usage after model compilation is done and before running the program. (screenshot attached)

All three cases run in Docker (almalinux8).

yszhou2019 commented 1 year ago

BTW, with openvino==2022.1.0, running this program (with 10 static models) takes no more than 2 GB on AWS (Intel(R) Xeon(R) Platinum 8375C CPU).

liubo-intel commented 1 year ago

"I notice OOM happens when compiling with static shape. But it seemes whether this situation will happen depends on hardware." @yszhou2019 you are correct, openvino will decide the default streams numbers which it will use for inference based on cpu cores, socket numbers, etc. these parameters are different between different CPU models(hardware).

liubo-intel commented 1 year ago

BTW, with openvino==2022.1.0, running this program (with 10 static models) takes no more than 2 GB on AWS (Intel(R) Xeon(R) Platinum 8375C CPU).

That's also right. Starting from the 2022.2 release, we introduced a new series of CPU kernels for some platforms (including the Ice Lake CPU you are using, e.g. Xeon(R) Platinum 8375C). These new kernels perform better for most NN cases but, as mentioned before, consume more memory in some cases, e.g. this HiFi-GAN network, which has many large convolution kernels and feature maps. So we recommend increasing your machine's memory so you can use these new kernels and gain better performance.

BTW, you mentioned that you load 10 such memory-consuming HiFi-GAN models in your application; in that case, setting streams to 2 may not be enough. Have you tried setting it to 1? core.set_property("CPU", ov::num_streams(1)); Even if it works, we do not recommend this for a throughput performance target, but it may be useful for debugging.

yszhou2019 commented 1 year ago

BTW, you mentioned that you load 10 such memory-consuming HiFi-GAN models in your application; in that case, setting streams to 2 may not be enough. Have you tried setting it to 1? core.set_property("CPU", ov::num_streams(1)); Even if it works, we do not recommend this for a throughput performance target, but it may be useful for debugging.

@liubo-intel I just set ov::num_streams to 1 and the process is still killed. But it doesn't matter; maybe I can use openvino==2022.1.0 for static inference or openvino==2022.2.0 for dynamic inference.

I hope this problem can be solved in the next version. Thank you all!

I will leave this issue open.

liubo-intel commented 1 year ago

BTW, you mentioned that you load 10 such memory-consuming HiFi-GAN models in your application; in that case, setting streams to 2 may not be enough. Have you tried setting it to 1? core.set_property("CPU", ov::num_streams(1)); Even if it works, we do not recommend this for a throughput performance target, but it may be useful for debugging.

@liubo-intel I just set ov::num_streams to 1 and the process is still killed. But it doesn't matter; maybe I can use openvino==2022.1.0 for static inference or openvino==2022.2.0 for dynamic inference.

I hope this problem can be solved in the next version. Thank you all!

I will leave this issue open.

Hi, @yszhou2019: from our previous investigation, I don't think this case warrants the 'bug' label. To better classify problems for future tracking, let's remove the label.

yszhou2019 commented 1 year ago

@liubo-intel I also noticed another problem. In this situation, I chose openvino==2022.2.0 and only dynamic inference. When the program calls void inference(...) many times, memory continuously grows, leading to OOM, and the process is killed.

void inference(const std::vector<float>& input, std::vector<float>* output)
{
   // ... get size

    ov::Shape shape{1, size, 80};
    ov::Tensor inputTensor(ov::element::f32, shape, const_cast<float*>(input.data()));
    // A new infer request is created on every call
    ov::InferRequest inferRequest = dynamicModel_.create_infer_request();
    inferRequest.set_input_tensor(inputTensor);
    inferRequest.infer();
    ov::Tensor outputTensor = inferRequest.get_output_tensor();
    const float* outputBuffer = outputTensor.data<const float>();
    const size_t newWavLen = outputTensor.get_size();
    output->resize(newWavLen);
    std::copy(outputBuffer, outputBuffer + newWavLen, output->data());
}

When I remove ov::CompiledModel from the data members and keep only the inferRequest as a data member, the same thing happens. With an Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz and the almalinux8 Docker image.

yszhou2019 commented 1 year ago

"When this program revoke void inference(...) for many times, memory continously goes up thus leading to OOM and killed."

The same happens with openvino==2022.1.0 and Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz.

yszhou2019 commented 1 year ago

@liubo-intel You can try this with openvino==2022.2.0 or 2022.1.0. It looks like a memory leak.

#include <cassert>
#include <cstdlib>
#include <iostream>
#include <string>
#include <vector>

#include "openvino_model.h"

std::string prefix = "/home/repos/resources/models/";

class Wrapper {
public:
    Wrapper(const std::string& model_path, int32_t mellen);
    void inference(std::vector<float>& input, std::vector<float>& output);
private:
    int32_t       mellen_;
    ov::InferRequest inferRequest_;
};

Wrapper::Wrapper (const std::string& model_path, int32_t mellen): mellen_(mellen) {
    ov::Core core;
    core.set_property("AUTO", ov::log::level(ov::log::Level::WARNING));
    std::shared_ptr<ov::Model> model = core.read_model(model_path);
    ov::Shape static_shape = {1, static_cast<unsigned long>(mellen), 80};
    model->reshape(static_shape);
    ov::CompiledModel compiled = core.compile_model(model, "CPU");
    inferRequest_ = compiled.create_infer_request();
}

void Wrapper::inference(std::vector<float>& input, std::vector<float>& output) {
    int32_t melLen     = input.size() / 80;
    int32_t newWavLen  = melLen * 256;
    assert(melLen == mellen_);
    output.resize(newWavLen);
    ov::Shape static_shape{1, static_cast<unsigned long>(melLen), static_cast<unsigned long>(80)};
    ov::Tensor input_tensor(ov::element::f32, static_shape, input.data());
    inferRequest_.set_input_tensor(input_tensor);
    inferRequest_.infer();
}

int main()
{
    std::vector<float> mel;
    std::vector<float> wav;
    int mellen = 300;
    mel.resize(mellen * 80);
    for (int i = 0; i < mellen * 80; i++){
        mel[i] = rand() % 78;
    }
    std::string vocoder_path = prefix + "hifigan.onnx";
    auto model = Wrapper(vocoder_path, mellen);
    while (1) {
        wav.clear();
        model.inference(mel, wav);
    }
}
liubo-intel commented 1 year ago

@yszhou2019 thanks for the detailed description and code for your findings. It looks like you modify the default OpenVINO Inference Engine part (ov::CompiledModel, ov::InferRequest); I'm not sure whether this modification is suitable. I could help look at the CPU plugin part if needed, but sorry, I'm not so familiar with the Inference Engine part. @ilyachur, @wangleis could you please take a quick look at this new question from @yszhou2019, or CC some colleagues who are familiar with this part? Thx.

ilyachur commented 1 year ago

@liubo-intel I don't see any modification of the common OpenVINO part here; it looks like a reproducer.

@yszhou2019 Am I right that in the last code snippet (with the CPU device) you have a memory leak if you call inference in a cycle?

@dmitry-gorokhov CC

yszhou2019 commented 1 year ago

@liubo-intel I don't see any modification of the common OpenVINO part here; it looks like a reproducer.

@yszhou2019 Am I right that in the last code snippet (with the CPU device) you have a memory leak if you call inference in a cycle?

@dmitry-gorokhov CC

@ilyachur Yeah. I think there may be a memory leak in the last line.

Here are captures from yesterday at 9:59 PM and today at 11:16 AM. (screenshots attached)

ilyachur commented 1 year ago

@dmitry-gorokhov Could you or ask somebody to take a look from CPU perspective?

yszhou2019 commented 1 year ago

Hi, guys. I just used valgrind to test this simple program (load and infer once, then exit). Here is the result. It seems the memory leak is in libtbb.

➜  build git:(dev-v1.1) ✗ valgrind --leak-check=full --track-origins=yes --suppressions=../build_support/valgrind.supp  ./test 
==2157683== Memcheck, a memory error detector
==2157683== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2157683== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==2157683== Command: ./test
==2157683== 
==2157683== 
==2157683== HEAP SUMMARY:
==2157683==     in use at exit: 149,109 bytes in 113 blocks
==2157683==   total heap usage: 1,356,232 allocs, 1,356,119 frees, 805,216,338 bytes allocated
==2157683== 
==2157683== 640 bytes in 2 blocks are possibly lost in loss record 7 of 14
==2157683==    at 0x4C3BE4B: calloc (vg_replace_malloc.c:1328)
==2157683==    by 0x4017582: UnknownInlinedFun (rtld-malloc.h:44)
==2157683==    by 0x4017582: allocate_dtv (dl-tls.c:371)
==2157683==    by 0x4017F91: _dl_allocate_tls (dl-tls.c:618)
==2157683==    by 0x7413E22: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==2157683==    by 0x6F683C2: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F72F36: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x1139E9E7: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x6F75118: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F7273B: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x113A57D6: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x113A5F8D: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x116AD7E0: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683== 
==2157683== 640 bytes in 2 blocks are possibly lost in loss record 8 of 14
==2157683==    at 0x4C3BE4B: calloc (vg_replace_malloc.c:1328)
==2157683==    by 0x4017582: UnknownInlinedFun (rtld-malloc.h:44)
==2157683==    by 0x4017582: allocate_dtv (dl-tls.c:371)
==2157683==    by 0x4017F91: _dl_allocate_tls (dl-tls.c:618)
==2157683==    by 0x7413E22: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==2157683==    by 0x6F683C2: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F72F36: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x112D03D7: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x6F75118: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F7273B: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x112D0036: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x112D5FFD: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683==    by 0x1108DC94: ??? (in /opt/intel/runtime/lib/intel64/libopenvino_intel_cpu_plugin.so)
==2157683== 
==2157683== 28,800 bytes in 90 blocks are possibly lost in loss record 13 of 14
==2157683==    at 0x4C3BE4B: calloc (vg_replace_malloc.c:1328)
==2157683==    by 0x4017582: UnknownInlinedFun (rtld-malloc.h:44)
==2157683==    by 0x4017582: allocate_dtv (dl-tls.c:371)
==2157683==    by 0x4017F91: _dl_allocate_tls (dl-tls.c:618)
==2157683==    by 0x7413E22: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==2157683==    by 0x6F683C2: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F67F2D: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x6F67F05: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==2157683==    by 0x74131CE: start_thread (in /usr/lib64/libpthread-2.28.so)
==2157683==    by 0x69B8DD2: clone (in /usr/lib64/libc-2.28.so)
==2157683== 
==2157683== LEAK SUMMARY:
==2157683==    definitely lost: 0 bytes in 0 blocks
==2157683==    indirectly lost: 0 bytes in 0 blocks
==2157683==      possibly lost: 30,080 bytes in 94 blocks
==2157683==    still reachable: 119,029 bytes in 19 blocks
==2157683==                       of which reachable via heuristic:
==2157683==                         newarray           : 49,200 bytes in 6 blocks
==2157683==         suppressed: 0 bytes in 0 blocks
==2157683== Reachable blocks (those to which a pointer was found) are not shown.
==2157683== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==2157683== 
==2157683== For lists of detected and suppressed errors, rerun with: -s
==2157683== ERROR SUMMARY: 3 errors from 3 contexts (suppressed: 0 from 0)
ilyachur commented 1 year ago

@peterchen-intel Could you take a look? Looks like we have some fixes for TBB in the master branch. Am I right?

yszhou2019 commented 1 year ago

@peterchen-intel Could you take a look? Looks like we have some fixes for TBB in the master branch. Am I right?

@ilyachur The valgrind result is from openvino 2022.1.0. I haven't checked for memory leaks with openvino 2022.2.0.

ilyachur commented 1 year ago

Unfortunately, 2022.1 is not an LTS release, which means we won't fix issues in that release. At the moment I would like to understand whether we still have the same problem on the master branch. If I remember right, we had some changes for TBB.

yszhou2019 commented 1 year ago

@ilyachur Here is valgrind result with openvino==2022.2.0

➜  build git:(dev-v1.1) ✗ valgrind --leak-check=full --track-origins=yes --suppressions=../build_support/valgrind.supp  ./test 
==3397996== Memcheck, a memory error detector
==3397996== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==3397996== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==3397996== Command: ./test
==3397996== 
==3397996== Warning: client switching stacks?  SP change: 0x1ffeffcb58 --> 0xfaaa000
==3397996==          to suppress, use: --max-stackframe=137159322456 or greater
==3397996== Warning: client switching stacks?  SP change: 0xfaaa000 --> 0x1ffeffcb58
==3397996==          to suppress, use: --max-stackframe=137159322456 or greater
==3397996== Warning: client switching stacks?  SP change: 0x1ffeffcb58 --> 0xfaaa000
==3397996==          to suppress, use: --max-stackframe=137159322456 or greater
==3397996==          further instances of this message will not be shown.
==3397996== 
==3397996== HEAP SUMMARY:
==3397996==     in use at exit: 40,600 bytes in 53 blocks
==3397996==   total heap usage: 1,805,380 allocs, 1,805,327 frees, 9,279,785,467 bytes allocated
==3397996== 
==3397996== 640 bytes in 2 blocks are possibly lost in loss record 3 of 6
==3397996==    at 0x4C3BE4B: calloc (vg_replace_malloc.c:1328)
==3397996==    by 0x4017582: UnknownInlinedFun (rtld-malloc.h:44)
==3397996==    by 0x4017582: allocate_dtv (dl-tls.c:371)
==3397996==    by 0x4017F91: _dl_allocate_tls (dl-tls.c:618)
==3397996==    by 0x74B5E22: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==3397996==    by 0x700A3C2: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x7014F36: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x1136D247: ???
==3397996==    by 0x7017118: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x701473B: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x11374003: ???
==3397996==    by 0x113747F7: ???
==3397996==    by 0x116CED50: ???
==3397996== 
==3397996== 14,400 bytes in 45 blocks are possibly lost in loss record 5 of 6
==3397996==    at 0x4C3BE4B: calloc (vg_replace_malloc.c:1328)
==3397996==    by 0x4017582: UnknownInlinedFun (rtld-malloc.h:44)
==3397996==    by 0x4017582: allocate_dtv (dl-tls.c:371)
==3397996==    by 0x4017F91: _dl_allocate_tls (dl-tls.c:618)
==3397996==    by 0x74B5E22: pthread_create@@GLIBC_2.2.5 (in /usr/lib64/libpthread-2.28.so)
==3397996==    by 0x700A3C2: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x7009F2D: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x7009F05: ??? (in /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2)
==3397996==    by 0x74B51CE: start_thread (in /usr/lib64/libpthread-2.28.so)
==3397996==    by 0x6A5ADD2: clone (in /usr/lib64/libc-2.28.so)
==3397996== 
==3397996== LEAK SUMMARY:
==3397996==    definitely lost: 0 bytes in 0 blocks
==3397996==    indirectly lost: 0 bytes in 0 blocks
==3397996==      possibly lost: 15,040 bytes in 47 blocks
==3397996==    still reachable: 25,560 bytes in 6 blocks
==3397996==                       of which reachable via heuristic:
==3397996==                         newarray           : 24,600 bytes in 3 blocks
==3397996==         suppressed: 0 bytes in 0 blocks
==3397996== Reachable blocks (those to which a pointer was found) are not shown.
==3397996== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3397996== 
==3397996== For lists of detected and suppressed errors, rerun with: -s
==3397996== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
➜  build git:(dev-v1.1) ✗ ldd test
        linux-vdso.so.1 (0x00007ffc44dea000)
        libopenvino.so => /opt/intel/runtime/lib/intel64/libopenvino.so (0x00007f5981a13000)
        libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007f598167e000)
        libm.so.6 => /lib64/libm.so.6 (0x00007f59812fc000)
        libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007f59810e4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007f5980d1f000)
        libdl.so.2 => /lib64/libdl.so.2 (0x00007f5980b1b000)
        libtbb.so.2 => /opt/intel/runtime/3rdparty/tbb/lib/libtbb.so.2 (0x00007f59808b3000)
        libtbbmalloc.so.2 => /opt/intel/runtime/3rdparty/tbb/lib/libtbbmalloc.so.2 (0x00007f5980658000)
        libpthread.so.0 => /lib64/libpthread.so.0 (0x00007f5980438000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f5982cbe000)
        librt.so.1 => /lib64/librt.so.1 (0x00007f5980230000)
➜  build git:(dev-v1.1) ✗ cat /opt/intel/runtime/include/openvino/core/version.hpp 
// Copyright (C) 2018-2022 Intel Corporation
// SPDX-License-Identifier: Apache-2.0
//
#pragma once

#include <map>
#include <ostream>

#include "openvino/core/core_visibility.hpp"

/**
 * @def OPENVINO_VERSION_MAJOR
 * @brief Defines OpenVINO major version
 *
 * @def OPENVINO_VERSION_MINOR
 * @brief Defines OpenVINO minor version
 *
 * @def OPENVINO_VERSION_PATCH
 * @brief Defines OpenVINO patch version
 */

#define OPENVINO_VERSION_MAJOR 2022
#define OPENVINO_VERSION_MINOR 2
#define OPENVINO_VERSION_PATCH 0
yszhou2019 commented 1 year ago

Here is a capture of inference in an endless loop (with openvino==2022.2.0). (screenshot attached)

tomdol commented 1 year ago

You can also try running valgrind with the massif tool - this might help you narrow down the root cause of the memory leak.
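A minimal massif invocation might look like this (a sketch, assuming valgrind is installed and ./test is the reproducer binary built above):

```shell
# Record heap-usage snapshots over the program's lifetime
valgrind --tool=massif --time-unit=ms ./test

# massif writes massif.out.<pid> in the working directory;
# render it as a human-readable allocation graph with peak annotations
ms_print massif.out.*
```

Unlike memcheck, massif tracks where the heap grows over time, which is better suited to a steadily increasing RSS than a leak-at-exit report.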