openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

[Good First Issue][GPU]: Cannot load model when cache directory is running out of disk space #20190

Open p-wysocki opened 1 year ago

p-wysocki commented 1 year ago

Context

The GPU plugin uses cache files when loading models. When the cache directory runs out of disk space, loading another model fails with:

~/pgladkow/openvino_repro/build$ ./test_ov
terminate called after throwing an instance of 'ov::Exception'
  what():  cldnn program build failed! clCreateProgramWithBinary
Aborted 

This task is about resolving that failure.
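
One possible way such a failure can arise (an assumption, not a diagnosis confirmed in this thread) is that a cache blob gets truncated when the disk fills up and is later read back as if it were valid. The sketch below uses only the C++17 standard library; `write_blob_atomically` is a hypothetical helper and not part of OpenVINO. It writes to a temporary file and renames it only after the write has succeeded, so a partial blob never appears under its final name.

```cpp
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not an OpenVINO API): writes `data` to `target`
// through a temporary file, so a failed or partial write never leaves
// a truncated blob under the final name.
bool write_blob_atomically(const fs::path& target, const std::string& data) {
    fs::path tmp = target;
    tmp += ".tmp";

    bool ok = false;
    {
        std::ofstream out(tmp, std::ios::binary | std::ios::trunc);
        if (out) {
            out.write(data.data(), static_cast<std::streamsize>(data.size()));
            out.flush();      // force the data out so a full disk is noticed here
            ok = out.good();
        }
    }  // the stream is closed at the end of this scope

    std::error_code ec;
    if (ok) {
        fs::rename(tmp, target, ec);  // publish the complete file
        ok = !ec;
    }
    if (!ok) {
        fs::remove(tmp, ec);  // best-effort cleanup of the partial file
    }
    return ok;
}
```

A caller that gets `false` back could skip caching for that model and continue with a freshly compiled one instead of aborting.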

What needs to be done?

The cache should be cleared when it is no longer needed. Details should be discussed with the people listed in the Contact points section.

Steps to reproduce:

  1. Get ov.cpp (attachment)
  2. Get CMakeLists.txt (attachment)
  3. Mount the cache directory with limited disk space (10 MiB)
    sudo mount -t tmpfs -o size=10m tmpfs tmp_cache
  4. Run the OpenVINO app
    ~/pgladkow/openvino_repro/build$ cmake ../ovms
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /opt/home/k8sworker/pgladkow/openvino_repro/build
    ~/pgladkow/openvino_repro/build$ cmake --build .
    [100%] Built target test_ov
    ~/pgladkow/openvino_repro/build$ ./test_ov
    Hello 
  5. Verify that the cache directory is full
    ~/pgladkow/openvino_repro$ ls tmp_cache/
  6. Run the OV app once again; it should fail this time
    ~/pgladkow/openvino_repro/build$ ./test_ov
    terminate called after throwing an instance of 'ov::Exception'
    what():  cldnn program build failed! clCreateProgramWithBinary
    Aborted 

Directory structure from the original reproducer:

~/pgladkow/openvino_repro$ tree
.
├── build
│   ├── CMakeCache.txt
│   ├── CMakeFiles
│   │   ├── 3.16.3
│   │   │   ├── CMakeCCompiler.cmake
│   │   │   ├── CMakeCXXCompiler.cmake
│   │   │   ├── CMakeDetermineCompilerABI_C.bin
│   │   │   ├── CMakeDetermineCompilerABI_CXX.bin
│   │   │   ├── CMakeSystem.cmake
│   │   │   ├── CompilerIdC
│   │   │   │   ├── a.out
│   │   │   │   ├── CMakeCCompilerId.c
│   │   │   │   └── tmp
│   │   │   └── CompilerIdCXX
│   │   │       ├── a.out
│   │   │       ├── CMakeCXXCompilerId.cpp
│   │   │       └── tmp
│   │   ├── cmake.check_cache
│   │   ├── CMakeDirectoryInformation.cmake
│   │   ├── CMakeError.log
│   │   ├── CMakeOutput.log
│   │   ├── CMakeTmp
│   │   ├── Makefile2
│   │   ├── Makefile.cmake
│   │   ├── progress.marks
│   │   ├── TargetDirectories.txt
│   │   └── test_ov.dir
│   │       ├── build.make
│   │       ├── cmake_clean.cmake
│   │       ├── CXX.includecache
│   │       ├── DependInfo.cmake
│   │       ├── depend.internal
│   │       ├── depend.make
│   │       ├── flags.make
│   │       ├── link.txt
│   │       ├── progress.make
│   │       └── src
│   │           └── ov.cpp.o
│   ├── cmake_install.cmake
│   ├── Makefile
│   └── test_ov
├── ovms
│   ├── CMakeLists.txt
│   └── src
│       ├── brain-tumor-segmentation-0002-2
│       │   └── 1
│       │       ├── brain-tumor-segmentation-0002.onnx
│       │       └── description.txt
│       ├── dummy
│       │   └── 1
│       │       ├── dummy.bin
│       │       └── dummy.xml
│       ├── inception-resnet-v2-tf
│       │   └── 1
│       │       ├── inception-resnet-v2-tf.bin
│       │       ├── inception-resnet-v2-tf.mapping
│       │       └── inception-resnet-v2-tf.xml
│       ├── ov.cpp
│       └── ssdlite_mobilenet_v2_ov
│           └── 1
│               ├── ssdlite_mobilenet_v2_ov.bin
│               ├── ssdlite_mobilenet_v2_ov.mapping
│               └── ssdlite_mobilenet_v2_ov.xml
└── tmp_cache 

reproducer.zip

Resources

Contact points

@pgladkows @vladimir-paramuzov

Ticket: 104958

siddhant-0707 commented 1 year ago

Hey @p-wysocki may I work on this?

p-wysocki commented 1 year ago

Of course, I assigned you. Thanks!

siddhant-0707 commented 1 year ago

Wanted to clarify if the exact same error would be reproduced by following the steps. This is the exception I got:

[image: screenshot of the exception]

This only occurs when I mount cache to have limited space.

p-wysocki commented 1 year ago

@pgladkows @vladimir-paramuzov could you please take a look?

vladimir-paramuzov commented 1 year ago

@siddhant-0707 I think that's a different issue, though this one may require handling too.

The difference is likely related to the method of caching.

p-wysocki commented 1 year ago

Hi @siddhant-0707, are you still working on it? I'm updating the tasks' statuses.

siddhant-0707 commented 1 year ago

Working on the first case you mentioned

We could add a condition is_cache_small and remove the cache at https://github.com/openvinotoolkit/openvino/blob/master/src/inference/src/dev/core_impl.cpp#L1430 accordingly. PTAL at the draft PR #20653. Please ignore the changes made to CMakefiles, ov.cpp and the other models.
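
The comment does not say how is_cache_small would be computed. A minimal sketch, assuming it means "the filesystem holding the cache directory has fewer bytes available than the next blob needs", could use `std::filesystem::space`. The signature and the `required_bytes` parameter are hypothetical and are not taken from the draft PR.

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical check (not the code from the draft PR): returns true when
// the filesystem that holds `cache_dir` has fewer than `required_bytes`
// available to the current process.
bool is_cache_small(const fs::path& cache_dir, std::uintmax_t required_bytes) {
    std::error_code ec;
    const fs::space_info info = fs::space(cache_dir, ec);
    if (ec) {
        return true;  // cannot query the filesystem: conservatively report "too small"
    }
    return info.available < required_bytes;
}
```

With the tmpfs mount from the reproduction steps, `info.available` reflects the space left in the 10 MiB mount rather than on the host disk.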

siddhant-0707 commented 1 year ago

Could we find the directory size of ieCore.get_property("GPU", ov::cache_dir) somehow and then assign is_cache_small ourselves?

ilya-lavrenov commented 1 year ago

Hi @siddhant-0707 As we discussed with @vladimir-paramuzov, we can:

p-wysocki commented 1 year ago

What's the status of this issue? Is the linked PR https://github.com/openvinotoolkit/openvino/pull/20653 still relevant?

p-wysocki commented 11 months ago

@p-durandin could you please take a look at this PR? Seems like a solution, but it was seemingly abandoned. Can it be picked up and continued or should we reopen the task for other contributors?

siddhant-0707 commented 11 months ago

Hey @p-wysocki apologies for the long delay, actually had my end-semester exams going on. I'll be back and working again in about a week. I'll create a new PR according to the solution @ilya-lavrenov mentioned.

p-wysocki commented 11 months ago

Sure, thanks for letting us know!

p-wysocki commented 11 months ago

Just yesterday our CONTRIBUTING.md was updated with a technical guide - I highly recommend checking it out. :)

p-wysocki commented 10 months ago

I am happy to announce that we have created a channel dedicated to Good First Issues support on our Intel DevHub Discord server! Join it to receive support, engage in discussions, ask questions and talk to OpenVINO developers.

mlukasze commented 8 months ago

moving back to pool of available tickets

AsakusaRinne commented 8 months ago

@p-wysocki Hi, is any special device required if I take this issue? So far I have only run OpenVINO on a CPU, and I'm not sure I could reproduce it on my PC. I have a PC with an Intel i7-12700 and an NVIDIA RTX 2080 Ti.

p-wysocki commented 8 months ago

Hello @AsakusaRinne, while we do have an NVIDIA GPU plugin, I don't know how interdependent they are and if the issue will reproduce when using it.

@pgladkows @vladimir-paramuzov could you please answer?

p-durandin commented 8 months ago

@AsakusaRinne A device with an Intel iGPU or dGPU is enough to reproduce this problem. Most Intel processors have integrated graphics. Please install OpenCL and build the GPU plugin.

vladimir-paramuzov commented 8 months ago

> @p-wysocki Hi, is there any special device required if I take this issue? I have only run openvino on cpu yet and I'm not sure if I could reproduce it with my PC. I have a PC with intel i7 12700 and nvidia RTX 2080Ti.

The i7-12700 CPU is supposed to have an integrated GPU, so you can use it to work on this task.

AsakusaRinne commented 8 months ago

Thank you for your response! Glad to know I can reproduce it on my PC. Since my time is limited over the next 2 weeks, I won't take this issue now, to leave a chance for others to take it. I'll come back to resolve it if no one takes it in the future. :)

sparshmittal99 commented 8 months ago

Hey @p-wysocki, may I contribute to this?

wangyangke commented 8 months ago

WLB#+ .take

github-actions[bot] commented 8 months ago

Thank you for looking into this issue! Please let us know if you have any questions or require any help.

VividLiao commented 8 months ago

WLB#+ .take

github-actions[bot] commented 8 months ago

Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.

kevinzhangc commented 8 months ago

WLB#+ .take

github-actions[bot] commented 8 months ago

Thanks for being interested in this issue. It looks like this ticket is already assigned to a contributor. Please communicate with the assigned contributor to confirm the status of the issue.

p-wysocki commented 7 months ago

Hello @wangyangke, can we help you with anything? Are you still working on this?