openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Memory leak in INT4 optimization of LLaVA model #21612

Closed · rmes-ai closed this issue 2 weeks ago

rmes-ai commented 11 months ago

OpenVINO Version

2023.0.2

Operating System

macOS Systems for Apple Silicon

Device used for inference

CPU

Framework

None

Model used

LLaVA

Issue description

I followed the exact procedure recommended by the OpenVINO notebooks repository and encountered a memory leak when attempting INT4 optimization.

Step-by-step reproduction

Identical to the issue description: I followed the procedure recommended by the OpenVINO notebooks repository, and the process was killed during the INT4 compression step (see the sketch below and the log output).
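For context, the INT4 step that crashes boils down to an NNCF weight-compression call over the converted IR. A minimal sketch of what that looks like, assuming the current nncf weight-compression API; the IR path and the group_size/ratio values are illustrative, not taken from the notebook:

```python
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("llava.xml")  # hypothetical path to the converted LLaVA IR

# Compress the IR's weight constants to INT4; this is the allocation-heavy step
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_ASYM,
    group_size=128,  # illustrative value
    ratio=0.8,       # illustrative: share of weights compressed to INT4
)
ov.save_model(compressed, "llava_int4.xml")
```

Note that compress_weights has to hold the full uncompressed weights in memory while repacking them, so peak memory use during this step can be considerably higher than the on-disk model size, which fits the zsh: killed symptom.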

Relevant log output

warn("The installed version of bitsandbytes was compiled without GPU support.
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
INFO:nncf: NNCF initialized successfully. Supported frameworks detected: torch, openvino zsh: killed
python3 build_and_convert_llava.py
(venv) russel@MacBook-Air openvino_llava % /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing
/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn(" resource_tracker: There appear to be %d '

vurusovs commented 10 months ago

The issue may be caused by a lack of RAM. @rmes-ai could you check memory usage during the run and provide more details on it?
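One lightweight way to capture that is to log the process RSS around the compression call. A minimal sketch, assuming psutil is installed in the venv (it is not part of the notebook):

```python
import os
import psutil

proc = psutil.Process(os.getpid())

def log_rss(tag: str) -> None:
    # Resident set size of this process, in GiB
    print(f"[{tag}] RSS: {proc.memory_info().rss / 2**30:.2f} GiB")

log_rss("before compression")
# ... run the compression step here ...
log_rss("after compression")
```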

wenjiew commented 10 months ago

Additionally, @rmes-ai, can you share the URL of the notebook you are trying? If it is INT4 weight compression, support for that starts in OpenVINO 2023.2, so it is not available in 2023.0.2. Thanks!
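To confirm which runtime is actually installed, get_version from the standard OpenVINO Python API can be printed before running the notebook:

```python
from openvino.runtime import get_version

# INT4 weight compression is only available in 2023.2 and newer
print(get_version())
```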

rmes-ai commented 10 months ago

Thank you @vurusovs & @wenjiew for the prompt support.

vurusovs commented 10 months ago

> I could replicate the process using my other M2 (64 GB RAM) to see if the issue is related to RAM?

Yes, it would be great to separate RAM issue from any other functional problems

wenjiew commented 10 months ago

@alvoron Can you help take a look since this is ARM (macOS) related? Thanks!

alvoron commented 5 months ago

@rmes-ai OpenVINO does not natively support i4/i8 inference on ARM so far, so model compression needs to be skipped. My M2 reboots while LLaVA weight compression is in progress. If I skip compression and run FP16 inference (OpenVINO on ARM supports FP16 precision natively), I get "probability tensor contains either inf, nan or element < 0", which points to an accuracy issue.
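For reference, skipping compression and requesting native FP16 execution would look roughly like this (a sketch: the "INFERENCE_PRECISION_HINT" string property follows recent releases, and the IR path is a placeholder):

```python
import openvino as ov

core = ov.Core()
model = core.read_model("llava.xml")  # hypothetical uncompressed FP16/FP32 IR

# Ask the CPU plugin for FP16 execution, which ARM supports natively
compiled = core.compile_model(model, "CPU", {"INFERENCE_PRECISION_HINT": "f16"})
```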

I was able to run the nanoLLaVA notebook in both FP32 and FP16 precision. Could you please try that one with the latest OpenVINO release? ARM functionality is developing rapidly, so it is better to use the latest release: https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/nano-llava-multimodal-chatbot

alvoron commented 2 weeks ago

@rmes-ai please feel free to reopen the issue if you have any other questions related to this topic.