openvinotoolkit / openvino

OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
https://docs.openvino.ai
Apache License 2.0

Memory leak in INT4 optimization of LLaVA model #21612

Closed rmes-ai closed 1 month ago

rmes-ai commented 11 months ago

OpenVINO Version

2023.0.2

Operating System

macOS Systems for Apple Silicon

Device used for inference

CPU

Framework

None

Model used

LLaVA

Issue description

I followed the exact procedure recommended by the OpenVINO notebooks repository and encountered a memory leak when optimizing the model to INT4.

Step-by-step reproduction

Same as above: I followed the exact procedure recommended by the OpenVINO notebooks repository, and the process was killed while optimizing to INT4 (see the log below).
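For context, the compression step looks roughly like the following; a minimal sketch, assuming the LLaVA model has already been converted to OpenVINO IR. The file names and the specific `compress_weights` arguments are illustrative, not the notebook's exact code.

```python
# Sketch of INT4 weight compression with NNCF (paths are placeholders).
import openvino as ov
import nncf

core = ov.Core()
model = core.read_model("llava.xml")  # IR produced earlier in the notebook

# Compress most weights to INT4; this is the step where memory usage peaks.
compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,
    group_size=128,
    ratio=0.8,  # fraction of layers compressed to INT4; the rest stay INT8
)
ov.save_model(compressed, "llava_int4.xml")
```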

Relevant log output

warn("The installed version of bitsandbytes was compiled without GPU support.
'NoneType' object has no attribute 'cadam32bit_grad_fp32'
INFO:nncf: NNCF initialized successfully. Supported frameworks detected: torch, openvino zsh: killed
python3 build_and_convert_llava.py
(venv) russel@MacBook-Air openvino_llava % /Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/multiprocessing
/resource_tracker.py:224: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn(" resource_tracker: There appear to be %d '


vurusovs commented 11 months ago

The issue may be caused by a lack of RAM. @rmes-ai, could you check memory usage and provide more details on it?
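For example, memory usage around the compression call could be logged with something like this sketch; it assumes the optional `psutil` package, and the tag strings are placeholders.

```python
# Sketch: log this process's resident memory from inside the script
# (psutil is an extra dependency; any process monitor works too).
import os
import psutil

def log_rss(tag: str) -> None:
    rss_gib = psutil.Process(os.getpid()).memory_info().rss / 2**30
    print(f"[{tag}] RSS: {rss_gib:.2f} GiB")

log_rss("before compress_weights")
# ... run the compression step here ...
log_rss("after compress_weights")
```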

wenjiew commented 11 months ago

Additionally, @rmes-ai, can you share the URL of the notebook you are trying? If it is about INT4 weight compression, support for it started in OpenVINO 2023.2, so it is not available in 2023.0.2. Thanks!
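A quick way to confirm which runtime version the script actually imports, using the standard `get_version()` query:

```python
# Print the OpenVINO runtime version; INT4 weight compression needs 2023.2+.
from openvino.runtime import get_version

print(get_version())
```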

rmes-ai commented 11 months ago

Thank you @vurusovs & @wenjiew for the prompt support.

vurusovs commented 11 months ago

> I could replicate the process using my other M2 (64 GB RAM) to see if the issue is related to the RAM?

Yes, it would be great to separate the RAM issue from any other functional problems.

wenjiew commented 11 months ago

@alvoron Can you help take a look since this is ARM (MacOS) related? Thanks!

alvoron commented 6 months ago

@rmes-ai OpenVINO does not natively support i4/i8 inference on ARM so far, so model compression needs to be avoided. My M2 reboots while LLaVA weight compression is in progress. If I skip compression and run fp16 inference (OpenVINO on ARM supports FP16 precision natively), I get `probability tensor contains either inf, nan or element < 0`, which points to an accuracy issue.
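For anyone wanting to reproduce the FP16 path, a minimal sketch; it assumes an existing IR, the paths are placeholders, and `compress_to_fp16` is the standard `save_model` flag for storing weights as FP16.

```python
# Sketch: skip INT4/INT8 compression and save the IR with FP16 weights,
# which ARM plugins infer natively. Paths are placeholders.
import openvino as ov

core = ov.Core()
model = core.read_model("llava.xml")
ov.save_model(model, "llava_fp16.xml", compress_to_fp16=True)
```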

I was able to run the nanoLLaVA notebook using both fp32 and fp16 precision. Could you please try that one with the latest OpenVINO release? ARM functionality is developing rapidly, so it's better to use the latest release: https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks/nano-llava-multimodal-chatbot

alvoron commented 1 month ago

@rmes-ai please feel free to reopen the issue if you have any other questions related to this topic.