openvinotoolkit / openvino_build_deploy

Pre-built components and code samples to help you build and deploy production-grade AI applications with the OpenVINO™ Toolkit from Intel
Apache License 2.0

Custom AI Assistant kit - Error #13

Closed LeonMatch closed 1 month ago

LeonMatch commented 3 months ago

Discussed in https://github.com/openvinotoolkit/openvino_notebooks/discussions/2191

When I run the "Model Conversion and Optimization" step of the instructions, the first command, python convert_and_optimize_asr.py --precision int8, fails with: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\\Users\\User\\AppData\\Local\\Temp\\tmp2oxvro82\\openvino_decoder_model.bin'. Despite the error, the model gets created. The second command in the "Model Conversion and Optimization" step runs fine: python convert_and_optimize_chat.py --chat_model_type llama3-8B --precision int4
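For context, WinError 32 on Windows is often a transient lock held by an antivirus or indexing service scanning a freshly written temp file. A minimal sketch of a workaround one could wrap around the failing cleanup step (this helper is hypothetical, not code from the conversion script):

```python
import time


def retry_on_file_lock(fn, attempts=5, delay=1.0):
    """Retry a filesystem operation that fails with PermissionError
    (WinError 32), which on Windows is often a short-lived lock held
    by an antivirus or indexing service scanning the temp file."""
    for attempt in range(attempts):
        try:
            return fn()
        except PermissionError:
            if attempt == attempts - 1:
                raise  # still locked after all attempts
            time.sleep(delay)
```

Since the model is created despite the error, the failure most likely happens during temp-directory cleanup rather than during conversion itself.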

However, I'm having an issue when running the app: the Speech Recognition "Distil-Whisper" model is very inaccurate. It does not recognize simple words; it misses them completely. How do I improve its quality? I am using Meta-Llama-3-8B-Instruct from HuggingFace, downloaded into the root (custom_ai_assistant) directory.

Thanks, Leon

brmarkus commented 3 months ago

Which notebook under "https://github.com/openvinotoolkit/openvino_notebooks/tree/latest/notebooks" are you referring to?

LeonMatch commented 3 months ago

Hello, I am referring to the recipes, specifically custom_ai_assistant. I'm still having issues with this solution. The Speech Recognition Distil-Whisper model is very inaccurate. It does not recognize simple words; it misses them completely. How do I improve its quality? I am using Meta-Llama-3-8B-Instruct from HuggingFace, downloaded into the root (custom_ai_assistant) directory.

brmarkus commented 3 months ago

Can you share a link to the mentioned recipes, please? And please share the steps you took, to allow the dev team and community to reproduce and analyze further.

LeonMatch commented 3 months ago

Here is the link to the recipe: https://github.com/openvinotoolkit/openvino_notebooks/tree/recipes/recipes/custom_ai_assistant

Steps to reproduce:

  1. git clone -b recipes https://github.com/openvinotoolkit/openvino_notebooks.git openvino_notebooks
  2. cd openvino_notebooks/recipes/custom_ai_assistant
  3. conda create -n venv python=3.11 -y
  4. conda activate venv
  5. python -m pip install --upgrade pip
     pip install -r requirements.txt
  6. huggingface-cli login
  7. git lfs install
  8. git clone https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct meta-llama/Meta-Llama-3-8B-Instruct
  9. python convert_and_optimize_asr.py --precision int8
  10. python convert_and_optimize_chat.py --chat_model_type llama3-8B --precision int4
  11. python app.py --asr_model_dir model/distil-whisper-large-v2-INT8 --chat_model_dir model/llama3-8B-INT4
  12. access the UI at the provided URL, e.g. http://127.0.0.1:xxxx, and test the model.
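Before step 11, it may help to confirm that steps 9 and 10 actually produced the model directories app.py expects. A small sanity check, using the output paths named in step 11 (the helper name is mine, not part of the kit):

```python
from pathlib import Path


def missing_model_dirs(root="model"):
    """Return the converted-model directories (produced by steps 9-10)
    that are missing before launching app.py in step 11."""
    expected = ["distil-whisper-large-v2-INT8", "llama3-8B-INT4"]
    return [d for d in expected if not (Path(root) / d).is_dir()]
```

An empty return value means both conversions left output in place, even if a PermissionError was printed along the way.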

I got an error in step 9: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\User\AppData\Local\Temp\tmp2oxvro82\openvino_decoder_model.bin'. Despite the error, the model gets created, and the command in step 10 executes fine. However, as I mentioned before, the Speech Recognition "Distil-Whisper" model is very inaccurate. It does not recognize simple words; it misses them completely. I'm saying: "Hello. I am having migraines". It transcribes: "Hello. the theirinds." And then replies: "You're experiencing the hives..."

My Python version is 3.11.9. Thank you for following up!

adrianboguszewski commented 3 months ago

Hi @LeonMatch, thanks for reporting this. I transferred this issue to the OpenVINO Build & Deploy repository, as this is the new home for all AI Ref Kits. We recently identified this issue and fixed it. Could you try the latest version of the recipe from this repo?

LeonMatch commented 3 months ago

Hi @adrianboguszewski, can you please share a link to the "OpenVINO Build & Deploy" repository? From which repo should I try again: this one [openvino_notebooks.git] or the [openvino_build_deploy] you mentioned?

Thank you.

adrianboguszewski commented 3 months ago

Starting from now, you should use only the Build & Deploy repo for recipes.

The link: https://github.com/openvinotoolkit/openvino_build_deploy/tree/master/ai_ref_kits/custom_ai_assistant

LeonMatch commented 3 months ago

Thanks @adrianboguszewski. You're saying that you identified and fixed the issue, but I see the last commit in the Build & Deploy repo was only 2 weeks ago. Is this correct? Is the fix already there? Can you also elaborate on what the issue was?

adrianboguszewski commented 3 months ago

The fix was committed here: https://github.com/openvinotoolkit/openvino_build_deploy/commit/bbd5411fda07694f25054ca76452fc63b6ab0db7

We found that sometimes Whisper doesn't transcribe correctly when running with AUTO, so use GPU or CPU directly :)
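The fix swaps the AUTO plugin for an explicit device. A minimal sketch of that preference logic, assuming OpenVINO-style device names such as "CPU" or "GPU.0" (the helper itself is illustrative, not the committed fix):

```python
def pick_asr_device(available, preferred=("GPU", "CPU")):
    """Choose an explicit inference device instead of AUTO, since AUTO
    sometimes produced inaccurate Whisper transcriptions in this kit.
    `available` is a list of device names as reported by the runtime."""
    for wanted in preferred:
        # Match "GPU" against enumerated names like "GPU.0", "GPU.1".
        if any(dev.split(".")[0] == wanted for dev in available):
            return wanted
    return "CPU"  # safe fallback; CPU is always present in OpenVINO
```

With OpenVINO installed, `available` would typically come from `openvino.Core().available_devices`.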

LeonMatch commented 3 months ago

@adrianboguszewski, I still get an error when running python convert_and_optimize_asr.py --precision int8 in the "Model Conversion and Optimization" step of the instructions: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'C:\Users\User\AppData\Local\Temp\tmpwuvuq7ey\openvino_decoder_model.bin'. I checked the system and don't see openvino_decoder_model.bin used by another process. I also deleted the temporary files, but it did not help.

Is this error critical?

I was able to complete the remaining installation steps and run the app. The transcription is much more accurate now. However, the inference is very slow: 16 s for a simple phrase to transcribe and 32 s to respond. Is this normal? My machine has an Intel i7 processor with 32 GB of RAM and an Nvidia Quadro. I ran other models locally using Ollama with just text-to-text inference, and it was much faster.

adrianboguszewski commented 3 months ago

@AnishaUdayakumar have you ever encountered the error above during your tests?

@LeonMatch what exact CPU model do you use?

raymondlo84 commented 2 months ago

Here is a simple Colab example for converting Llama 3 to INT4: https://colab.research.google.com/drive/1WtoA3aq3lZ88rUWIFltBckXpyvy3IIpH?usp=sharing. You can download the model after the conversion, and it should work ;)

adrianboguszewski commented 1 month ago

Closing this due to lack of activity. Please reopen if still needed.