openvinotoolkit / openvino_notebooks

πŸ“š Jupyter notebook tutorials for OpenVINOβ„’

llava-multimodal-chatbot-genai run failed #2484

Open · Johere opened this issue 1 week ago

Johere commented 1 week ago

Running the Jupyter notebook for the LLaVA model: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

Describe the bug

RuntimeError: Exception from src/inference/src/cpp/core.cpp:90:
Check 'util::directory_exists(path) || util::file_exists(path)' failed at src/frontends/common/src/frontend.cpp:113:
FrontEnd API failed with GeneralFailure:
ir: Could not open the file: "llava-1.5-7b-hf/INT4/openvino_tokenizer.xml"

Expected behavior: No code was changed, so the notebook is expected to run without errors.

Screenshots: [screenshot attached]
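
For context, the failure surfaces when the GenAI pipeline is constructed from the exported model folder, which is when openvino_tokenizer.xml is read. A minimal reproduction sketch, assuming the notebook's INT4 output path and the CPU device (both are assumptions here):

import openvino_genai as ov_genai

# building the pipeline loads openvino_tokenizer.xml from the model folder;
# with that file missing, this line raises the GeneralFailure shown above
pipe = ov_genai.VLMPipeline("llava-1.5-7b-hf/INT4", "CPU")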

brmarkus commented 1 week ago

Which version of the OpenVINO notebooks are you using? There were changes in the last few days:

https://github.com/openvinotoolkit/openvino_notebooks/commits/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb

Can you temporarily remove the -q (quiet) option from all commands in the first cell, run the first cell again, and check whether it executes successfully?

UPDATE: I needed to delete my existing virtual-environment folder, create a new one, resync the notebooks repo, reinstall requirements.txt, and then run the Jupyter notebook again. However, the command optimum-cli export openvino --model llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf\FP16 --weight-format fp16 has now been running for more than 15 minutes...
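
For reference, once the FP16 export exists, an INT4 copy can be produced the same way; a hedged sketch using the notebook's cmd_helper wrapper (optimum-cli does accept --weight-format int4, but whether this notebook produces its INT4 folder via this exact route is an assumption):

from pathlib import Path
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
int4_path = Path(model_id.split("/")[-1]) / "INT4"

# skip the export if the folder already exists, mirroring the notebook's FP16 cell
if not int4_path.exists():
    optimum_cli(model_id, int4_path, additional_args={"weight-format": "int4"})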

Johere commented 1 week ago

I'm using the OpenVINO notebooks at commit-id d5c6df43edebf273ccef512439aba11910aad633, branch: latest. Yes, all the commands in the first cell were executed successfully.

I've pulled the latest changes on branch latest, and the issue still exists. Is this file (llava-1.5-7b-hf/INT4/openvino_tokenizer.xml) required to exist? The contents of the INT4 folder are as follows (a quick check for the missing files is sketched after the tree):

INT4
β”œβ”€β”€ added_tokens.json
β”œβ”€β”€ chat_template.json
β”œβ”€β”€ config.json
β”œβ”€β”€ generation_config.json
β”œβ”€β”€ openvino_language_model.bin
β”œβ”€β”€ openvino_language_model.xml
β”œβ”€β”€ openvino_text_embeddings_model.bin
β”œβ”€β”€ openvino_text_embeddings_model.xml
β”œβ”€β”€ openvino_vision_embeddings_model.bin
β”œβ”€β”€ openvino_vision_embeddings_model.xml
β”œβ”€β”€ preprocessor_config.json
β”œβ”€β”€ processor_config.json
β”œβ”€β”€ special_tokens_map.json
β”œβ”€β”€ tokenizer_config.json
β”œβ”€β”€ tokenizer.json
└── tokenizer.model
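
A minimal sketch for checking which of the expected files are missing from the exported folder (the expected-file list follows the tokenizer/detokenizer files named later in this thread):

from pathlib import Path

# files the GenAI pipeline expects next to the model XML/BIN pairs
expected = [
    "openvino_tokenizer.xml", "openvino_tokenizer.bin",
    "openvino_detokenizer.xml", "openvino_detokenizer.bin",
]
model_dir = Path("llava-1.5-7b-hf/INT4")
missing = [name for name in expected if not (model_dir / name).exists()]
print("missing files:", missing or "none")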

Thanks for your help!

brmarkus commented 1 week ago

Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files get converted and compressed. This can take a very long time and can require a lot of RAM. Are you sure the conversion step has already finished, and finished successfully? It's still running on my machine after more than 30 minutes, and almost all of my 64 GB of RAM is in use during the conversion... still running...

Johere commented 1 week ago

> Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files get converted and compressed. This can take a very long time and can require a lot of RAM. Are you sure the conversion step has already finished, and finished successfully? It's still running on my machine after more than 30 minutes, and almost all of my 64 GB of RAM is in use during the conversion... still running...

Yes, it works fine in my environment. I think the sizes of the model files are as expected:

$ du -sh llava-1.5-7b-hf/*
14G     llava-1.5-7b-hf/FP16
4.1G    llava-1.5-7b-hf/INT4

brmarkus commented 1 week ago

Conversion and compression have now finished on my machine. My INT4 folder looks like this: [screenshot attached]

=> Yes, the files openvino_tokenizer.xml and openvino_tokenizer.bin should exist... the files openvino_detokenizer.xml and openvino_detokenizer.bin are also missing on your side.

Are the files present in the original FP16 folder?

Start the commands again and watch CPU and RAM usage... it will take very long and will use a lot of RAM and CPU.
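
If it's hard to tell whether the export is still making progress, a small watcher loop can help; a hedged sketch (psutil is an extra dependency, not part of the notebook's requirements):

import time

import psutil  # assumption: installed separately, e.g. pip install psutil

# print overall RAM usage every 10 seconds while the export runs elsewhere;
# stop with Ctrl+C once the conversion finishes
while True:
    mem = psutil.virtual_memory()
    print(f"RAM used: {mem.percent}% ({mem.used / 2**30:.1f} GiB)")
    time.sleep(10)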

Johere commented 1 week ago

Ohh, I don't have the files openvino_tokenizer.xml and openvino_tokenizer.bin in the FP16 folder. I removed that folder and regenerated it using optimum-cli, and now I can see these files.

It works well now, thank you so much for the help!

Johere commented 2 days ago

Hi @brmarkus, I reopened this issue because I think I'm getting a wrong answer when using this example: llava-multimodal-chatbot-genai.ipynb.

Using the branch latest, commit-id: dab21db88f51853926aaf0baa63ac8fdf6eeb455

Screenshot attached here: [screenshot attached]

Thanks!

brmarkus commented 2 days ago

Hmm, download, conversion, and compression are done here:

from pathlib import Path  # imported earlier in the notebook
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"

if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})

No model revision is pinned here - so a newer/different/updated/modified model could be published on Hugging Face, and then the same query could result in a different response (besides SW-, HW-, and platform-specific differences such as rounding effects; HW-driver differences could also result in different optimizations).
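
One way to rule out upstream model changes would be to pin the download to a fixed revision; a hedged sketch using huggingface_hub ("<commit-sha>" is a placeholder, and wiring the local snapshot into the export step is left out):

from huggingface_hub import snapshot_download

# fetch the model files at one specific Hub commit so reconversion is reproducible
local_dir = snapshot_download("llava-hf/llava-1.5-7b-hf", revision="<commit-sha>")
print(local_dir)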

The response, however, sounds "reasonable"... no "hallucination"...

Johere commented 2 days ago

> Hmm, download, conversion, and compression are done here: [code quoted above]
>
> No model revision is pinned here - so a newer/different/updated/modified model could be published on Hugging Face, and then the same query could result in a different response (besides SW-, HW-, and platform-specific differences such as rounding effects; HW-driver differences could also result in different optimizations).
>
> The response, however, sounds "reasonable"... no "hallucination"...

But if I run llava-multimodal-chatbot-optimum.ipynb, I get a reasonable answer: [screenshot attached] Do you get the same answer as mine for llava-multimodal-chatbot-genai.ipynb?

brmarkus commented 1 day ago

Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb. Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)? Different results when using the original model versus the quantized versions (INT8, INT4)?

Johere commented 1 day ago

> Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb. Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)? Different results when using the original model versus the quantized versions (INT8, INT4)?

Yes, all model variants give similar results. I've tried INT4 / INT8 / FP16 on GPU, and INT4 on CPU, and I don't think the answers are reasonable enough...
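
To separate sampling noise from a real pipeline difference, the same image and prompt can be run through the GenAI pipeline with greedy decoding, where repeated runs should match exactly. A hedged sketch following the openvino_genai VLM samples (the image path and prompt are placeholders, and the INT4 model path is an assumption):

import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

# load the test picture as a (1, H, W, 3) uint8 tensor, as in the GenAI samples
pic = Image.open("test_image.png").convert("RGB")
image_data = np.array(pic.getdata()).reshape(1, pic.size[1], pic.size[0], 3).astype(np.uint8)
image_tensor = ov.Tensor(image_data)

pipe = ov_genai.VLMPipeline("llava-1.5-7b-hf/INT4", "CPU")

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100
config.do_sample = False  # greedy decoding: removes sampling randomness

print(pipe.generate("What is unusual on this picture?", image=image_tensor, generation_config=config))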