Johere opened this issue 1 week ago
Which version of the OpenVINO notebooks do you use? There have been changes in the last few days.
Can you temporarily remove all the -q (quiet option) flags from the commands in the first cell, run the first cell, and check whether it executes successfully?
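For illustration, that just means dropping the quiet flag so pip's output becomes visible; the exact package list in your first cell will differ (hypothetical example):

%pip install -q -r requirements.txt   # before: output suppressed
%pip install -r requirements.txt      # after: errors and warnings become visible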
UPDATE: I needed to delete my existing virtual-env folder, create a new virtual env, resync the notebooks repo, re-install requirements.txt and then run the Jupyter notebook again.
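For reference, a sketch of those recovery steps as shell commands (Linux/macOS; the env folder name is illustrative, on Windows activate via Scripts\activate instead):

python -m venv openvino_env           # create a fresh virtual env
source openvino_env/bin/activate      # activate it
git pull                              # resync the openvino_notebooks repo
pip install -r requirements.txt       # re-install the requirements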
However, the command optimum-cli export openvino --model llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf\FP16 --weight-format fp16
has now been running for more than 15 minutes...
I'm using the OpenVINO notebooks at commit-id d5c6df43edebf273ccef512439aba11910aad633, branch: latest.
Yes, all the commands in the first cell were executed successfully.
I've pulled the latest changes on branch latest, but the issue still exists. Is the file llava-1.5-7b-hf/INT4/openvino_tokenizer.xml required to exist? The files under the INT4 folder are:
INT4
├── added_tokens.json
├── chat_template.json
├── config.json
├── generation_config.json
├── openvino_language_model.bin
├── openvino_language_model.xml
├── openvino_text_embeddings_model.bin
├── openvino_text_embeddings_model.xml
├── openvino_vision_embeddings_model.bin
├── openvino_vision_embeddings_model.xml
├── preprocessor_config.json
├── processor_config.json
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── tokenizer.model
Thanks for your help!
Have a look into the originally downloaded FP16 folder llava-1.5-7b-hf/FP16. The pairs of XML & BIN files will be converted & compressed.
This can take VERY long and can require LOTS of RAM. Are you sure the conversion step has already finished, and finished successfully?
It has been running on my machine for more than 30 minutes now, and almost all of my 64GB of RAM is in use during conversion... still running...
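If you want to verify it is still making progress, you can watch memory usage alongside the conversion (a Linux suggestion on my side, not part of the notebook):

watch -n 5 free -h   # re-print overall RAM usage every 5 seconds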
Yes, it works fine in my environment. I think the sizes of the model files are as expected:
$ du -sh llava-1.5-7b-hf/*
14G llava-1.5-7b-hf/FP16
4.1G llava-1.5-7b-hf/INT4
Conversion and compression have now finished on my machine. My INT4 folder looks like this:
=> Yes, the files openvino_tokenizer.xml and openvino_tokenizer.bin should exist... The files openvino_detokenizer.bin and openvino_detokenizer.xml are also missing on your side.
Are the files present in the original FP16 folder?
Start the commands again and watch CPU and RAM usage... it will take VERY long and use LOTS of RAM and CPU.
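As a quick check, here is a small Python sketch (paths taken from this thread) that reports which tokenizer/detokenizer files are missing in each folder:

from pathlib import Path

expected = [
    "openvino_tokenizer.xml", "openvino_tokenizer.bin",
    "openvino_detokenizer.xml", "openvino_detokenizer.bin",
]
for folder in ["llava-1.5-7b-hf/FP16", "llava-1.5-7b-hf/INT4"]:
    missing = [name for name in expected if not (Path(folder) / name).exists()]
    print(folder, "->", "complete" if not missing else f"missing: {missing}")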
Ohh, I don't have the files openvino_tokenizer.xml and openvino_tokenizer.bin in the FP16 folder. I removed this folder and regenerated it using optimum-cli; now I can see these files.
It works well now, thank you so much for the help!
Hi @brmarkus, I reopened this issue because I think I'm getting a wrong answer using this example: llava-multimodal-chatbot-genai.ipynb.
Using the branch latest, commit-id: dab21db88f51853926aaf0baa63ac8fdf6eeb455.
Screenshot attached here:
Thanks!
Hmm, download, conversion and compression are done here:

from pathlib import Path  # needed for model_path below
from cmd_helper import optimum_cli

model_id = "llava-hf/llava-1.5-7b-hf"
model_path = Path(model_id.split("/")[-1]) / "FP16"  # -> llava-1.5-7b-hf/FP16

# Export only if the FP16 folder does not exist yet
if not model_path.exists():
    optimum_cli(model_id, model_path, additional_args={"weight-format": "fp16"})
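For reference, the cmd_helper wrapper performs the same export as the direct CLI call used earlier in this thread:

optimum-cli export openvino --model llava-hf/llava-1.5-7b-hf llava-1.5-7b-hf/FP16 --weight-format fp16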
There is no version information given, so it could happen that a newer/updated/modified model gets uploaded on Hugging Face, and then the same query could result in a different response (besides SW-/HW-/platform-specific differences: rounding effects or HW-driver differences could result in different optimizations).
The response, however, sounds "reasonable"... no "hallucination"...
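If you want reproducible downloads, one option (my suggestion, not something the notebook does) is to pin the model snapshot to a fixed commit with huggingface_hub before exporting; <commit-sha> is a placeholder for a real revision from the model page:

from huggingface_hub import snapshot_download

# Fetch one fixed snapshot of the model repository
local_dir = snapshot_download(
    "llava-hf/llava-1.5-7b-hf",
    revision="<commit-sha>",  # hypothetical placeholder
)
print(local_dir)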
But if I run llava-multimodal-chatbot-optimum.ipynb, I get a reasonable answer:
Do you get the same answer as mine for llava-multimodal-chatbot-genai.ipynb?
Maybe the picture showing an answer regarding a cat was captured using llava-multimodal-chatbot-optimum.ipynb instead of llava-multimodal-chatbot-genai.ipynb.
Do you get the same results when using different accelerators (CPU, GPU, NPU, AUTO, MULTI)? Different results when using the original model or quantized versions (INT4, INT8)? A device-comparison sketch follows below.
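To compare accelerators, a minimal sketch using openvino_genai's VLMPipeline (the INT4 path, the image file name and the exact generate() arguments are assumptions based on this thread and may differ slightly from the notebook code):

import numpy as np
import openvino as ov
import openvino_genai as ov_genai
from PIL import Image

image = Image.open("test_image.jpg")       # hypothetical test image
image_tensor = ov.Tensor(np.array(image))  # HWC uint8 tensor

config = ov_genai.GenerationConfig()
config.max_new_tokens = 100

for device in ["CPU", "GPU"]:              # extend with NPU/AUTO as available
    pipe = ov_genai.VLMPipeline("llava-1.5-7b-hf/INT4", device)
    answer = pipe.generate("What is unusual on this image?",
                           image=image_tensor, generation_config=config)
    print(device, "->", answer)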
Yes, all model variants give similar results. I've tried INT4 / INT8 / FP16 on GPU, and INT4 on CPU; I think the answers are not reasonable enough...
Describe the bug
Running the Jupyter notebook of the LLaVA model: https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llava-multimodal-chatbot/llava-multimodal-chatbot-genai.ipynb
Expected behavior
No code changed; expected it to work well.
Screenshots