haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
https://llava.hliu.cc
Apache License 2.0
20.12k stars 2.21k forks

[Question] When serving vicuna-13b via cli, the model produces lengthy responses. #520

Open hjsg1010 opened 1 year ago

hjsg1010 commented 1 year ago

Question

I served the model via cli using the following command.

python -m llava.serve.cli --model-path ./data-vol-1/model/llava/llava-336px-pretrain-vicuna-13b-v1.3 --model-base ./data-vol-1/model/llava/vicuna_13b_v1.3 --image-file "./llava/view.jpg"

Both llava-336px-pretrain-vicuna-13b-v1.3 and vicuna_13b_v1.3 were downloaded from your links.

However, as you can see below, the model is producing excessively long responses. Would you happen to have any advice on this matter?

[2023-10-11 06:30:46,626] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading LLaVA from base model...
You are using the legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This means that tokens that come after special tokens will not be properly handled. We recommend you to read the related pull request available at https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards: 100%|████████████████████████████████████████| 3/3 [00:09<00:00,  3.08s/it]
Some weights of LlavaLlamaForCausalLM were not initialized from the model checkpoint at /home/jovyan/data-vol-1/model/llava/vicuna_13b_v1.3 and are newly initialized: ['model.mm_projector.weight', 'model.mm_projector.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
USER: what do you think about this view? 
ASSISTANT: This is a beautiful view of a lake with a dock leading into the water

It's a great spot for a swim or a boat ride

The mountains in the background add a nice touch to the scenery

It's a great spot for a vacation or a day trip

It's a great spot for a picnic or a fishing trip

It's a great spot for a family outing or a group gathering

It's a great spot for a romantic getaway or a honeymoon

It's a great spot for a solo adventure or a solo retreat

It's a great spot for a solo hike or a solo bike ride

It's a great spot for a solo camping trip or a solo fishing trip

It's a great spot for a solo swim or a solo boat ride

It's a great spot for a solo picnic or a solo fishing trip

[The same five "solo ..." sentences repeat verbatim for dozens of further lines, until the response is finally cut off mid-sentence.]
USER: what should I be cautious when I go to this place
ASSISTANT: When visiting this place, it is important to be cautious of the following:

* The water is deep and can be dangerous for swimming, especially for children or those who are not strong swimmers
* The dock is not stable and can be dangerous to walk on
* The area is surrounded by mountains and can be prone to sudden weather changes, including thunderstorms and heavy rain
* The area is home to wildlife, including bears and other animals, so it is important to be aware of your surroundings and take necessary precautions
* The area is remote and there may not be easy access to medical care or other services in case of an emergency
* The area is prone to insects and other pests, so it is important to bring insect repellent and other necessary items to protect against bites and stings
* The area is prone to wildfires, so it is important to be aware of fire safety and follow any guidelines or regulations in place to prevent fires
* The area is prone to flash floods, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent flooding
* The area is prone to landslides, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent landslides
* The area is prone to avalanches, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent avalanches
* The area is prone to earthquakes, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent earthquakes
* The area is prone to volcanic eruptions, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent volcanic eruptions
* The area is prone to tsunamis, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent tsunamis
* The area is prone to hurricanes, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent hurricanes
* The area is prone to tornadoes, so it is important to be aware of weather conditions and follow any guidelines or regulations in place to prevent tornadoes
[The "wildfires" bullet then repeats several more times, and the response degenerates into a loop of "conditions and conditions and ..." before stopping.]
USER: thankyou
ASSISTANT: 
USER: thank you
ASSISTANT: 
RicoWjr commented 1 year ago

I ran into a similar issue with the llava-v1.5-7b model, and the responses were even worse than yours: the model repeats nonsensical words or numbers.

barshag commented 1 year ago

What can be done to overcome that? @RicoWjr (I ran into it too.)

haotian-liu commented 1 year ago

Hey, these are the projector weights that are only trained on image-text pairs and are NOT instruction tuned, which means they do NOT follow instructions as well as our official models and can output repetitive, lengthy, and garbled outputs.

You need to use LLaVA v1.5 models directly.

I just added these clarifications to the Model Zoo; hopefully that clears up some of the doubts.

These are projector weights we have pretrained. You can use these projector weights for visual instruction tuning. They are only pretrained on image-text pairs and are NOT instruction tuned, which means they do NOT follow instructions as well as our official models and can output repetitive, lengthy, and garbled outputs. If you want to have nice conversations with LLaVA, use the checkpoints above (LLaVA v1.5).
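For example, to chat with an official checkpoint directly via the CLI, something like the following should work (using the liuhaotian/llava-v1.5-13b Hugging Face ID; no --model-base is needed because it is a full model rather than projector weights):

python -m llava.serve.cli --model-path liuhaotian/llava-v1.5-13b --image-file "./llava/view.jpg"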

hjsg1010 commented 1 year ago

> Hey, these are the projector weights that are only trained on image-text pairs and are NOT instruction tuned, which means they do NOT follow instructions as well as our official models and can output repetitive, lengthy, and garbled outputs.
>
> You need to use LLaVA v1.5 models directly.
>
> I just added these clarifications to the Model Zoo; hopefully that clears up some of the doubts.
>
> These are projector weights we have pretrained. You can use these projector weights for visual instruction tuning. They are only pretrained on image-text pairs and are NOT instruction tuned, which means they do NOT follow instructions as well as our official models and can output repetitive, lengthy, and garbled outputs. If you want to have nice conversations with LLaVA, use the checkpoints above (LLaVA v1.5).

Thanks for the reply. I'll read your clarifications.

I have another question. I have a Vicuna-13B model that has been finetuned with my own text data (language-only finetuning, not image-text pair finetuning using your script). Would it be possible to use it in your LLaVA framework?

Could I perhaps change the --model-base in this command, or modify my custom Vicuna config in some way?

python -m llava.serve.cli --model-path ./data-vol-1/model/llava/llava-336px-pretrain-vicuna-13b-v1.3 --model-base ./data-vol-1/model/llava/vicuna_13b_v1.3 --image-file "./llava/view.jpg" 
haotian-liu commented 1 year ago

@hjsg1010 the clarifications were added just now, after I saw this issue :(

If your finetuned vicuna is based on Vicuna v1.3, you may try this one: https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3

It is LoRA-tuned, which means it may be somewhat compatible and could be plugged into a modified version of Vicuna v1.3 to give it visual capabilities, but I haven't tried anything like this, so there is no guarantee.

Check out the instructions here on how to launch a model worker with LoRA adapters. The CLI should be similar.

https://github.com/haotian-liu/LLaVA/blob/main/docs/LoRA.md#launch-a-model-worker
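For illustration, the CLI invocation would presumably look something like this, with the LoRA weights as --model-path and your finetuned Vicuna as --model-base (the base path below is just a placeholder, and as noted above this combination is untested):

python -m llava.serve.cli --model-path liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3 --model-base /path/to/your/finetuned-vicuna-13b-v1.3 --image-file "./llava/view.jpg"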

hjsg1010 commented 1 year ago

> @hjsg1010 the clarifications were added just now, after I saw this issue :(
>
> If your finetuned vicuna is based on Vicuna v1.3, you may try this one: https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3
>
> It is LoRA-tuned, which means it may be somewhat compatible and could be plugged into a modified version of Vicuna v1.3 to give it visual capabilities, but I haven't tried anything like this, so there is no guarantee.
>
> Check out the instructions here on how to launch a model worker with LoRA adapters. The CLI should be similar.
>
> https://github.com/haotian-liu/LLaVA/blob/main/docs/LoRA.md#launch-a-model-worker

Are you suggesting that I should specify this model, https://huggingface.co/liuhaotian/llava-v1-0719-336px-lora-vicuna-13b-v1.3, as the --model-path, and set my custom model as the --model-base?

Thanks for the reply. I'll read through it again carefully and give it a try.

haotian-liu commented 1 year ago

Your understanding is correct.

hjsg1010 commented 1 year ago

@haotian-liu Thanks a lot. You've been a great help to me. I wish the multi-turn conversation feature would be available soon in the CLI environment or Jupyter. For now, I'm also trying to implement it myself.

haotian-liu commented 1 year ago

Wait, multi-turn conversation is already supported. See the gif (wait for around 10 seconds or more to see the second query): https://github.com/haotian-liu/LLaVA#cli-inference

RicoWjr commented 1 year ago

> What can be done to overcome that? @RicoWjr (I ran into it too.)

I just found that I had pulled a LLaVA Docker image that someone pushed months ago, and its code is not compatible with the latest llava-v1.5. You can check whether your code version is compatible with the model version.
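If you installed from source, updating the repo and reinstalling should bring the code in line with the v1.5 checkpoints; roughly the following (the Docker route will differ):

cd LLaVA
git pull
pip install -e .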

hjsg1010 commented 1 year ago

> Wait, multi-turn conversation is already supported. See the gif (wait for around 10 seconds or more to see the second query): https://github.com/haotian-liu/LLaVA#cli-inference

Oh, I meant multi-turn conversations with several images, so that I can test few-shot prompting with my own images.
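For anyone else attempting this, here is the rough outline I am experimenting with, pieced together from llava.serve.cli and llava.eval.run_llava. The function names come from the repo, but the exact signatures and the multi-image handling may differ between versions, so please treat it as an untested sketch rather than working code.

import torch
from PIL import Image

from llava.constants import IMAGE_TOKEN_INDEX, DEFAULT_IMAGE_TOKEN
from llava.conversation import conv_templates
from llava.mm_utils import get_model_name_from_path, process_images, tokenizer_image_token
from llava.model.builder import load_pretrained_model

# Load an official instruction-tuned checkpoint (a full model, so no model_base).
model_path = "liuhaotian/llava-v1.5-13b"
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path, None, get_model_name_from_path(model_path)
)

# Preprocess all images for the few-shot conversation up front.
image_paths = ["shot1.jpg", "shot2.jpg"]
images = [Image.open(p).convert("RGB") for p in image_paths]
image_tensor = process_images(images, image_processor, model.config).to(model.device, dtype=torch.float16)

# One question per image; each turn introduces one more <image> token.
questions = [
    DEFAULT_IMAGE_TOKEN + "\nDescribe this image.",
    DEFAULT_IMAGE_TOKEN + "\nHow does this image differ from the previous one?",
]

conv = conv_templates["llava_v1"].copy()
for i, question in enumerate(questions):
    conv.append_message(conv.roles[0], question)
    conv.append_message(conv.roles[1], None)
    prompt = conv.get_prompt()
    input_ids = tokenizer_image_token(
        prompt, tokenizer, IMAGE_TOKEN_INDEX, return_tensors="pt"
    ).unsqueeze(0).to(model.device)
    with torch.inference_mode():
        output_ids = model.generate(
            input_ids,
            images=image_tensor[: i + 1],  # one image per <image> token seen so far
            do_sample=False,
            max_new_tokens=256,
        )
    # Depending on the LLaVA version, output_ids may echo the prompt tokens;
    # if so, decode output_ids[0, input_ids.shape[1]:] instead.
    answer = tokenizer.decode(output_ids[0], skip_special_tokens=True).strip()
    conv.messages[-1][-1] = answer  # feed the answer back into the conversation history
    print(f"Turn {i + 1}: {answer}")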