magic-research / PLLaVA

Official repository for the paper PLLaVA

Demo issues #5

Open ZhangScream opened 3 months ago

ZhangScream commented 3 months ago

I received incorrect answers (screenshot attached). What's the issue here?

cathyxl commented 3 months ago

Hi @ZhangScream, sorry about the demo error. We have tested the example on our own demo and the result is OK, as shown below. I've noticed a difference in your demo: please set model_dir=llava-hf/llava-v1.6-vicuna-7b-hf and weight_dir=MODELS/pllava-7b and try again. We will also update the README to avoid confusion.

[screenshot of the working demo result]
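For reference, a corrected demo.sh would look roughly like the sketch below. The entry point and flag names are assumptions based on the demo script shipped with the repo, so keep whatever flags your current demo.sh already passes and only change the two directories:

```bash
# Sketch of a corrected demo.sh; only the two directories are the point here.
# The module path and flags are assumptions -- reuse the ones from the repo's
# original demo.sh if they differ.
model_dir="llava-hf/llava-v1.6-vicuna-7b-hf"   # base LLaVA checkpoint (config + tokenizer)
weight_dir="MODELS/pllava-7b"                  # trained PLLaVA weights

python -m tasks.eval.demo.pllava_demo \
    --pretrained_model_name_or_path "${model_dir}" \
    --weight_dir "${weight_dir}" \
    --use_lora \
    --num_frames 16
```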

ermu2001 commented 3 months ago

I tried downloading the model again and the demo went well with our MODELS/pllava-7b.

[screenshot]

Could you provide more details of the app, maybe the terminal output?

ZhangScream commented 3 months ago

> I tried downloading the model again and the demo went well with our MODELS/pllava-7b.
>
> [screenshot]
>
> Could you provide more details of the app, maybe the terminal output?

This is my demo.sh (screenshot attached) and this is my model (screenshot attached).

ermu2001 commented 3 months ago

These seem good to me.

Is there any suspicious output in the terminal where you run this demo?

My shell terminal outputs this, and it seems alright

```
Running DEMO from model_dir: MODELS/pllava-7b
Running DEMO from weights_dir: MODELS/pllava-7b
Running DEMO On Devices: 1
Initializing PLLaVA
You are attempting to use Flash Attention 2.0 without specifying a torch dtype. This might lead to unexpected behaviour
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00,  4.49it/s]
Some weights of PllavaForConditionalGeneration were not initialized from the model checkpoint at MODELS/pllava-7b and are newly initialized: ['language_model.lm_head.weight', 'language_model.model.embed_tokens.weight', 'language_model.model.layers.0.input_layernorm.weight', 'language_model.model.layers.0.mlp.down_proj.weight', 'language_model.model.layers.0.mlp.gate_proj.weight', 'language_model.model.layers.0.mlp.up_proj.weight', 'language_model.model.layers.0.post_attention_layernorm.weight', 'language_model.model.layers.0.self_attn.k_proj.weight', 'language_model.model.layers.0.self_attn.o_proj.weight', 'language_model.model.layers.0.self_attn.q_proj.weight', 'language_model.model.layers.0.self_attn.v_proj.weight', 'language_model.model.layers.1.input_layernorm.weight', 'language_model.model.layers.1.mlp.down_proj.weight', 'language_model.model.layers.1.mlp.gate_proj.weight', 'language_model.model.layers.1.mlp.up_proj.weight', 'language_model.model.layers.1.post_attention_layernorm.weight', 'language_model.model.layers.1.self_attn.k_proj.weight', 'language_model.model.layers.1.self_attn.o_proj.weight', 'language_model.model.layers.1.self_attn.q_proj.weight', 'language_model.model.layers.1.self_attn.v_proj.weight', 'language_model.model.layers.10.input_layernorm.weight', 'language_model.model.layers.10.mlp.down_proj.weight', 'language_model.model.layers.10.mlp.gate_proj.weight', 'language_model.model.layers.10.mlp.up_proj.weight', 'language_model.model.layers.10.post_attention_layernorm.weight', 'language_model.model.layers.10.self_attn.k_proj.weight', 'language_model.model.layers.10.self_attn.o_proj.weight', 'language_model.model.layers.10.self_attn.q_proj.weight', 'language_model.model.layers.10.self_attn.v_proj.weight', 'language_model.model.layers.11.input_layernorm.weight', 'language_model.model.layers.11.mlp.down_proj.weight', 'language_model.model.layers.11.mlp.gate_proj.weight', 'language_model.model.layers.11.mlp.up_proj.weight', 'language_model.model.layers.11.post_attention_layernorm.weight', 'language_model.model.layers.11.self_attn.k_proj.weight', 'language_model.model.layers.11.self_attn.o_proj.weight', 'language_model.model.layers.11.self_attn.q_proj.weight', 'language_model.model.layers.11.self_attn.v_proj.weight', 'language_model.model.layers.12.input_layernorm.weight', 'language_model.model.layers.12.mlp.down_proj.weight', 'language_model.model.layers.12.mlp.gate_proj.weight', 'language_model.model.layers.12.mlp.up_proj.weight', 'language_model.model.layers.12.post_attention_layernorm.weight', 'language_model.model.layers.12.self_attn.k_proj.weight', 'language_model.model.layers.12.self_attn.o_proj.weight', 'language_model.model.layers.12.self_attn.q_proj.weight', 'language_model.model.layers.12.self_attn.v_proj.weight', 'language_model.model.layers.13.input_layernorm.weight', 'language_model.model.layers.13.mlp.down_proj.weight', 'language_model.model.layers.13.mlp.gate_proj.weight', 'language_model.model.layers.13.mlp.up_proj.weight', 'language_model.model.layers.13.post_attention_layernorm.weight', 'language_model.model.layers.13.self_attn.k_proj.weight', 'language_model.model.layers.13.self_attn.o_proj.weight', 'language_model.model.layers.13.self_attn.q_proj.weight', 'language_model.model.layers.13.self_attn.v_proj.weight', 'language_model.model.layers.14.input_layernorm.weight', 'language_model.model.layers.14.mlp.down_proj.weight', 'language_model.model.layers.14.mlp.gate_proj.weight', 'language_model.model.layers.14.mlp.up_proj.weight', 
'language_model.model.layers.14.post_attention_layernorm.weight', 'language_model.model.layers.14.self_attn.k_proj.weight', 'language_model.model.layers.14.self_attn.o_proj.weight', 'language_model.model.layers.14.self_attn.q_proj.weight', 'language_model.model.layers.14.self_attn.v_proj.weight', 'language_model.model.layers.15.input_layernorm.weight', 'language_model.model.layers.15.mlp.down_proj.weight', 'language_model.model.layers.15.mlp.gate_proj.weight', 'language_model.model.layers.15.mlp.up_proj.weight', 'language_model.model.layers.15.post_attention_layernorm.weight', 'language_model.model.layers.15.self_attn.k_proj.weight', 'language_model.model.layers.15.self_attn.o_proj.weight', 'language_model.model.layers.15.self_attn.q_proj.weight', 'language_model.model.layers.15.self_attn.v_proj.weight', 'language_model.model.layers.16.input_layernorm.weight', 'language_model.model.layers.16.mlp.down_proj.weight', 'language_model.model.layers.16.mlp.gate_proj.weight', 'language_model.model.layers.16.mlp.up_proj.weight', 'language_model.model.layers.16.post_attention_layernorm.weight', 'language_model.model.layers.16.self_attn.k_proj.weight', 'language_model.model.layers.16.self_attn.o_proj.weight', 'language_model.model.layers.16.self_attn.q_proj.weight', 'language_model.model.layers.16.self_attn.v_proj.weight', 'language_model.model.layers.17.input_layernorm.weight', 'language_model.model.layers.17.mlp.down_proj.weight', 'language_model.model.layers.17.mlp.gate_proj.weight', 'language_model.model.layers.17.mlp.up_proj.weight', 'language_model.model.layers.17.post_attention_layernorm.weight', 'language_model.model.layers.17.self_attn.k_proj.weight', 'language_model.model.layers.17.self_attn.o_proj.weight', 'language_model.model.layers.17.self_attn.q_proj.weight', 'language_model.model.layers.17.self_attn.v_proj.weight', 'language_model.model.layers.18.input_layernorm.weight', 'language_model.model.layers.18.mlp.down_proj.weight', 'language_model.model.layers.18.mlp.gate_proj.weight', 'language_model.model.layers.18.mlp.up_proj.weight', 'language_model.model.layers.18.post_attention_layernorm.weight', 'language_model.model.layers.18.self_attn.k_proj.weight', 'language_model.model.layers.18.self_attn.o_proj.weight', 'language_model.model.layers.18.self_attn.q_proj.weight', 'language_model.model.layers.18.self_attn.v_proj.weight', 'language_model.model.layers.19.input_layernorm.weight', 'language_model.model.layers.19.mlp.down_proj.weight', 'language_model.model.layers.19.mlp.gate_proj.weight', 'language_model.model.layers.19.mlp.up_proj.weight', 'language_model.model.layers.19.post_attention_layernorm.weight', 'language_model.model.layers.19.self_attn.k_proj.weight', 'language_model.model.layers.19.self_attn.o_proj.weight', 'language_model.model.layers.19.self_attn.q_proj.weight', 'language_model.model.layers.19.self_attn.v_proj.weight', 'language_model.model.layers.2.input_layernorm.weight', 'language_model.model.layers.2.mlp.down_proj.weight', 'language_model.model.layers.2.mlp.gate_proj.weight', 'language_model.model.layers.2.mlp.up_proj.weight', 'language_model.model.layers.2.post_attention_layernorm.weight', 'language_model.model.layers.2.self_attn.k_proj.weight', 'language_model.model.layers.2.self_attn.o_proj.weight', 'language_model.model.layers.2.self_attn.q_proj.weight', 'language_model.model.layers.2.self_attn.v_proj.weight', 'language_model.model.layers.20.input_layernorm.weight', 'language_model.model.layers.20.mlp.down_proj.weight', 
'language_model.model.layers.20.mlp.gate_proj.weight', 'language_model.model.layers.20.mlp.up_proj.weight', 'language_model.model.layers.20.post_attention_layernorm.weight', 'language_model.model.layers.20.self_attn.k_proj.weight', 'language_model.model.layers.20.self_attn.o_proj.weight', 'language_model.model.layers.20.self_attn.q_proj.weight', 'language_model.model.layers.20.self_attn.v_proj.weight', 'language_model.model.layers.21.input_layernorm.weight', 'language_model.model.layers.21.mlp.down_proj.weight', 'language_model.model.layers.21.mlp.gate_proj.weight', 'language_model.model.layers.21.mlp.up_proj.weight', 'language_model.model.layers.21.post_attention_layernorm.weight', 'language_model.model.layers.21.self_attn.k_proj.weight', 'language_model.model.layers.21.self_attn.o_proj.weight', 'language_model.model.layers.21.self_attn.q_proj.weight', 'language_model.model.layers.21.self_attn.v_proj.weight', 'language_model.model.layers.22.input_layernorm.weight', 'language_model.model.layers.22.mlp.down_proj.weight', 'language_model.model.layers.22.mlp.gate_proj.weight', 'language_model.model.layers.22.mlp.up_proj.weight', 'language_model.model.layers.22.post_attention_layernorm.weight', 'language_model.model.layers.22.self_attn.k_proj.weight', 'language_model.model.layers.22.self_attn.o_proj.weight', 'language_model.model.layers.22.self_attn.q_proj.weight', 'language_model.model.layers.22.self_attn.v_proj.weight', 'language_model.model.layers.23.input_layernorm.weight', 'language_model.model.layers.23.mlp.down_proj.weight', 'language_model.model.layers.23.mlp.gate_proj.weight', 'language_model.model.layers.23.mlp.up_proj.weight', 'language_model.model.layers.23.post_attention_layernorm.weight', 'language_model.model.layers.23.self_attn.k_proj.weight', 'language_model.model.layers.23.self_attn.o_proj.weight', 'language_model.model.layers.23.self_attn.q_proj.weight', 'language_model.model.layers.23.self_attn.v_proj.weight', 'language_model.model.layers.24.input_layernorm.weight', 'language_model.model.layers.24.mlp.down_proj.weight', 'language_model.model.layers.24.mlp.gate_proj.weight', 'language_model.model.layers.24.mlp.up_proj.weight', 'language_model.model.layers.24.post_attention_layernorm.weight', 'language_model.model.layers.24.self_attn.k_proj.weight', 'language_model.model.layers.24.self_attn.o_proj.weight', 'language_model.model.layers.24.self_attn.q_proj.weight', 'language_model.model.layers.24.self_attn.v_proj.weight', 'language_model.model.layers.25.input_layernorm.weight', 'language_model.model.layers.25.mlp.down_proj.weight', 'language_model.model.layers.25.mlp.gate_proj.weight', 'language_model.model.layers.25.mlp.up_proj.weight', 'language_model.model.layers.25.post_attention_layernorm.weight', 'language_model.model.layers.25.self_attn.k_proj.weight', 'language_model.model.layers.25.self_attn.o_proj.weight', 'language_model.model.layers.25.self_attn.q_proj.weight', 'language_model.model.layers.25.self_attn.v_proj.weight', 'language_model.model.layers.26.input_layernorm.weight', 'language_model.model.layers.26.mlp.down_proj.weight', 'language_model.model.layers.26.mlp.gate_proj.weight', 'language_model.model.layers.26.mlp.up_proj.weight', 'language_model.model.layers.26.post_attention_layernorm.weight', 'language_model.model.layers.26.self_attn.k_proj.weight', 'language_model.model.layers.26.self_attn.o_proj.weight', 'language_model.model.layers.26.self_attn.q_proj.weight', 'language_model.model.layers.26.self_attn.v_proj.weight', 
'language_model.model.layers.27.input_layernorm.weight', 'language_model.model.layers.27.mlp.down_proj.weight', 'language_model.model.layers.27.mlp.gate_proj.weight', 'language_model.model.layers.27.mlp.up_proj.weight', 'language_model.model.layers.27.post_attention_layernorm.weight', 'language_model.model.layers.27.self_attn.k_proj.weight', 'language_model.model.layers.27.self_attn.o_proj.weight', 'language_model.model.layers.27.self_attn.q_proj.weight', 'language_model.model.layers.27.self_attn.v_proj.weight', 'language_model.model.layers.28.input_layernorm.weight', 'language_model.model.layers.28.mlp.down_proj.weight', 'language_model.model.layers.28.mlp.gate_proj.weight', 'language_model.model.layers.28.mlp.up_proj.weight', 'language_model.model.layers.28.post_attention_layernorm.weight', 'language_model.model.layers.28.self_attn.k_proj.weight', 'language_model.model.layers.28.self_attn.o_proj.weight', 'language_model.model.layers.28.self_attn.q_proj.weight', 'language_model.model.layers.28.self_attn.v_proj.weight', 'language_model.model.layers.29.input_layernorm.weight', 'language_model.model.layers.29.mlp.down_proj.weight', 'language_model.model.layers.29.mlp.gate_proj.weight', 'language_model.model.layers.29.mlp.up_proj.weight', 'language_model.model.layers.29.post_attention_layernorm.weight', 'language_model.model.layers.29.self_attn.k_proj.weight', 'language_model.model.layers.29.self_attn.o_proj.weight', 'language_model.model.layers.29.self_attn.q_proj.weight', 'language_model.model.layers.29.self_attn.v_proj.weight', 'language_model.model.layers.3.input_layernorm.weight', 'language_model.model.layers.3.mlp.down_proj.weight', 'language_model.model.layers.3.mlp.gate_proj.weight', 'language_model.model.layers.3.mlp.up_proj.weight', 'language_model.model.layers.3.post_attention_layernorm.weight', 'language_model.model.layers.3.self_attn.k_proj.weight', 'language_model.model.layers.3.self_attn.o_proj.weight', 'language_model.model.layers.3.self_attn.q_proj.weight', 'language_model.model.layers.3.self_attn.v_proj.weight', 'language_model.model.layers.30.input_layernorm.weight', 'language_model.model.layers.30.mlp.down_proj.weight', 'language_model.model.layers.30.mlp.gate_proj.weight', 'language_model.model.layers.30.mlp.up_proj.weight', 'language_model.model.layers.30.post_attention_layernorm.weight', 'language_model.model.layers.30.self_attn.k_proj.weight', 'language_model.model.layers.30.self_attn.o_proj.weight', 'language_model.model.layers.30.self_attn.q_proj.weight', 'language_model.model.layers.30.self_attn.v_proj.weight', 'language_model.model.layers.31.input_layernorm.weight', 'language_model.model.layers.31.mlp.down_proj.weight', 'language_model.model.layers.31.mlp.gate_proj.weight', 'language_model.model.layers.31.mlp.up_proj.weight', 'language_model.model.layers.31.post_attention_layernorm.weight', 'language_model.model.layers.31.self_attn.k_proj.weight', 'language_model.model.layers.31.self_attn.o_proj.weight', 'language_model.model.layers.31.self_attn.q_proj.weight', 'language_model.model.layers.31.self_attn.v_proj.weight', 'language_model.model.layers.4.input_layernorm.weight', 'language_model.model.layers.4.mlp.down_proj.weight', 'language_model.model.layers.4.mlp.gate_proj.weight', 'language_model.model.layers.4.mlp.up_proj.weight', 'language_model.model.layers.4.post_attention_layernorm.weight', 'language_model.model.layers.4.self_attn.k_proj.weight', 'language_model.model.layers.4.self_attn.o_proj.weight', 'language_model.model.layers.4.self_attn.q_proj.weight', 
'language_model.model.layers.4.self_attn.v_proj.weight', 'language_model.model.layers.5.input_layernorm.weight', 'language_model.model.layers.5.mlp.down_proj.weight', 'language_model.model.layers.5.mlp.gate_proj.weight', 'language_model.model.layers.5.mlp.up_proj.weight', 'language_model.model.layers.5.post_attention_layernorm.weight', 'language_model.model.layers.5.self_attn.k_proj.weight', 'language_model.model.layers.5.self_attn.o_proj.weight', 'language_model.model.layers.5.self_attn.q_proj.weight', 'language_model.model.layers.5.self_attn.v_proj.weight', 'language_model.model.layers.6.input_layernorm.weight', 'language_model.model.layers.6.mlp.down_proj.weight', 'language_model.model.layers.6.mlp.gate_proj.weight', 'language_model.model.layers.6.mlp.up_proj.weight', 'language_model.model.layers.6.post_attention_layernorm.weight', 'language_model.model.layers.6.self_attn.k_proj.weight', 'language_model.model.layers.6.self_attn.o_proj.weight', 'language_model.model.layers.6.self_attn.q_proj.weight', 'language_model.model.layers.6.self_attn.v_proj.weight', 'language_model.model.layers.7.input_layernorm.weight', 'language_model.model.layers.7.mlp.down_proj.weight', 'language_model.model.layers.7.mlp.gate_proj.weight', 'language_model.model.layers.7.mlp.up_proj.weight', 'language_model.model.layers.7.post_attention_layernorm.weight', 'language_model.model.layers.7.self_attn.k_proj.weight', 'language_model.model.layers.7.self_attn.o_proj.weight', 'language_model.model.layers.7.self_attn.q_proj.weight', 'language_model.model.layers.7.self_attn.v_proj.weight', 'language_model.model.layers.8.input_layernorm.weight', 'language_model.model.layers.8.mlp.down_proj.weight', 'language_model.model.layers.8.mlp.gate_proj.weight', 'language_model.model.layers.8.mlp.up_proj.weight', 'language_model.model.layers.8.post_attention_layernorm.weight', 'language_model.model.layers.8.self_attn.k_proj.weight', 'language_model.model.layers.8.self_attn.o_proj.weight', 'language_model.model.layers.8.self_attn.q_proj.weight', 'language_model.model.layers.8.self_attn.v_proj.weight', 'language_model.model.layers.9.input_layernorm.weight', 'language_model.model.layers.9.mlp.down_proj.weight', 'language_model.model.layers.9.mlp.gate_proj.weight', 'language_model.model.layers.9.mlp.up_proj.weight', 'language_model.model.layers.9.post_attention_layernorm.weight', 'language_model.model.layers.9.self_attn.k_proj.weight', 'language_model.model.layers.9.self_attn.o_proj.weight', 'language_model.model.layers.9.self_attn.q_proj.weight', 'language_model.model.layers.9.self_attn.v_proj.weight', 'language_model.model.norm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Use lora
Lora Scaling: 0.03125
Finish use lora
Loading weight from MODELS/pllava-7b
<All keys matched successfully>
 Running on local URL:  http://127.0.0.1:7999
 Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
 None /tmp/gradio/5675459acdd75bf48da26a4f7bc17b1849b37160/jesse_dance.mp4
 Input video shape: 16 1280 720
 ###PROMPT:  You are Pllava, a large vision-language assistant. 
 You are able to understand the video content that the user provides, and assist the user with a variety of tasks using natural language.
 Follow the instructions carefully and explain your answers in detail based on the provided video.
 USER:<image>  USER:What is the man doing? ASSISTANT:
 ###LM OUTPUT TEXT You are Pllava, a large vision-language assistant. 
You are able to understand the video content that the user provides, and assist the user with a variety of tasks using natural language.
Follow the instructions carefully and explain your answers in detail based on the provided video.
 USER:   USER:What is the man doing? ASSISTANT: The man in the image appears to be dancing or performing a playful movement in a kitchen setting. He is surrounded by various objects such as a refrigerator, a potted plant, and what looks like a drawing or painting of a character on the floor. The character on the floor is drawn in a cartoon style, and it seems to be part of a playful or artistic activity. The man is wearing a vest and jeans, and he is barefoot, which adds to the casual and fun atmosphere of the scene.
Conversation(system='You are Pllava, a large vision-language assistant. \nYou are able to understand the video content that the user provides, and assist the user with a variety of tasks using natural language.\nFollow the instructions carefully and explain your answers in detail based on the provided video.\n', roles=['USER:', 'ASSISTANT:'], messages=[['USER:', '<image> '], ['USER:', 'What is the man doing?'], ['ASSISTANT:', ' The man in the image appears to be dancing or performing a playful movement in a kitchen setting. He is surrounded by various objects such as a refrigerator, a potted plant, and what looks like a drawing or painting of a character on the floor. The character on the floor is drawn in a cartoon style, and it seems to be part of a playful or artistic activity. The man is wearing a vest and jeans, and he is barefoot, which adds to the casual and fun atmosphere of the scene.']], sep=[' ', '</s>'], mm_token='<image>', mm_style=<MultiModalConvStyle.MM_INTERLEAF: 'mm_inferleaf'>, pre_query_prompt=None, post_query_prompt=None, answer_prompt=None)
Answer:  The man in the image appears to be dancing or performing a playful movement in a kitchen setting. He is surrounded by various objects such as a refrigerator, a potted plant, and what looks like a drawing or painting of a character on the floor. The character on the floor is drawn in a cartoon style, and it seems to be part of a playful or artistic activity. The man is wearing a vest and jeans, and he is barefoot, which adds to the casual and fun atmosphere of the scene.
```
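One note on the log above: the long "newly initialized" warning is expected, because the demo first builds the model from model_dir and then loads the trained PLLaVA weights from weight_dir on top; the later "<All keys matched successfully>" line is what confirms that second step covered every parameter. A minimal sketch of that two-stage pattern is below (the import path and weight filename are illustrative, not necessarily the repo's exact code):

```python
import torch

# Import path is an assumption; the class name is the one shown in the log.
from models.pllava import PllavaForConditionalGeneration


def load_demo_model(model_dir: str, weight_dir: str):
    # Stage 1: build the architecture from model_dir. If model_dir does not
    # contain the full language-model weights, they get randomly initialized,
    # which produces the long "newly initialized" warning above.
    model = PllavaForConditionalGeneration.from_pretrained(
        model_dir, torch_dtype=torch.bfloat16
    )

    # Stage 2: overwrite with the trained PLLaVA weights from weight_dir.
    # Printing the load result shows "<All keys matched successfully>" when
    # every parameter was covered, so the stage-1 warning no longer matters.
    # (The filename here is illustrative; the real demo may load shards/LoRA.)
    state_dict = torch.load(f"{weight_dir}/pytorch_model.bin", map_location="cpu")
    print(model.load_state_dict(state_dict, strict=False))
    return model
```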
ZhangScream commented 3 months ago

> Hi @ZhangScream, sorry about the demo error. We have tested the example on our own demo and the result is OK, as shown below. I've noticed a difference in your demo: please set model_dir=llava-hf/llava-v1.6-vicuna-7b-hf and weight_dir=MODELS/pllava-7b and try again. We will also update the README to avoid confusion.
>
> [screenshot]

Thank you, this solves my problem, but could you tell me the reason for doing this?

ZhangScream commented 3 months ago

This is my new demo.sh, and it works (screenshot attached).

YepJin commented 3 months ago

> I received incorrect answers (screenshot attached). What's the issue here?

May I ask which website it is?

YepJin commented 3 months ago

I have similar problems when I try to run the demo.

  1. First, it seems we can't run the demo on Google Colab; the error is below. [screenshot]
  2. Second, it seems the 34B Gradio website doesn't work for me either. [screenshot]

Thanks in advance!

cathyxl commented 3 months ago

Hi @YepJin, we have validated the demo code on our server and it works well. Your error seems to be caused by the environment settings of Google Colab.

As for the 34B demo, the Gradio link had expired. We've provided a new one and updated it on our website. I'm also attaching it here for your reference: https://9e513ff5219b63ef72.gradio.live/

KaiyueSun98 commented 3 months ago

Hi, I came across an error with the 34B Gradio demo:

[screenshot]

ValueError: The input provided to the model are wrong. The number of image tokens is 0 while the number of image given to the model is 1. This prevents correct indexing and breaks batch generation.

AmitRozner commented 3 months ago

@KaiyueSun98 you can set both model_dir and weight_dir to the "pllava-34b" path. I can get an answer without an error, but it does not make sense: Answer: ........................................................................................................................................................................................................
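For context, that ValueError is raised in the LLaVA-style forward pass when the tokenized prompt contains no <image> placeholder token even though an image (or video frames) is passed in, which typically means the prompt template or the tokenizer loaded from model_dir doesn't match the weights; pointing both directories at the same checkpoint sidesteps that mismatch. A quick, illustrative check (the processor path below is an assumption, use whatever directory your demo actually loads the processor from):

```python
from transformers import AutoProcessor

# Path is illustrative -- point this at the directory your demo loads from.
processor = AutoProcessor.from_pretrained("llava-hf/llava-v1.6-34b-hf")

prompt = "USER: <image>\nWhat is the man doing? ASSISTANT:"
input_ids = processor.tokenizer(prompt).input_ids

image_token_id = processor.tokenizer.convert_tokens_to_ids("<image>")
num_image_tokens = input_ids.count(image_token_id)

# If this prints 0 while one image/video is passed to the model, generation
# fails with exactly the "number of image tokens is 0" ValueError above.
print(f"image tokens in prompt: {num_image_tokens}")
```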

zhangchunjie1999 commented 3 months ago

@KaiyueSun98 I have the same problem. Have you solved it?

KaiyueSun98 commented 3 months ago

> @KaiyueSun98 I have the same problem. Have you solved it?

not yet

Stevetich commented 2 weeks ago

> Hi @ZhangScream, sorry about the demo error. We have tested the example on our own demo and the result is OK, as shown below. I've noticed a difference in your demo: please set model_dir=llava-hf/llava-v1.6-vicuna-7b-hf and weight_dir=MODELS/pllava-7b and try again. We will also update the README to avoid confusion.
>
> [screenshot]

Hi, first of all, thanks for your contribution. I'm curious, though: I've found that PLLaVA tends to respond with "The image ..." for video inputs. Is that correct and normal behavior?

zhangchunjie1999 commented 2 weeks ago

Got the email~ Best wishes!