hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All
https://hpcaitech.github.io/Open-Sora/
Apache License 2.0
22.39k stars, 2.19k forks

caption_llava.py does not work #182

Closed necrophagists closed 8 months ago

necrophagists commented 8 months ago

image

I changed the LLM to llava-v1.6-mistral-7b, but the model's output is empty.

DwanZhang-AI commented 8 months ago

So am I

necrophagists commented 8 months ago

> So am I

Hi, I have already fixed this problem. The code is correct; the problem is that prompt['three frames'] is too long for mistral-7b (441 tokens). I switched to a shorter prompt ("Please describe the video with one paragraph"), and then it works.
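A quick way to sanity-check prompt length before running the captioner is sketched below. The helper names `count_tokens` and `fits_budget` are hypothetical, and the default whitespace tokenizer is only a stand-in that undercounts; for a real count, pass the tokenizer that ships with the model you are using. The budget value is illustrative, not a documented limit.

```python
from typing import Callable, Sequence


def count_tokens(prompt: str, tokenize: Callable[[str], Sequence] = str.split) -> int:
    """Return how many tokens `tokenize` produces for `prompt`.

    The default (whitespace split) is a rough stand-in, not the model tokenizer.
    """
    return len(tokenize(prompt))


def fits_budget(prompt: str, max_tokens: int,
                tokenize: Callable[[str], Sequence] = str.split) -> bool:
    """True if the prompt uses at most `max_tokens` tokens under `tokenize`."""
    return count_tokens(prompt, tokenize) <= max_tokens


SHORT_PROMPT = "Please describe the video with one paragraph"

if __name__ == "__main__":
    # Whitespace count only; the model tokenizer will usually report more.
    print(count_tokens(SHORT_PROMPT))
    print(fits_budget(SHORT_PROMPT, max_tokens=64))  # 64 is an arbitrary example budget
```

Passing the tokenizer as a callable keeps the check independent of any particular model library.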

ersanliqiao commented 8 months ago

Can you give an example?

ersanliqiao commented 8 months ago

@necrophagists

necrophagists commented 8 months ago

> @necrophagists
>
> Can you give an example?

Try "Please describe the video".

KihongK commented 4 months ago

Should I use the 34B LLaVA model?

I am currently using an A100 40G.

I edited the prompt (tools > caption > utils.py) from

    "video": {
        "text": "Describe this video and its style in a very detailed manner. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.",
        "type": "video",
    },

to

    "video": {
        "text": "Please describe the video",
        "type": "video",
    },

and ran this script:

    torchrun --nproc_per_node 1 --standalone -m tools.caption.caption_llava \
      /home/hed/Open-Sora/sample_data/sample_data_split/parts/meta_clips_info_fmin1_aes_part_0_aesmin3.0.csv \
      --model-path liuhaotian/llava-v1.6-vicuna-7b \
      --prompt video \
      --bs 8 \
      --tp-size 1 \
      --dp-size 1

The resulting CSV file looks like this (the caption field is empty):

    /home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_2mb_scene-1.mp4,,69
    /home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_1mb_scene-2.mp4,,10
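To see quickly how many clips came back without a caption, a small check along these lines may help. It is only a sketch: the helper name `rows_with_empty_caption` is hypothetical, the caption is assumed to be the second field (index 1) as in the rows above, and the sample data is made up for illustration. If the real file has a header row, that row simply passes through as a non-empty caption.

```python
import csv
import io
from typing import List


def rows_with_empty_caption(csv_text: str, caption_index: int = 1) -> List[str]:
    """Return the first field (video path) of each row whose caption field is blank."""
    missing = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) <= caption_index:
            continue  # skip blank or malformed lines
        if not row[caption_index].strip():
            missing.append(row[0])
    return missing


# Made-up sample rows in the same three-field shape as the output above.
sample = (
    "clip_a.mp4,,69\n"
    "clip_b.mp4,A rabbit wakes up in a meadow.,10\n"
)

if __name__ == "__main__":
    print(rows_with_empty_caption(sample))
```

For a file on disk, read it with `open(path, newline="").read()` and pass the text to the function.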