Closed: necrophagists closed this issue 8 months ago
So am I
Hi, I already fixed this problem. The code is right; the issue is that `prompt['three frames']` is too long for mistral-7b (441 tokens). I changed it to a shorter prompt ("Please describe the video with one paragraph"), and then it works.
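A prompt-length problem like this can be caught before running the captioning job. Below is a minimal sketch, not Open-Sora code: `fits_context`, `approx_count`, and the token limit are illustrative assumptions, and a real check would count tokens with the model's own tokenizer (e.g. `len(tokenizer.encode(text))` from `transformers`).

```python
def fits_context(prompt: str, count_tokens, max_prompt_tokens: int) -> bool:
    """Return True if the prompt fits the model's prompt-token budget.

    count_tokens: any callable mapping text -> token count,
    e.g. lambda t: len(tokenizer.encode(t)) for a real tokenizer.
    """
    return count_tokens(prompt) <= max_prompt_tokens


# Illustrative stand-in tokenizer: whitespace split. Real token counts
# from the mistral-7b tokenizer will be higher than a word count.
approx_count = lambda text: len(text.split())

long_prompt = "Describe this video and its style in a very detailed manner. " * 20
short_prompt = "Please describe the video"

print(fits_context(short_prompt, approx_count, 40))  # short prompt fits
print(fits_context(long_prompt, approx_count, 40))   # long prompt does not
```

Running a check like this on every entry of the prompt dict before launching `torchrun` would have flagged the 441-token prompt up front instead of producing empty captions.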
Can you give an example?
@necrophagists
Can you give an example?
Try "Please describe the video".
Should I use the 34B LLaVA model?
Right now I am using an A100 40G.
And edit the prompt in `tools/caption/utils.py` from

```python
"video": {
    "text": "Describe this video and its style in a very detailed manner. Pay attention to all objects in the video. The description should be useful for AI to re-generate the video. The description should be no more than six sentences.",
    "type": "video",
},
```

to

```python
"video": {
    "text": "Please describe the video",
    "type": "video",
},
```
and I run this script:

```shell
torchrun --nproc_per_node 1 --standalone -m tools.caption.caption_llava \
    /home/hed/Open-Sora/sample_data/sample_data_split/parts/meta_clips_info_fmin1_aes_part_0_aesmin3.0.csv \
    --model-path liuhaotian/llava-v1.6-vicuna-7b \
    --prompt video \
    --bs 8 \
    --tp-size 1 \
    --dp-size 1
```
The resulting CSV file looks like this:

```
/home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_2mb_scene-1.mp4,,69
/home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_1mb_scene-2.mp4,,10
```
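Rows with an empty middle column like the ones above can be detected programmatically after a captioning run. A minimal stdlib-only sketch, assuming the column layout is clip path, caption, frame count (inferred from the sample rows, not from Open-Sora docs):

```python
import csv
import io

# Two rows copied from the captioning output above;
# columns assumed to be: clip path, caption, frame count.
sample_csv = (
    "/home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_2mb_scene-1.mp4,,69\n"
    "/home/hed/Open-Sora/sample_data/sample_data_split/big_buck_bunny_240p_1mb_scene-2.mp4,,10\n"
)

# Collect the paths whose caption column is empty or whitespace.
empty = [row[0] for row in csv.reader(io.StringIO(sample_csv)) if not row[1].strip()]
print(f"{len(empty)} clips with empty captions")  # prints "2 clips with empty captions"
```

For a real run you would open the output CSV path instead of the in-memory string; a nonzero count is a quick signal that the prompt exceeded the model's context again.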
I changed the LLM to llava-v1.6-mistral-7b, but the model's output is empty.