Caption problem with frame 3

hpcaitech / Open-Sora

Open-Sora: Democratizing Efficient Video Production for All

https://hpcaitech.github.io/Open-Sora/

Apache License 2.0


Caption problem with frame 3 #424

Closed: ZuhaoLiu closed this issue 3 months ago

ZuhaoLiu commented 4 months ago

Hello, I read your answer in https://github.com/hpcaitech/Open-Sora/issues/303, but I would like to explore how captioning behaves with 3 frames.

However, the generated results look weird.

[Image: Screenshot 2024-06-03 095348]

The generated captions all appear to be incomplete fragments. With 1 frame, however, the result is completely normal (several full sentences). Is this a bug, or do I need to adjust other parameters?

github-actions[bot] commented 3 months ago

This issue is stale because it has been open for 7 days with no activity.

zhengzangw commented 3 months ago

LLaVA handles multiple frames poorly. Please try PLLaVA, which we used in Open-Sora 1.2, as documented here.