ZuhaoLiu closed this issue 3 months ago.
Hello, I read your answer in https://github.com/hpcaitech/Open-Sora/issues/303, but I want to explore captioning with 3 frames.
However, the generated results look odd.
The generated captions are all incomplete fragments, whereas with 1 frame the output is completely normal (several complete sentences). Is this a bug, or do I need to adjust other parameters?
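For reference, one common way to choose the frames passed to a captioner is to sample them evenly across the clip. The sketch below is hypothetical: `sample_frame_indices` is not taken from the Open-Sora code base, and it only illustrates which frames would be selected for 1 versus 3 frames, assuming evenly spaced sampling.

```python
def sample_frame_indices(total_frames: int, num_frames: int) -> list[int]:
    """Hypothetical helper: return num_frames indices spread evenly over [0, total_frames)."""
    if total_frames <= 0 or num_frames <= 0:
        raise ValueError("total_frames and num_frames must be positive")
    # Never request more frames than the clip contains.
    num_frames = min(num_frames, total_frames)
    # Split the clip into num_frames equal segments and take the centre of each.
    seg = total_frames / num_frames
    return [int(seg * i + seg / 2) for i in range(num_frames)]


print(sample_frame_indices(90, 1))  # [45]
print(sample_frame_indices(90, 3))  # [15, 45, 75]
```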
This issue is stale because it has been open for 7 days with no activity.
LLaVA does not handle multiple frames well. Please try PLLaVA, which we used in OpenSora 1.2, as documented here.