Closed — Puiching-Memory closed this issue 6 months ago
We follow SVD's pipeline. If a video contains a lot of text, it is hard to generate because the captioning model cannot capture that text.
In the future, we plan to use an OCR model to generate additional captions for training, which should make the model capable of text generation.
In your report, you use OCR to identify the text in frames and then eliminate scenes with too much text.
I would like to know why too much text hurts the model's generation quality.
If so, does that mean it is difficult to improve the model on text-heavy content, such as newspapers, streets with billboards, and the various signs and lane markings on roads?
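The scene-filtering step mentioned above could be sketched roughly as follows. This is only an illustration, not the project's actual code: the function name, threshold, and data layout are assumptions, and the per-frame word counts would in practice come from an OCR engine (e.g. Tesseract or PaddleOCR) run on sampled frames.

```python
def filter_text_heavy_scenes(scene_word_counts, max_avg_words=5):
    """Keep scenes whose average OCR word count per frame stays at or
    below a threshold.

    scene_word_counts: dict mapping a scene id to a list of per-frame
    word counts produced by an OCR pass (hypothetical format).
    """
    kept = []
    for scene_id, counts in scene_word_counts.items():
        avg = sum(counts) / len(counts)
        if avg <= max_avg_words:
            kept.append(scene_id)
    return kept

# Example: a street scene with little text is kept, a newspaper
# close-up with dense text is dropped.
print(filter_text_heavy_scenes({"street": [1, 0, 2], "newspaper": [40, 55]}))
# → ['street']
```

The threshold would need tuning per dataset; a stricter cutoff removes more captioning failures but also shrinks the training set.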