pszemraj / vid2cleantxt

Python API & command-line tool to easily transcribe speech-based video files into clean text
Apache License 2.0
183 stars, 25 forks

[Doubt] Can you kindly explain what chunk_length dictates? #20

Closed PyroShade closed 6 months ago

PyroShade commented 7 months ago

How does changing this attribute affect the quality of the output? What is the unit (frames or seconds)? With the default setting of chunk_length=30, I find that a lot of text is missing.

pszemraj commented 7 months ago

Hi! The unit is seconds. Some warnings should be printed out in this case. Could you give some more specific examples?
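To illustrate what a chunk length in seconds means, here is a minimal sketch. It assumes the audio is cut into consecutive, non-overlapping chunks; `chunk_bounds` is a hypothetical helper written for this example, not a function from vid2cleantxt, and the tool's actual splitting logic may differ.

```python
from typing import List, Tuple


def chunk_bounds(duration_s: float, chunk_length: int = 30) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds for consecutive chunks.

    Hypothetical helper for illustration only; it is not part of vid2cleantxt.
    """
    if chunk_length <= 0:
        raise ValueError("chunk_length must be a positive number of seconds")
    duration_s = float(duration_s)
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last chunk is shorter when the duration is not a multiple of chunk_length.
        end = min(start + chunk_length, duration_s)
        bounds.append((start, end))
        start = end
    return bounds


# A 95-second clip with chunk_length=30 gives three full chunks and one 5-second remainder.
print(chunk_bounds(95, chunk_length=30))
# → [(0.0, 30.0), (30.0, 60.0), (60.0, 90.0), (90.0, 95.0)]
```

Under this assumption, a smaller chunk_length produces more, shorter pieces, each transcribed with less surrounding context, while a larger one produces fewer, longer pieces.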

btw - I've noticed that this issue can be specific to a particular audio/video file combined with a particular Whisper model. Without more info, I'd recommend trying a different Whisper model and seeing if the same issue occurs.

pszemraj commented 6 months ago

Hi, just wanted to check in on this. Any luck with other model sizes/any other questions? Otherwise, I'll close this for now.