shashikg / WhisperS2T

An Optimized Speech-to-Text Pipeline for the Whisper Model Supporting Multiple Inference Engine
MIT License

Fix for small segments #57

Open Pranjalya opened 7 months ago

Pranjalya commented 7 months ago

Patch

BBC-Esq commented 6 months ago

I like it!

Sembiance commented 5 months ago

Great fix. Without it, WhisperS2T is useless for short-duration audio.

HIGHLY recommend merging this pull request :)

shashikg commented 4 months ago

Hi @Pranjalya @Sembiance! Can you describe the small-duration audio problem here, or link a related issue?

Pranjalya commented 2 months ago

Hey @shashikg, the issue was in the loop where we segment audio into parts, in the case where the original audio's duration is < 1s. Using the range function with int(audio_duration) as the end timestamp truncates the duration to 0, and range with an end of 0 is empty, so the loop never runs. Using math.ceil instead rounds the duration up to the next integer, so the audio segment's timestamp is logged. This bug is also potentially dangerous if someone is using indexing to map the audio segments, since it leads to missing parts.
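To make the difference concrete, here is a minimal sketch of the behaviour described above. It is not the actual WhisperS2T code: the function names are hypothetical, and using `max_seg_len` as the `range` step with a default of 30 seconds is an assumption made only for illustration.

```python
import math
from typing import List


def segment_starts_truncated(audio_duration: float, max_seg_len: int = 30) -> List[int]:
    """Hypothetical pre-patch behaviour: the end bound is truncated with int()."""
    # int(0.6) == 0, so range(0, 0, step) is empty for sub-second audio.
    return list(range(0, int(audio_duration), max_seg_len))


def segment_starts_ceil(audio_duration: float, max_seg_len: int = 30) -> List[int]:
    """Hypothetical patched behaviour: the end bound is rounded up with math.ceil()."""
    # math.ceil(0.6) == 1, so range(0, 1, step) still yields the start time 0.
    return list(range(0, math.ceil(audio_duration), max_seg_len))


if __name__ == "__main__":
    for duration in (0.6, 45.5, 60.4):
        print(duration,
              segment_starts_truncated(duration),
              segment_starts_ceil(duration))
```

Under these assumptions, a 0.6 s clip produces no segment start at all with `int()` but one start at 0 with `math.ceil()`. The same truncation can also drop a trailing fragment on longer audio: a 60.4 s clip gets starts at 0 and 30 with `int()`, and an extra start at 60 with `math.ceil()`.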

andriken commented 1 week ago

What will "max_seg_len" do?