Open Pranjalya opened 7 months ago
I like it!
Great fix, without it WhisperS2T is useless for small duration audio.
HIGHLY recommend merging this pull request :)
Hi @Pranjalya @Sembiance! Can you describe the problem here, or link an issue related to small duration audio?
Hey @shashikg, the issue was in the loop where we segment the audio into parts, in the case where the original audio's duration is < 1s. Setting the end timestamp to int(audio_duration) makes it 0, which, when passed to range, yields an empty range. Using math.ceil instead rounds the duration up to the next integer, so the audio segment timestamp is logged.
This bug is potentially dangerous as well if someone is using indexing to map the audio segments, since it leads to missing parts.
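To make the difference concrete, here is a minimal sketch of the rounding behaviour described above. It is not the actual WhisperS2T loop: the function name segment_starts, the step parameter, and the use_ceil flag are hypothetical and exist only for illustration.

```python
import math
from typing import List


def segment_starts(audio_duration: float, step: int = 1, use_ceil: bool = True) -> List[int]:
    """Return the start timestamps (in seconds) of the segments.

    Hypothetical helper: it only mirrors the range-based loop discussed
    in this thread and is not taken from the library.
    """
    # int() truncates toward zero, so any duration under 1s becomes 0.
    # math.ceil() rounds up, so a partial trailing second is still covered.
    end = math.ceil(audio_duration) if use_ceil else int(audio_duration)
    return list(range(0, end, step))


# Audio shorter than one second: truncation produces no segments at all.
print(segment_starts(0.4, use_ceil=False))  # []
print(segment_starts(0.4, use_ceil=True))   # [0]

# Longer audio: truncation silently drops the trailing partial second.
print(segment_starts(2.5, use_ceil=False))  # [0, 1]
print(segment_starts(2.5, use_ceil=True))   # [0, 1, 2]
```

The second pair of calls shows why this also matters for index-based mapping: with truncation, the segment covering the last partial second never gets an entry.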
what will "max_seg_len" do?
Patch max_seg_len generate_segment_batched in case the seq_len and seq_metadata is not provided