twardoch / audiostretchy

AudioStretchy is a Python wrapper around the `audio-stretch` C library, which performs fast, high-quality time-stretching of WAV/MP3 files without changing their pitch. It works well for speech and can time-stretch silence separately.
https://pypi.org/project/audiostretchy/
BSD 3-Clause "New" or "Revised" License

Stretched videos not at the correct length #12

Open LongMingWei opened 1 month ago

LongMingWei commented 1 month ago

I am trying to sync translated audio segments with a video, using the timestamps that a speech-to-text package returns alongside each audio segment. However, even with the stretch ratio calculated correctly, some segments end up too long, mainly because of a strange long pause at the end of the stretched audio. For example, the attached zip file contains an original audio segment and its stretched version. Based on the timestamps, the stretched segment should be about 5 to 6 seconds long, which corresponds to a stretch ratio of around 1.1. However, after passing it to `stretch_audio`, the result is 8 seconds long, including a 3-second pause. It would be great to know what is causing this and whether there is something I am missing. The relevant code and audio files are below. Thank you!

```python
import os

from audiostretchy.stretch import stretch_audio
from pydub import AudioSegment

# `model` (the TTS model), `speed` and `output_dir` are defined elsewhere in the script.


def generate_segment_audio(segment, speaker_id):
    start, end, translated_text = segment  # start/end timestamps (seconds) and translated text
    segment_path = os.path.join(output_dir, f'segment_{start}_{end}.wav')
    stretched_path = os.path.join(output_dir, f'segment_{start}_{end}_stretched.wav')
    duration = end - start  # target duration in seconds

    # Generate the audio file with the TTS model
    model.tts_to_file(translated_text, speaker_id, segment_path, speed=speed)

    # Stretch the generated audio so that it matches the target duration
    segment_audio = AudioSegment.from_file(segment_path)
    current_duration = len(segment_audio) / 1000  # pydub lengths are in milliseconds
    stretch_ratio = duration / current_duration
    print(f'{stretch_ratio} = {duration} / {current_duration}')
    stretch_audio(segment_path, stretched_path, ratio=stretch_ratio)
    return stretched_path  # return the stretched file, not the unstretched TTS output
```
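One way to narrow down where the extra time comes from is to compare the expected length (input duration × `stretch_ratio`, assuming `ratio` scales duration the way the calculation above implies) with the measured length of the stretched file, and to measure how much of the output is trailing silence. A minimal sketch using only the standard-library `wave` module, assuming 16-bit PCM WAV files; the function names and the silence threshold are hypothetical choices for illustration, not part of AudioStretchy:

```python
import array
import sys
import wave


def wav_duration(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def trailing_silence(path: str, threshold: int = 500) -> float:
    """Seconds of near-silence at the end of a 16-bit PCM WAV file.

    A frame counts as silent when every channel's absolute sample value
    is at or below `threshold` (a hypothetical cut-off; tune it per file).
    """
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wf.getnchannels()
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    # WAV samples are little-endian; array("h") uses native byte order
    if sys.byteorder == "big":
        samples.byteswap()
    silent_frames = 0
    # Walk backwards one frame (all channels) at a time until sound appears
    for end in range(len(samples), 0, -channels):
        frame = samples[end - channels:end]
        if max(abs(s) for s in frame) > threshold:
            break
        silent_frames += 1
    return silent_frames / rate


def report(original: str, stretched: str, ratio: float) -> None:
    """Print expected vs. actual stretched length and the trailing silence."""
    expected = wav_duration(original) * ratio
    actual = wav_duration(stretched)
    print(f"expected ~{expected:.2f}s, got {actual:.2f}s, "
          f"trailing silence {trailing_silence(stretched):.2f}s")
```

Calling `report(segment_path, stretched_path, stretch_ratio)` right after `stretch_audio(...)` would show whether the extra ~3 seconds is all trailing silence or is spread through the speech.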

audiofiles.zip
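Until the cause of the trailing pause is known, one possible workaround is to force the stretched file to the exact target duration by trimming the end (or padding with silence). This is a sketch, not a fix for whatever inside the stretching step produces the pause: it assumes uncompressed PCM WAV output, and trimming from the end only preserves the speech when the excess is trailing silence, as described above. `fit_wav_to_duration` is a hypothetical helper built on the standard-library `wave` module:

```python
import wave


def fit_wav_to_duration(in_path: str, out_path: str, target_seconds: float) -> None:
    """Trim or pad a PCM WAV file so that it lasts exactly target_seconds.

    Trimming cuts frames from the end; padding appends digital silence.
    """
    with wave.open(in_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())
    frame_size = params.nchannels * params.sampwidth
    target_bytes = round(target_seconds * params.framerate) * frame_size
    if len(frames) >= target_bytes:
        frames = frames[:target_bytes]  # drop the excess at the end
    else:
        # 8-bit WAV is unsigned (silence = 0x80); wider formats are signed (silence = 0)
        pad_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
        frames += pad_byte * (target_bytes - len(frames))
    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(params.nchannels)
        dst.setsampwidth(params.sampwidth)
        dst.setframerate(params.framerate)
        dst.writeframes(frames)
```

In `generate_segment_audio`, this would be called as `fit_wav_to_duration(stretched_path, stretched_path, duration)` after `stretch_audio(...)`. Overwriting in place is safe here because the input is read completely and closed before the output is opened.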