sunmingtao / sample-code


Subtitles generated by Whisper start to be out of sync with the audio after a certain period of time #351

Open: sunmingtao opened this issue 2 months ago

sunmingtao commented 2 months ago

When using Whisper to generate subtitles in SRT format, I noticed that after a certain point (around 1 hour in), the subtitles start to drift out of sync with the video. I tested generating the subtitles both from Python and directly from the command line, and both produce the same problem.

sunmingtao commented 2 months ago

Gradual desynchronization over a long video is a frustrating problem with Whisper or any other automatic speech recognition (ASR) system. It usually stems from how the audio is processed or how the timestamps are calculated. Here are a few possible causes and solutions:

  1. Audio Sample Rate Consistency

Make sure the audio is handled consistently throughout the pipeline. Whisper resamples its input to 16 kHz mono before transcribing, so if the audio it receives does not match the video (for example, a track extracted at the wrong rate or with a different length), the timeline is stretched or compressed and the drift grows with the length of the file.
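If you extract the audio track yourself before passing it to Whisper, one quick sanity check is to compare the extracted file's duration with the video's: a length mismatch means the timestamps will drift however good the transcription is. The sketch below assumes a PCM WAV file and uses only Python's standard `wave` module; `describe_wav` and `audio_warnings` are hypothetical helper names, and the video duration has to come from elsewhere (your player, ffprobe, etc.).

```python
import wave


def describe_wav(path):
    """Return (sample_rate, channels, duration_seconds) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        # Duration is frames / frames-per-second, independent of channel count.
        duration = wav.getnframes() / rate
    return rate, channels, duration


def audio_warnings(path, video_seconds, tolerance=0.5):
    """Hypothetical check: flag an audio track whose length differs from the video's."""
    rate, channels, duration = describe_wav(path)
    warnings = []
    if abs(duration - video_seconds) > tolerance:
        warnings.append(
            f"audio lasts {duration:.2f}s but the video lasts {video_seconds:.2f}s "
            f"({rate} Hz, {channels} channel(s))"
        )
    return warnings
```

An empty list from `audio_warnings("extracted.wav", video_seconds=...)` only rules out a length mismatch; it does not prove the timing is correct.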

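  2. Handling of Silence and Non-Speech Segments

Long non-speech segments, music, or changing background noise and silence can throw the timing off, and the ASR system may not handle them well on its own. Whisper decodes audio in 30-second windows and advances through the file based on the timestamps it predicts, so a timing error in one window can carry into the next.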
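To check whether the drift begins around long silences or music, you can list the quiet stretches in the audio and compare them with where the subtitles start to slip. This is a simple RMS-energy sketch, not what Whisper does internally; `silent_spans`, the threshold, and the frame size are illustrative assumptions. `samples` is any sequence of floats in [-1, 1], such as the 16 kHz array returned by `whisper.load_audio`. It is written in pure Python for clarity, so it is slow on an hour of audio; a NumPy version would be much faster.

```python
import math


def silent_spans(samples, rate, threshold=0.01, min_silence=2.0, frame=0.05):
    """Return (start_s, end_s) for stretches whose RMS stays below threshold."""
    step = max(1, int(rate * frame))  # samples per analysis frame
    spans, start = [], None
    for i in range(0, len(samples), step):
        chunk = samples[i:i + step]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = i / rate
        if rms < threshold:
            if start is None:
                start = t  # a quiet stretch begins here
        elif start is not None:
            if t - start >= min_silence:
                spans.append((start, t))
            start = None
    # Close a quiet stretch that runs to the end of the audio.
    end = len(samples) / rate
    if start is not None and end - start >= min_silence:
        spans.append((start, end))
    return spans
```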

  3. Timestamp Adjustments

You may need to correct the timestamps in the generated subtitles. This can be done by:

Manual adjustment: shifting the subtitle timestamps forward or backward by a fixed amount, or scaling them to match the video length.

Automated tools: using tools like Subtitle Edit or Aegisub, which provide functions to stretch, compress, and sync subtitles.
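For drift that grows over time, a fixed shift is not enough; the timestamps need to be scaled. A minimal sketch of that, assuming the drift is roughly linear: pick two lines, note where each appears in the .srt and where it is actually spoken in the video, and map every timestamp through the straight line between those two points. `linear_fix`, `retime_srt`, and the reference times in the usage lines are hypothetical; only the standard `re` module is used.

```python
import re

# An SRT timestamp: hours, minutes, seconds, milliseconds (HH:MM:SS,mmm).
_TS = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def srt_to_ms(ts):
    h, m, s, ms = map(int, _TS.fullmatch(ts).groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def ms_to_srt(total_ms):
    total_ms = max(0, round(total_ms))  # clamp: a cue cannot start before 0
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def linear_fix(sub_a, true_a, sub_b, true_b):
    """Map subtitle time -> video time through two reference points (all in ms).

    sub_x is when a line appears in the .srt; true_x is when it is actually
    spoken in the video. Returns t -> scale * t + offset.
    """
    scale = (true_b - true_a) / (sub_b - sub_a)
    offset = true_a - scale * sub_a
    return lambda t: scale * t + offset


def retime_srt(text, fix):
    """Apply fix to every timestamp on the timing lines of an SRT document."""
    out = []
    for line in text.splitlines():
        if "-->" in line:  # only touch timing lines, never dialogue
            line = _TS.sub(lambda m: ms_to_srt(fix(srt_to_ms(m.group(0)))), line)
        out.append(line)
    return "\n".join(out) + "\n"


# Usage: the start of the file is in sync, but a line shown at 01:00:00,000
# is actually spoken at 01:00:03,000.
fix = linear_fix(0, 0, srt_to_ms("01:00:00,000"), srt_to_ms("01:00:03,000"))
print(ms_to_srt(fix(srt_to_ms("00:30:00,000"))))  # 00:30:01,500
```

Picking the two reference points far apart (near the start and near the end) makes the estimated scale less sensitive to small errors in reading off the times.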

  4. Incremental Processing

For very long files, consider transcribing the audio in smaller chunks and stitching the subtitles together afterwards, adding each chunk's start time to its timestamps. This can keep an error from accumulating over the whole file. Handle the overlaps between chunks carefully to avoid duplicated lines or abrupt cuts in the subtitles.
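The bookkeeping for chunked processing can be sketched as two steps: plan overlapping windows, then shift each window's segments back onto the global timeline. `plan_chunks` and `stitch` are hypothetical helpers; cutting the audio (e.g., with ffmpeg) and transcribing each piece are left out. The segment dicts use the `start`/`end`/`text` keys found in Whisper's `result["segments"]`, with times relative to the chunk. Dropping any segment that starts before the previously kept one ends is a crude overlap heuristic, not the only option.

```python
def plan_chunks(total_s, chunk_s=600.0, overlap_s=5.0):
    """Split [0, total_s) into windows of chunk_s seconds that overlap by overlap_s."""
    if overlap_s >= chunk_s:
        raise ValueError("overlap must be shorter than the chunk")
    chunks, start = [], 0.0
    while start < total_s:
        end = min(start + chunk_s, total_s)
        chunks.append((start, end))
        if end >= total_s:
            break
        start = end - overlap_s  # next window re-reads the last overlap_s seconds
    return chunks


def stitch(chunk_results):
    """Merge per-chunk segments into one timeline.

    chunk_results: list of (chunk_start_s, segments), where each segment is a
    dict with chunk-relative "start", "end" and "text".
    """
    merged = []
    for chunk_start, segments in chunk_results:
        for seg in segments:
            start = chunk_start + seg["start"]  # chunk-relative -> absolute time
            end = chunk_start + seg["end"]
            if merged and start < merged[-1]["end"]:
                continue  # already covered by the previous chunk's overlap
            merged.append({"start": start, "end": end, "text": seg["text"]})
    return merged
```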

  5. Verification of Output Format

Check that the output file (e.g., .srt) is well-formed, with correctly formatted start and end times in increasing order. Formatting errors can also cause sync problems in video players.
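A rough way to check an .srt file is to parse each cue and flag anything a player might misread: wrong or out-of-order indices, malformed timing lines, cues with zero or negative duration, and cues that start before the previous one. This sketch uses only the standard `re` module; `check_srt` is a hypothetical helper, and it checks the common `HH:MM:SS,mmm --> HH:MM:SS,mmm` layout only, not every variant players accept.

```python
import re

_CUE_TIME = re.compile(
    r"^(\d+):(\d{2}):(\d{2}),(\d{3}) --> (\d+):(\d{2}):(\d{2}),(\d{3})$"
)


def _ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text):
    """Return a list of problems found in SRT text (empty list = looks valid)."""
    problems = []
    prev_start = -1
    # Cues are separated by one or more blank lines.
    blocks = [b for b in re.split(r"\n\s*\n", text.strip()) if b.strip()]
    for n, block in enumerate(blocks, start=1):
        lines = block.splitlines()
        if len(lines) < 2 or not lines[0].strip().isdigit():
            problems.append(f"cue {n}: missing index or timing line")
            continue
        if int(lines[0]) != n:
            problems.append(f"cue {n}: index is {lines[0].strip()}")
        match = _CUE_TIME.match(lines[1].strip())
        if not match:
            problems.append(f"cue {n}: bad timing line {lines[1]!r}")
            continue
        start, end = _ms(*match.groups()[:4]), _ms(*match.groups()[4:])
        if end <= start:
            problems.append(f"cue {n}: zero or negative duration")
        if start < prev_start:
            problems.append(f"cue {n}: starts before the previous cue")
        prev_start = start
    return problems
```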

  6. Experiment with Parameters

Whether you run Whisper from the command line or from a script, try different models and transcription settings that might affect timing, especially those relevant to long audio files.
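As a starting point for experiments, the sketch below calls the openai-whisper Python API with two `transcribe` options that change how timestamps are produced: `condition_on_previous_text=False` stops each 30-second window from being prompted with the previous window's text (which can help when the model gets stuck repeating itself), and `word_timestamps=True` computes word-level timings that refine segment boundaries. Whether either one fixes the drift for a given file is something to test, not assume. The model name `"medium"` and the helper names are arbitrary choices, and the SRT writer is a minimal hand-rolled one so the segment times written to disk are easy to inspect.

```python
def format_srt_time(seconds):
    """Format seconds as an SRT timestamp HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def segments_to_srt(segments):
    """Turn Whisper-style segments (absolute seconds) into SRT text."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        cues.append(
            f"{i}\n{format_srt_time(seg['start'])} --> "
            f"{format_srt_time(seg['end'])}\n{seg['text'].strip()}\n"
        )
    return "\n".join(cues)


def transcribe_to_srt(media_path, srt_path, model_name="medium"):
    # Imported here so the helpers above can be used without Whisper installed.
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(
        media_path,
        condition_on_previous_text=False,  # don't prompt with earlier text
        word_timestamps=True,              # word-level timing alignment
    )
    with open(srt_path, "w", encoding="utf-8") as f:
        f.write(segments_to_srt(result["segments"]))
```

On the command line, the corresponding flags are `--condition_on_previous_text False` and `--word_timestamps True`.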

  7. Check for Bugs or Updates

Whisper and many other ASR tools are under active development, so make sure you're using the latest version. Also check the tool's repository for open issues or patches related to timing or synchronization.

Python Script Example for Adjusting SRT Files

Here’s a simple Python script that shifts all timestamps in an SRT file by the same amount, for cases where the delay is constant:

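```python
import os

from pysrt import SubRipFile


def adjust_timing(srt_file, seconds):
    subs = SubRipFile.open(srt_file)
    # SubRipFile.shift moves the start and end of every cue in place.
    # (SubRipTime.shift also works in place and returns None, so its
    # result must not be assigned back to sub.start / sub.end.)
    subs.shift(seconds=seconds)
    # Prefix only the file name, so paths like 'subs/example.srt' still work.
    folder, name = os.path.split(srt_file)
    subs.save(os.path.join(folder, 'adjusted_' + name), encoding='utf-8')


# Usage
adjust_timing('example.srt', -5)  # Shift all times 5 seconds earlier
```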

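This script uses pysrt, a Python library for editing SRT files, which you can install with pip (pip install pysrt). Use a negative value for seconds to make the subtitles appear earlier and a positive value to delay them. Note that a constant shift only corrects a fixed offset; drift that grows over time, as described in this issue, requires scaling the timestamps instead.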