Open sunmingtao opened 6 months ago
When generating subtitles for a long video using Whisper or any automatic speech recognition (ASR) system, experiencing a gradual desynchronization over time can be frustrating. This issue can stem from several factors, typically related to the handling of audio processing or the timestamp calculations. Here are a few considerations and potential solutions to address the sync issues:
Ensure that the audio sample rate is consistent throughout the process. Whisper's models operate on 16 kHz mono audio, and a mismatch between the file's actual sample rate and the rate assumed during processing can skew the timing.
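One way to rule out a sample-rate problem is to inspect the audio before transcribing it. The sketch below is only an illustration: it assumes the audio track has already been extracted to an uncompressed WAV file, uses only Python's standard wave module, and the helper names (describe_wav, needs_resampling) are made up for this example.

```python
import wave

WHISPER_SAMPLE_RATE = 16000  # Hz; the rate Whisper's models work with


def describe_wav(path):
    """Return (sample_rate_hz, channels, duration_seconds) for a WAV file."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    return rate, channels, duration


def needs_resampling(path, target_rate=WHISPER_SAMPLE_RATE):
    """True if the file is not already mono audio at the target rate."""
    rate, channels, _ = describe_wav(path)
    return rate != target_rate or channels != 1
```

Comparing the reported duration with the length of the video is also a quick check: if the two differ, the drift probably comes from the audio extraction step rather than from the transcription.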
Sometimes, long non-speech segments or variations in background noise and silence handling can cause drift in the timing. This might not be automatically handled well by the ASR system.
Adjusting the timestamps in the generated subtitles can sometimes be necessary. This can be done by:
Manually Adjusting: Shifting the subtitle timestamps forward or backward by a fixed amount or scaling them to match the video length.
Automated Tools: Using tools like Subtitle Edit or Aegisub, which provide functions to stretch, compress, and sync subtitles.
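When the error grows over time rather than staying constant, a fixed shift is not enough and the timestamps have to be scaled. A minimal sketch of that idea, assuming the drift is roughly linear: pick two subtitles (one early, one late), note where each appears in the file and where it should appear, and fit a straight line through the two points. The function names here are hypothetical and not part of any library.

```python
def fit_linear_correction(observed_a, actual_a, observed_b, actual_b):
    """Return (scale, offset) such that actual = scale * observed + offset.

    Each pair is (time shown in the subtitle file, time the line is
    really spoken in the video), in seconds.
    """
    if observed_a == observed_b:
        raise ValueError("anchor points must be at different times")
    scale = (actual_b - actual_a) / (observed_b - observed_a)
    offset = actual_a - scale * observed_a
    return scale, offset


def correct_time(t, scale, offset):
    """Apply the fitted correction to one timestamp, clamped at zero."""
    return max(0.0, scale * t + offset)
```

For example, if a line stamped at 60 s is on time but a line stamped at 3600 s is really spoken at 3606 s, calling fit_linear_correction(60, 60, 3600, 3606) gives the scale and offset to apply to every start and end time.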
For very long files, consider processing the audio in smaller chunks and then stitching the subtitles together. This can sometimes help manage errors that accumulate over longer periods. However, care must be taken to handle overlaps correctly to avoid abrupt cuts in subtitles.
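The stitching step might look something like the sketch below. It assumes each chunk was transcribed separately, that you know the second at which each chunk begins in the original audio, and that segments are dictionaries with start, end and text keys (the shape Whisper uses for its segments). The overlap rule used here, dropping any segment that begins before the previous kept segment has ended, is just one simple choice and may need tuning.

```python
def stitch_chunks(chunks):
    """Merge per-chunk segments into one timeline.

    chunks: list of (chunk_start_seconds, segments) in playback order,
    where segment times are relative to the start of their own chunk.
    """
    merged = []
    for chunk_start, segments in chunks:
        for seg in segments:
            # Convert chunk-relative times to times in the full recording.
            start = seg["start"] + chunk_start
            end = seg["end"] + chunk_start
            # Skip segments that fall inside the overlap already covered
            # by the previous chunk.
            if merged and start < merged[-1]["end"]:
                continue
            merged.append({"start": start, "end": end, "text": seg["text"]})
    return merged
```

Because every chunk starts again from its own zero, any timing error is confined to a single chunk instead of building up across the whole file.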
Ensure that the output format (e.g., .srt) is being correctly generated with accurate start and end times. Errors in formatting can also lead to sync issues.
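If you write the SRT file yourself, the timestamp format is an easy place to slip: SRT uses HH:MM:SS,mmm with a comma before the milliseconds. Here is a small sketch of a formatter, assuming segments are dictionaries with start and end in seconds plus a text field; the function names are illustrative only.

```python
def format_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from a list of {'start', 'end', 'text'} segments."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    # Entries are separated by a blank line.
    return "\n".join(blocks)
```

For instance, format_timestamp(3661.5) gives 01:01:01,500. Working in whole milliseconds avoids the rounding surprises that can appear when hours, minutes and seconds are rounded separately.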
If you're using Whisper through a command line or a script, experiment with different parameters or configurations that might affect timing, such as different models or settings for handling long audio files.
Given that Whisper and many ASR tools are actively developed, ensure that you're using the latest version. Also, check if there are any open issues or patches related to timing or synchronization in the tool's development repository.
Python Script Example for Adjusting SRT Files
Here’s a simple Python script example to adjust all timestamps in an SRT file if you find a consistent delay:
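import os
from pysrt import SubRipFile

def adjust_timing(srt_file, seconds):
    subs = SubRipFile.open(srt_file)
    # shift() changes the timestamps in place and returns None, so the
    # result must not be assigned back to sub.start or sub.end.
    subs.shift(seconds=seconds)
    # Prefix the file name only, so paths that include a directory still work.
    folder, name = os.path.split(srt_file)
    subs.save(os.path.join(folder, 'adjusted_' + name))

# Usage
adjust_timing('example.srt', -5)  # Move all subtitles 5 seconds earlier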
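This script uses pysrt, a Python library for editing SRT files, which you can install via pip (pip install pysrt). Adjust the seconds parameter as needed: a negative value makes the subtitles appear earlier, a positive value later. Note that a fixed shift only corrects a constant offset; drift that grows over the length of the video calls for scaling the timestamps instead.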
When using Whisper to generate subtitles in SRT format, I noticed that after a certain period of time (around 1 hour), the subtitles start to drift out of sync with the video. I tested generating the subtitles both from Python and directly from the command line. Both results have the same problem.