Request for Subtitle Generation Feature in Addition to Transcription

fralapo commented 1 day ago

I am currently working on improving video transcriptions using the OpenAI API and have successfully integrated a solution that enhances transcription accuracy. However, I believe that extending the functionality to include subtitle generation would be extremely beneficial.

Suggested Enhancement:

Subtitle Generation (SRT/ASS/Other Formats): In addition to providing the corrected transcription, it would be helpful if the API could assist in generating subtitles in common formats like SRT or ASS. These formats include timestamps, which are essential for synchronizing text with video.

Potential Implementation:

Automatic Segmentation: Once the transcription is corrected, a subtitle generation feature could automatically segment the text based on pauses or logical sentence boundaries. Each segment should be assigned a start and end timestamp to align with the spoken dialogue.
Formatting Options: Provide options for exporting the transcription in various subtitle formats, such as:
- SRT (SubRip Subtitle)
- ASS (Advanced SubStation Alpha)
- VTT (Web Video Text Tracks)
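
As a rough illustration of the segmentation and export steps described above, here is a minimal Python sketch. It assumes word-level timestamps are already available from some transcription or alignment step; the `Word` and `Cue` types, the 0.6 s pause threshold, and the 42-character cue limit are illustrative assumptions, not part of any existing API. ASS output is omitted for brevity.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float    # seconds


@dataclass
class Cue:
    start: float
    end: float
    text: str


def _flush(words):
    """Collapse a run of words into one cue spanning first start to last end."""
    return Cue(words[0].start, words[-1].end, " ".join(w.text for w in words))


def segment(words, max_pause=0.6, max_chars=42):
    """Group words into cues, breaking on a long pause, a sentence end,
    or when adding the next word would exceed max_chars."""
    cues, current = [], []
    for w in words:
        if current:
            pause = w.start - current[-1].end
            length = len(" ".join(x.text for x in current)) + 1 + len(w.text)
            sentence_ended = current[-1].text.endswith((".", "?", "!"))
            if pause > max_pause or sentence_ended or length > max_chars:
                cues.append(_flush(current))
                current = []
        current.append(w)
    if current:
        cues.append(_flush(current))
    return cues


def timestamp(seconds, sep=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT, sep='.')."""
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}{sep}{ms:03d}"


def to_srt(cues):
    """Numbered cues separated by blank lines, comma before milliseconds."""
    blocks = [
        f"{i}\n{timestamp(c.start)} --> {timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, 1)
    ]
    return "\n".join(blocks)


def to_vtt(cues):
    """WEBVTT header, then cues with a dot before milliseconds."""
    lines = ["WEBVTT", ""]
    for c in cues:
        lines += [f"{timestamp(c.start, '.')} --> {timestamp(c.end, '.')}", c.text, ""]
    return "\n".join(lines)
```

Calling `to_srt(segment(words))` on a list of `Word` objects yields a string that can be written straight to a `.srt` file. The thresholds would likely need tuning per language and speaking rate.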

Use Case:

The generated subtitles would be helpful for content creators looking to add captions to their videos without requiring additional manual editing. This could significantly reduce the effort involved in captioning, making videos more accessible and enhancing SEO.

Why This Feature Matters:

Accessibility: Subtitles are critical for making content accessible to viewers who are deaf or hard of hearing.
SEO and Engagement: Subtitle text gives search engines something to index, and captions can improve viewer engagement, especially on social media platforms.
Content Localization: Having timestamps ready would also simplify the process of translating and localizing video content.

This feature could leverage existing transcription models with additional logic to generate time-synced captions, perhaps by integrating models that can perform audio segmentation and alignment.

Thank you for considering this suggestion!

ldenoue commented 7 hours ago

The raw YouTube transcript comes in chunks with timestamps. You could use those to align the punctuated version (the result from OpenAI) with the original chunks. That's the approach I'm using in https://www.appblit.com/scribe
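
One way this alignment idea could be sketched in Python is shown below. This is not necessarily how the linked tool does it: it assumes the raw transcript is available as `(start_seconds, text)` pairs, and it uses the standard library's `difflib.SequenceMatcher` to match words after stripping case and punctuation. The `align` and `normalize` names are hypothetical.

```python
import difflib
import re


def normalize(word):
    """Lowercase and drop punctuation so 'Hello,' matches 'hello'."""
    return re.sub(r"[^\w']", "", word.lower())


def align(chunks, punctuated_text):
    """chunks: list of (start_seconds, text) from the raw transcript.
    Returns a list of (start_seconds, word) for the punctuated text."""
    raw_words, raw_times = [], []
    for start, text in chunks:
        for w in text.split():
            raw_words.append(normalize(w))
            raw_times.append(start)  # every word inherits its chunk's start

    new_words = punctuated_text.split()
    matcher = difflib.SequenceMatcher(
        a=raw_words, b=[normalize(w) for w in new_words], autojunk=False
    )

    # Copy timestamps across for every word the matcher paired up.
    times = [None] * len(new_words)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            times[block.b + k] = raw_times[block.a + k]

    # Words the model added or changed reuse the last known timestamp.
    last = raw_times[0] if raw_times else 0.0
    aligned = []
    for word, t in zip(new_words, times):
        if t is not None:
            last = t
        aligned.append((last, word))
    return aligned
```

The timestamps are only as fine-grained as the original chunks, so each word gets its chunk's start time rather than a true word-level time.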

fralapo commented 3 hours ago

Thanks, I will try this method. However, I ran into a problem while trying to transcribe a 30-minute video with the OpenAI transcription API. The request fails because the upload exceeds the maximum content size (26,214,400 bytes, i.e. 25 MiB), with the following error message:

413: Maximum content size limit (26214400) exceeded (26265614 bytes read)
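
A common workaround is to split the audio into pieces that each stay under the limit, transcribe them separately, and shift the resulting timestamps by each piece's offset. The sketch below only plans the split points. It assumes a roughly constant bitrate, so that file size scales with duration, and the `plan_chunks` and `shift` helpers and the 10% safety margin are illustrative. The actual cutting would be done with an audio tool of your choice.

```python
import math

MAX_BYTES = 26_214_400  # limit quoted in the 413 error above (25 MiB)


def plan_chunks(file_bytes, duration_s, max_bytes=MAX_BYTES, safety=0.9):
    """Return (start, end) times in seconds for equal-length pieces whose
    estimated size stays under max_bytes * safety.
    Assumes a roughly constant bitrate."""
    budget = max_bytes * safety
    n = max(1, math.ceil(file_bytes / budget))
    step = duration_s / n
    return [(i * step, min(duration_s, (i + 1) * step)) for i in range(n)]


def shift(cues, offset_s):
    """Move (start, end, text) cues from piece-local time to video time."""
    return [(start + offset_s, end + offset_s, text) for start, end, text in cues]


# The file from the error above: 26,265,614 bytes, about 30 minutes long.
pieces = plan_chunks(26_265_614, 1800.0)
```

Compressing the audio to a lower bitrate before uploading is another option, and may avoid splitting altogether for a file this close to the limit.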

orcaman / improving_whisper_transcriptions_with_gpt4o