orcaman / improving_whisper_transcriptions_with_gpt4o

MIT License

Request for Subtitle Generation Feature in Addition to Transcription #2

Open fralapo opened 1 day ago

fralapo commented 1 day ago

I am currently working on improving video transcriptions using the OpenAI API and have successfully integrated a solution that enhances transcription accuracy. However, I believe that extending the functionality to include subtitle generation would be extremely beneficial.

Suggested Enhancement:

Potential Implementation:

Use Case:

The generated subtitles would be helpful for content creators looking to add captions to their videos without requiring additional manual editing. This could significantly reduce the effort involved in captioning, making videos more accessible and enhancing SEO.

Why This Feature Matters:

  1. Accessibility: Subtitles are essential for making content accessible to deaf and hard-of-hearing viewers.
  2. SEO and Engagement: Subtitles give search engines text to index and tend to improve viewer engagement, especially on social media platforms.
  3. Content Localization: Having timestamps ready would also simplify the process of translating and localizing video content.

This feature could leverage existing transcription models with additional logic to generate time-synced captions, perhaps by integrating models that can perform audio segmentation and alignment.
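As a rough illustration of the "additional logic" step, here is a minimal sketch that assumes the transcription step already yields segments with start and end times in seconds. For example, Whisper's `verbose_json` response includes per-segment `start`, `end`, and `text`, and the `whisper-1` endpoint can also return `srt` directly via `response_format`. The `Segment` class and the function names are hypothetical. The code only renders the SubRip (.srt) caption format:

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render time-stamped segments as numbered SubRip cues."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    # SRT cues are separated by a single blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [Segment(0.0, 2.5, "Hello there."), Segment(2.5, 5.04, "Welcome back.")]
    print(segments_to_srt(demo))
```

WebVTT output would differ mainly in the `WEBVTT` header and the use of `.` instead of `,` before the milliseconds.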

Thank you for considering this suggestion!

ldenoue commented 7 hours ago

The raw YouTube transcript is split into chunks with timestamps. You could use those to align the punctuated version (the OpenAI output) with the original chunks. That's the approach I'm using in https://www.appblit.com/scribe
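One way to do this kind of alignment is a word-level diff. The following sketch is not necessarily how the linked tool works. It assumes the raw chunks are `(start_s, end_s, raw_text)` tuples and that the model mostly adds punctuation and capitalization rather than rewording. It uses Python's `difflib.SequenceMatcher` to match normalized words, and each punctuated word inherits the timestamp chunk of the raw word it matches. `normalize` and `align_punctuated` are hypothetical names:

```python
import difflib
import re


def normalize(word: str) -> str:
    """Lowercase and drop punctuation so 'Hello,' can match 'hello'."""
    return re.sub(r"[^\w']", "", word.lower())


def align_punctuated(chunks, punctuated_text):
    """Redistribute punctuated words over the timestamped raw chunks.

    chunks: list of (start_s, end_s, raw_text) tuples (assumed layout).
    punctuated_text: the cleaned-up transcript returned by the model.
    Returns a list of (start_s, end_s, punctuated_text) tuples, one per chunk.
    """
    if not chunks:
        return []

    # Flatten the raw chunks into normalized words, remembering each word's chunk.
    raw_words, raw_owner = [], []
    for idx, (_, _, text) in enumerate(chunks):
        for word in text.split():
            raw_words.append(normalize(word))
            raw_owner.append(idx)

    punct_words = punctuated_text.split()
    punct_norm = [normalize(word) for word in punct_words]

    # Word-level diff: each matched punctuated word takes its raw twin's chunk.
    owner = [None] * len(punct_words)
    matcher = difflib.SequenceMatcher(None, raw_words, punct_norm, autojunk=False)
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            owner[block.b + k] = raw_owner[block.a + k]

    # Words the model added or changed inherit the chunk of the preceding word.
    previous = 0
    for i, idx in enumerate(owner):
        if idx is None:
            owner[i] = previous
        else:
            previous = idx

    grouped = [[] for _ in chunks]
    for word, idx in zip(punct_words, owner):
        grouped[idx].append(word)
    return [(start, end, " ".join(words))
            for (start, end, _), words in zip(chunks, grouped)]


if __name__ == "__main__":
    raw = [(0.0, 2.0, "hello everyone welcome"),
           (2.0, 4.5, "to the show today we"),
           (4.5, 6.0, "talk about subtitles")]
    text = "Hello, everyone! Welcome to the show. Today, we'll talk about subtitles."
    for cue in align_punctuated(raw, text):
        print(cue)
```

Because matching blocks are monotonic, the output keeps the original word order. A chunk whose words were all dropped by the model comes back with empty text and may need to be merged into a neighbour before captions are written.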

fralapo commented 3 hours ago

Thanks, I will try this method. However, I ran into a problem when transcribing a 30-minute video with the OpenAI transcription API: the request fails because it exceeds the maximum content size limit, with the following error message:

413: Maximum content size limit (26214400) exceeded (26265614 bytes read)
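The limit in the error, 26214400 bytes, is exactly 25 × 1024 × 1024 (25 MiB), so this file is only about 51 KB over. Two common workarounds are to re-encode the audio at a lower bitrate, or to split it into pieces and transcribe each piece separately. Below is a minimal sketch of the second approach. It assumes a roughly constant bitrate, so file size scales linearly with duration. It only plans the cut points and shifts the returned timestamps back onto the full timeline. The actual cutting (for example with ffmpeg's `-ss` and `-t` options) is not shown. `plan_chunks`, `merge_segments`, and the 0.9 safety factor are hypothetical choices:

```python
import math

MAX_UPLOAD_BYTES = 26_214_400  # limit reported in the 413 error (25 MiB)


def plan_chunks(file_bytes, duration_s, max_bytes=MAX_UPLOAD_BYTES, safety=0.9):
    """Split [0, duration_s] into equal time windows that should each fit the limit.

    Assumes roughly constant bitrate, so bytes grow linearly with time.
    The safety factor leaves headroom for container overhead and bitrate swings.
    """
    n = max(1, math.ceil(file_bytes / (max_bytes * safety)))
    step = duration_s / n
    return [(i * step, min((i + 1) * step, duration_s)) for i in range(n)]


def merge_segments(chunk_results):
    """Offset each chunk's segment times by the chunk start and concatenate.

    chunk_results: list of (chunk_start_s, [(start_s, end_s, text), ...]),
    where segment times are relative to the start of that chunk.
    """
    merged = []
    for offset, segments in chunk_results:
        for start, end, text in segments:
            merged.append((start + offset, end + offset, text))
    return merged


if __name__ == "__main__":
    # The file from the error above: 26,265,614 bytes, about 30 minutes long.
    print(plan_chunks(26_265_614, 30 * 60))
```

Cutting at fixed times can split a word across two pieces. Nudging each boundary to a nearby silence, or overlapping pieces slightly and de-duplicating, usually gives cleaner captions.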