llimllib opened 2 months ago
If you download the captions, they come in a format whose cue boundaries straddle the segment boundaries we're placing. There are a few output formats, but VTT is the default and is representative:
WEBVTT
Kind: captions
Language: en-US

00:00:00.166 --> 00:00:01.766
Hey, I'm Sam
from Prismic. I'm here

00:00:01.766 --> 00:00:03.333
with Rich
Harris, creator of Svelte.

00:00:03.333 --> 00:00:05.800
And Rich is explaining to
me how you can get Rich quick by

00:00:05.800 --> 00:00:07.033
creating your
own JavaScript framework.

00:00:08.166 --> 00:00:09.233
Thanks for joining me, Rich.
Without the benefit of the nicely segmented transcript that Whisper gives us, we have a couple of problems. For example, the cue 03.333 -> 05.800 straddles the 5-second mark, so we'd have to decide whether its text goes before the 5-second thumbnail or after it.
Probably the answer to supporting both YouTube captions and Whisper is to write functions that return arrays of transcript segments, one for each segment in the video. So for the captions above, we'd parse the VTT file for cues that fall in the [0-5] segment, then cues that fall in [5-10], and so on, and return an array (which must have empty elements for empty segments).
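Here's a minimal Python sketch of that idea, under a few assumptions: timestamps use the hh:mm:ss.mmm form shown above, segments are 5 seconds long, and a cue that straddles a boundary goes to the segment containing its start time (so the 03.333 -> 05.800 cue lands before the 5-second thumbnail). parse_cues and segment_transcript are hypothetical names, not existing yt-transcribe functions.

```python
import re
from typing import List, Tuple

# Matches a cue timing line such as "00:00:03.333 --> 00:00:05.800".
# Assumes the hh:mm:ss.mmm form; trailing cue settings are ignored.
TIMING_RE = re.compile(
    r"^(\d+):(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(\d+):(\d{2}):(\d{2})\.(\d{3})"
)
# Inline tags such as <c> or <00:00:01.000>, stripped from cue text.
TAG_RE = re.compile(r"<[^>]+>")


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(vtt_text: str) -> List[Tuple[float, float, str]]:
    """Return (start, end, text) for every cue in a VTT document."""
    cues = []
    start = end = None
    lines: List[str] = []
    for raw in vtt_text.splitlines() + [""]:  # trailing "" flushes the last cue
        line = raw.strip()
        match = TIMING_RE.match(line)
        if match or not line:
            # A timing line or a blank line ends the cue in progress, if any.
            if start is not None:
                cues.append((start, end, " ".join(lines)))
            start = end = None
            lines = []
            if match:
                g = match.groups()
                start, end = to_seconds(*g[:4]), to_seconds(*g[4:])
        elif start is not None:
            # Text line inside a cue; header lines and cue ids are skipped.
            text = TAG_RE.sub("", line).strip()
            if text:
                lines.append(text)
    return cues


def segment_transcript(vtt_text: str, segment_len: float = 5.0) -> List[str]:
    """Bucket cue text into consecutive segment_len-second segments.

    Each cue goes to the segment containing its start time, which is only
    one way to settle straddling cues. Segments with no cues stay "".
    """
    cues = parse_cues(vtt_text)
    if not cues:
        return []
    count = int(max(start for start, _, _ in cues) // segment_len) + 1
    buckets: List[List[str]] = [[] for _ in range(count)]
    for start, _end, text in cues:
        if text:
            buckets[int(start // segment_len)].append(text)
    return [" ".join(bucket) for bucket in buckets]
```

On the sample above this should yield two segments, the second starting at "creating your". The end time is kept in each cue so a different policy (say, assigning by a cue's midpoint) could be swapped in. The array length comes from the last cue, so a caller would still need to pad it out to the video's full length.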
A sample video with good manually created subtitles, useful for testing, is: https://www.youtube.com/watch?v=i-BkN3rTK0Q
It's also helpful to know that --skip-download --write-subs will only download subs that were added manually, not automatically generated ones. For example:
yt-dlp --skip-download --write-subs -o "manual_subs_only.vtt" 'https://www.youtube.com/watch?v=i-BkN3rTK0Q'
If you try to use that on a video that only has automatic subs, you get:
$ yt-dlp --skip-download --write-subs -o "bulls.vtt" 'https://www.youtube.com/watch?v=lyJ6GyC4Yng'
[info] There are no subtitles for the requested languages
$ echo $?
0
As you can see, it unfortunately exits with status 0 (success), so we'd have to check for the presence or absence of bulls.vtt.en-US.vtt instead. That's trickier because yt-dlp inserts the language code into the filename! Maybe I can figure out how to avoid that.
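A rough Python sketch of that check, assuming yt-dlp is on the PATH and keeps naming files like bulls.vtt.en-US.vtt: rather than guessing the language code, glob for anything that starts with the -o name and ends in .vtt. find_subtitle_files and download_manual_subs are hypothetical helper names.

```python
import glob
import os
import subprocess
from typing import List, Optional


def find_subtitle_files(output_name: str, directory: str = ".") -> List[str]:
    """List files named "<output_name><anything>.vtt", e.g. bulls.vtt.en-US.vtt."""
    pattern = os.path.join(glob.escape(directory), glob.escape(output_name) + "*.vtt")
    return sorted(glob.glob(pattern))


def download_manual_subs(url: str, output_name: str) -> Optional[str]:
    """Fetch manually added subs only; return the file path, or None if absent."""
    # Remove leftovers from earlier runs so a stale file can't look like success.
    for old in find_subtitle_files(output_name):
        os.remove(old)
    # yt-dlp exits 0 even when there are no subtitles, so check=True only
    # catches real failures; the file check below decides the outcome.
    subprocess.run(
        ["yt-dlp", "--skip-download", "--write-subs", "-o", output_name, url],
        check=True,
    )
    found = find_subtitle_files(output_name)
    return found[0] if found else None
```

If several languages come down, this just returns the first path in sorted order.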
Most auto-generated YouTube CCs suck, but some videos have manually attached high-quality CCs, and somebody might want to use those instead of a Whisper transcript.
yt-transcribe -cc <video_url>
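If the CLI were parsed in Python (an assumption; the issue doesn't say how yt-transcribe handles arguments), argparse accepts a single-dash multi-letter option like -cc directly. parse_args and transcript_source are hypothetical names for illustration.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the proposed `yt-transcribe -cc <video_url>` interface."""
    parser = argparse.ArgumentParser(prog="yt-transcribe")
    parser.add_argument(
        "-cc",
        action="store_true",  # stored as args.cc
        help="use the video's manually added closed captions instead of Whisper",
    )
    parser.add_argument("video_url")
    return parser.parse_args(argv)


def transcript_source(args: argparse.Namespace) -> str:
    # Hypothetical dispatch: which transcript source the tool would use.
    return "youtube-cc" if args.cc else "whisper"
```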