simonw / action-transcription-demo

A tool for creating a repository of transcribed videos
Apache License 2.0
52 stars 6 forks

Extract captions #24

Closed (simonw closed this issue 2 years ago)

simonw commented 2 years ago

URL

https://www.youtube.com/watch?v=AneNxjSGn1I

github-actions[bot] commented 2 years ago
This project is called Action Transcription. The idea is to give people a tool that lets them build an ongoing repository and archive of the captions of videos from different social media sites, including those captions translated into English.
So we'll do a very quick demo. Here is a YouTube video that is in Russian and actually does have YouTube-generated captions. The way you use this tool is you file GitHub issues: I can paste the URL in here, I can tell it that I'd like to extract the captions, click the button, and that's it. That's the entire user interface for end users.
Having done this, what my tool is doing is spinning up a GitHub Actions task. This right here triggers when a new issue is created. It installs various bits of software, it uses the youtube-dl command line tool to pull back the captions from that video, and then it saves those captions back into the repository. The user doesn't need to know this is happening, though. They're just sitting here, and about 30 seconds after they initially filed that issue the action will do its work and reply to this issue with comments that give them the data they want. So this is some logging data, and here is the English translation from the YouTube automated translations of that video.
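
The caption-extraction step described above could be sketched roughly as follows. This is a minimal illustration and not the repository's actual workflow: it assumes the `yt-dlp` fork of youtube-dl is installed (the option spellings are yt-dlp's), and the helper names `build_caption_command` and `fetch_captions` are hypothetical. The `en` language and `ttml` format are chosen to match the log output posted in this issue.

```python
import subprocess
from typing import List


def build_caption_command(url: str, lang: str = "en") -> List[str]:
    """Build a yt-dlp command that fetches captions only, not the video."""
    return [
        "yt-dlp",
        "--skip-download",    # captions only, do not save the video file
        "--write-subs",       # uploader-provided subtitles, if any
        "--write-auto-subs",  # YouTube's automatic (and auto-translated) captions
        "--sub-langs", lang,
        "--sub-format", "ttml",
        url,
    ]


def fetch_captions(url: str, lang: str = "en") -> None:
    """Run the command; raises CalledProcessError if yt-dlp exits non-zero."""
    subprocess.run(build_caption_command(url, lang), check=True)
```

Calling `fetch_captions("https://www.youtube.com/watch?v=AneNxjSGn1I")` would write a `.en.ttml` file into the current directory if captions are available. Building the argument list separately from running it keeps the command easy to inspect without touching the network.
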
So that gets posted here as a reply. It also gets saved into the repository itself, so we've now got a permanent record of that, which I can then write additional software against to analyze it.

Now, that was the demo using YouTube captions. The other thing this can do is, if you tag your issue with whisper (that's the name of an AI model), it can take a video, in this case this video on VK, entirely in Russian with no English translations at all, and run that through an AI which will transcribe the Russian but will also provide an English translation, an incredibly high quality one. So this right here is doing a whole bunch of additional work to turn that video into something that's readable and usable. Again, the results of that have been saved in this repository, which means that we can analyze them and build search and things against them later on.
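
The talk does not say how Whisper is invoked once an issue carries the tag. As one illustration only, assuming the open-source `openai-whisper` package and its `whisper` command line tool are installed, the two outputs mentioned (a Russian transcript and an English translation) correspond to that tool's `transcribe` and `translate` tasks. The label check and both helper names below are hypothetical.

```python
from typing import List, Optional


def wants_whisper(labels: List[str]) -> bool:
    """Hypothetical check: does the issue carry the 'whisper' label?"""
    return any(label.strip().lower() == "whisper" for label in labels)


def build_whisper_command(
    media_path: str,
    task: str,
    language: Optional[str] = None,
    model: str = "small",
) -> List[str]:
    """Build a `whisper` CLI command for one of its two tasks."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    cmd = [
        "whisper", media_path,
        "--model", model,
        "--task", task,            # translate = English output
        "--output_format", "txt",
    ]
    if language:
        cmd += ["--language", language]  # source language, e.g. "ru"
    return cmd
```

Running the command once with `task="transcribe"` and once with `task="translate"` (for example via `subprocess.run(cmd, check=True)`) would give the original-language text and the English version respectively.
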
To try this thing out, the way you do it is you click "Use this template" to create your own copy of this repository. This is using GitHub template repositories, so anyone who does this gets their own version, which will run under their own GitHub account and will allow them to once again start filing issues to start capturing things in this new, unique database for their own project. I'm hoping that, despite this being built on top of GitHub, since everything is available through the browser, users who aren't familiar with the command line and don't even know what Git is will still be able to use this, given the right guidance.
github-actions[bot] commented 2 years ago
[youtube] AneNxjSGn1I: Downloading webpage
[youtube] AneNxjSGn1I: Downloading android player API JSON
[info] AneNxjSGn1I: Downloading 1 format(s): 22
[youtube] AneNxjSGn1I: Downloading webpage
[youtube] AneNxjSGn1I: Downloading android player API JSON
[info] AneNxjSGn1I: Downloading subtitles: en
[info] AneNxjSGn1I: Downloading 1 format(s): 22
[info] Writing video subtitles to: Bellingcat Hackathon: Action Transcription [AneNxjSGn1I].en.ttml
[download] Destination: Bellingcat Hackathon: Action Transcription [AneNxjSGn1I].en.ttml

[download]    1.00KiB at  Unknown B/s (00:00:00)
[download]    3.00KiB at  Unknown B/s (00:00:00)
[download]    7.00KiB at    4.37MiB/s (00:00:00)
[download]    7.63KiB at    2.25MiB/s (00:00:00)
[download] 100% of    7.63KiB in 00:00:00 at 70.41KiB/s
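
The log above shows the captions being written to a `.en.ttml` file. Turning such a file into plain caption lines like the ones posted in this issue might look like the sketch below. It assumes the file uses the standard TTML namespace with one `<p>` element per caption; the function name `ttml_to_lines` and the sample document are illustrative and not taken from the repository.

```python
import xml.etree.ElementTree as ET
from typing import List

# Standard TTML namespace, written in ElementTree's {uri}tag form.
TTML_NS = "{http://www.w3.org/ns/ttml}"


def ttml_to_lines(ttml_text: str) -> List[str]:
    """Return the text of each <p> caption, with whitespace collapsed."""
    root = ET.fromstring(ttml_text)
    lines = []
    for p in root.iter(TTML_NS + "p"):
        # itertext() also yields text that follows inline <br/> elements.
        text = " ".join(" ".join(p.itertext()).split())
        if text:
            lines.append(text)
    return lines


SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en">
<body><div>
<p begin="00:00:00.000" end="00:00:02.000">this project is called action</p>
<p begin="00:00:02.000" end="00:00:04.000">transcription the idea is<br/>to give people</p>
</div></body></tt>"""

for line in ttml_to_lines(SAMPLE):
    print(line)
```

Timing attributes (`begin`, `end`) are ignored here because only the text is posted back to the issue; a fuller version could keep them for search or alignment.
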