lbryio / lbry-desktop

A browser and wallet for LBRY, the decentralized, user-controlled content marketplace.
https://lbry.tech
MIT License

Allow automatic creation of subtitles through AutoSub, served as WebVTT files #6825

Closed. mayeaux closed this issue 2 years ago.

mayeaux commented 2 years ago

An important feature to support is captions. This is especially important for accessibility for those who cannot hear, and it also helps non-native English speakers understand content better. YouTube offers captions, and people should not have to feel like they are sacrificing features to use Odysee; we should aim for feature parity with YouTube, or even additional features.

An issue is created for supporting captioning here: https://github.com/lbryio/lbry-desktop/issues/2325

But this ticket will cover only automatically generated captions; the ability for people to upload their own captions during the upload process will be handled in a separate ticket.

I tested out the AutoSub module, a CLI that integrates Mozilla's open-source DeepSpeech for the speech-to-text functionality and then, through some clever programming, aligns that text with the proper timestamps. It actually works quite well out of the box.

https://github.com/abhirooptalasila/AutoSub

They say it can output WebVTT automatically, which I wasn't able to get working on a first attempt, but regardless, the .srt and .vtt formats are very similar, so converting between the two is trivial and there are plenty of packages that can do it.
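To illustrate how small the difference is, here is a minimal SRT-to-VTT conversion sketch (a hypothetical helper, not code from any of the packages mentioned): the main changes are the timestamp decimal separator (SRT uses a comma, WebVTT a period) and the required WEBVTT header line.

```javascript
// Minimal SRT -> WebVTT converter sketch (hypothetical helper).
// Handles the two essential differences: the "WEBVTT" header and the
// comma-vs-period decimal separator in cue timestamps.
function srtToVtt(srt) {
  const body = srt
    .replace(/\r+/g, '')
    // 00:00:01,000 --> 00:00:04,000  becomes  00:00:01.000 --> 00:00:04.000
    .replace(/(\d{2}:\d{2}:\d{2}),(\d{3})/g, '$1.$2');
  return 'WEBVTT\n\n' + body.trim() + '\n';
}
```

A real converter would also want to validate cue ordering and strip unsupported styling tags, but for files AutoSub emits this is essentially the whole job.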

Once the .vtt file is created, it is trivial to serve it via video.js by adding to this line: ui/component/viewers/videoViewer/internal/videojs.jsx:220

Something along the lines of: tracks: [{src: 'https://servestatic.tv/mysub.vtt', kind: 'captions', srclang: 'en', label: 'English'}]
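Spelled out as a standalone fragment (the URL is the placeholder from above, and this mirrors video.js's documented text-track options rather than existing lbry-desktop code):

```javascript
// Hypothetical video.js player-options fragment adding a caption track.
// The src URL is a placeholder; kind/srclang/label follow video.js's
// documented text-track option shape.
const captionTrack = {
  src: 'https://servestatic.tv/mysub.vtt', // placeholder URL
  kind: 'captions',
  srclang: 'en',
  label: 'English',
};

const playerOptions = {
  tracks: [captionTrack],
  // ...existing options from videojs.jsx would remain here
};
```

Alternatively, video.js also exposes player.addRemoteTextTrack(options, manualCleanup) for attaching a track after the player is created, which may fit better if the caption URL is only known once the claim resolves.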

I implemented and tested this functionality and was quite impressed with how well AutoSub worked. You can see it even properly transcribed the word 'prophylactic'. The results were similar to what would be expected from YouTube, so I would say AutoSub works well enough out of the box to ship. [screenshot of transcription results attached]

AutoSub is built on top of Mozilla's DeepSpeech. Although I ran it against a model trained on an English-speaking dataset, models for other languages exist, so we would be able to use those as well, though I never tested a non-English model myself. That said, I believe most of the content and viewers are English-speaking, so in Pareto-distribution style this could probably cover maybe 80% of content creators/users right out of the gate. It would also be a great way to begin supporting captioning, at which point the ability for users to upload their own captions during the upload process could be supported as well.

9mido commented 2 years ago

There is also https://github.com/BingLingGroup/autosub

mayeaux commented 2 years ago

> There is also https://github.com/BingLingGroup/autosub

Correct me if I'm wrong, but it looks like all the actual speech-to-text transcription is done using a 3rd-party (paid) API, is that so?

9mido commented 2 years ago

You have the option to use a paid 3rd party API. But if you don't, you can use the free version of Google speech v2 to create the subtitles.

https://github.com/BingLingGroup/autosub#google-speech-v2

All I had to do was:

autosub -i file.mp4 -S en-US

mayeaux commented 2 years ago

Cool I'll check it out thanks for the tip.

9mido commented 2 years ago

You're welcome.

Can you give us some more details on the steps or commands you ran to get the Mozilla DeepSpeech AutoSub working? I'm curious to try it out myself but not exactly sure how to use it.

mayeaux commented 2 years ago

Just follow the docs here: https://github.com/abhirooptalasila/AutoSub#installation

You may have to install DeepSpeech separately here: https://deepspeech.readthedocs.io/en/r0.9/?badge=latest

tzarebczan commented 2 years ago

Issue moved to OdyseeTeam/odysee-frontend #165 via ZenHub