tympanix / subsync

Synchronize your subtitles using machine learning
Apache License 2.0

Subtitle syncing for a language other than the one being spoken? #7

Closed. SudoHenk closed this issue 5 years ago

SudoHenk commented 5 years ago

First of all, thanks for the thorough write-up. As someone with only basic ML knowledge, I found it very interesting to learn about the additional metrics such as log-loss.

Anyway, I was trying to automate subtitle retrieval and syncing for my own native language. I can often find subtitles in my language, but they do not line up at all. Is it possible to sync subtitles in a language other than the one spoken in the media file?

Also, does this take into account that subtitles sometimes contain advertisements, which would feed false information to the synchronisation?

Edit: perhaps it would be a good idea to create a wrapper that uses https://github.com/Diaoul/subliminal to download subtitles and subsync to align each one. That would be a killer solution. AFAIK there is no proper subtitle solution available.

tympanix commented 5 years ago

Thanks for your interest. First of all, I take no credit for the linked article; I used it myself for inspiration.

The program does not take advertisements in the subtitles into account. That would be a great addition. Since the program uses voice activity detection rather than speech recognition, yes, it can synchronise your subtitles even when they are in a different language from the one being spoken.
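To illustrate why the spoken language does not matter: alignment only compares *when* someone is speaking with *when* a subtitle cue is on screen, so no words are involved. The sketch below is a simplified illustration, not subsync's actual code. It assumes the audio has already been turned into a per-frame speech/no-speech mask (subsync predicts this with a trained model; here the mask is simply given), and it picks the shift with the most frame-level agreement, which is a plain stand-in for whatever cost function subsync really optimises. `cues_to_mask`, `best_offset` and the 10 ms frame size are hypothetical names and values chosen for the example.

```python
from typing import List, Sequence, Tuple


def cues_to_mask(cues: Sequence[Tuple[int, int]], n_frames: int, frame_ms: int = 10) -> List[bool]:
    """Turn subtitle cues given as (start_ms, end_ms) into a per-frame 'text on screen' mask."""
    mask = [False] * n_frames
    for start_ms, end_ms in cues:
        # Frames [start_ms // frame_ms, end_ms // frame_ms) are covered by this cue.
        for f in range(start_ms // frame_ms, min(n_frames, end_ms // frame_ms)):
            mask[f] = True
    return mask


def best_offset(speech: Sequence[bool], subtitle: Sequence[bool], max_shift: int) -> int:
    """Return the shift (in frames) to add to the subtitle timings so that the
    subtitle mask agrees with the speech mask on as many frames as possible."""
    best_shift, best_score = 0, -1
    for shift in range(-max_shift, max_shift + 1):
        score = 0
        for i, is_speech in enumerate(speech):
            j = i - shift  # subtitle frame that lands on audio frame i after shifting
            has_text = subtitle[j] if 0 <= j < len(subtitle) else False
            if has_text == is_speech:
                score += 1
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

For example, if speech is detected in frames 30 to 49 but the only cue spans 100 to 300 ms (frames 10 to 29), `best_offset` returns 20 frames, i.e. every cue should be moved 200 ms later. The masks contain no text at all, which is exactly why a Dutch subtitle can be aligned to English audio.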

I created the tool as a standalone program. Using subliminal and subsync together would be great; I suggest using both as libraries in a simple Python script.
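A minimal sketch of such a wrapper script, assuming a folder of video files. It does not assume anything about subsync's library API: both steps are passed in as callables. In practice `download` could wrap subliminal's `scan_video`, `download_best_subtitles` and `save_subtitles`, and `synchronise` would call into subsync; `process_library`, `download`, `synchronise` and the extension list are hypothetical names for illustration.

```python
from pathlib import Path
from typing import Callable, List, Optional

# Hypothetical set of extensions treated as video files.
VIDEO_EXTENSIONS = {".mkv", ".mp4", ".avi"}


def process_library(
    root: Path,
    download: Callable[[Path], Optional[Path]],
    synchronise: Callable[[Path, Path], Path],
) -> List[Path]:
    """For every video under root: fetch a subtitle, then align it to the audio.

    download(video) returns the path of a downloaded subtitle, or None if none was found.
    synchronise(video, subtitle) returns the path of the aligned subtitle.
    """
    results = []
    for video in sorted(root.rglob("*")):
        if not video.is_file() or video.suffix.lower() not in VIDEO_EXTENSIONS:
            continue
        subtitle = download(video)
        if subtitle is None:
            continue  # nothing found in the requested language; skip this video
        results.append(synchronise(video, subtitle))
    return results
```

Keeping the two steps injectable makes it easy to swap subtitle providers or to test the loop with fake functions before wiring in the real libraries.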

tympanix commented 5 years ago

I have created #8 for the purpose of removing advertisements etc.
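As a rough idea of what removing advertisements could look like, here is a sketch that drops SRT cues whose text looks like a credit or ad and renumbers the rest. The patterns are hypothetical heuristics (URLs, "subtitles by ...", site names), not what #8 implements; real ad cues vary a lot, so a list like this would need tuning.

```python
import re

# Hypothetical heuristics for cues that are ads/credits rather than dialogue.
AD_PATTERNS = [
    re.compile(r"https?://|www\.", re.IGNORECASE),
    re.compile(r"\b(subtitles|subs|synced|corrected)\s+by\b", re.IGNORECASE),
    re.compile(r"\bopensubtitles\b", re.IGNORECASE),
]


def strip_ad_cues(srt_text: str) -> str:
    """Remove SRT cues whose text matches any ad pattern and renumber the remaining cues."""
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    kept = []
    for block in blocks:
        lines = block.splitlines()
        if len(lines) < 3:
            continue  # malformed block (needs index, timing and at least one text line)
        text = " ".join(lines[2:])
        if any(p.search(text) for p in AD_PATTERNS):
            continue  # looks like an advertisement or credit line
        kept.append(lines[1:])  # drop the old index; cues are renumbered below
    out = ["\n".join([str(i), timing, *text_lines]) for i, (timing, *text_lines) in enumerate(kept, start=1)]
    return "\n\n".join(out) + "\n" if out else ""
```

Running this before synchronisation keeps ad cues from being matched against speech that does not exist.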