machinewrapped / gpt-subtrans

Open Source project using LLMs to translate SRT subtitles
310 stars, 36 forks

Upload translated subtitles to OpenSubtitles etc.? #146

Closed: IlmariKu closed this issue 2 months ago

IlmariKu commented 3 months ago

I was wondering: since we are translating the subtitles anyway, I feel like we should help our brothers and sisters by adding support for uploading the translated subtitles to some public service where they can be downloaded. I've been doing this manually, but would like to do it more automatically and just spread the love.

It could be done by adding this library https://github.com/agonzalezro/python-opensubtitles/tree/master or by some other approach. What do you think, does it make sense?

machinewrapped commented 3 months ago

Good idea! I'll look into it.

I usually run "Fix common errors" in SubtitleEdit and then watch the film with the subtitles, making any corrections that are needed (there are always some) before uploading them. I'd like to add more automated corrections, e.g. adding line breaks, and to improve validation and retranslation so that the subtitles are closer to being "finished" when they're translated.

IlmariKu commented 3 months ago

Yeah, I've been thinking about that too, and about this line in the documentation: "It is highly recommended to use Subtitle Edit's (https://www.nikse.dk/subtitleedit) 'Fix Common Errors' to clean up the translated subtitles (e.g. to add line breaks)."

I haven't stumbled into any errors yet, but I also can't download the software (I'm on a Mac). Is there any way we could do this programmatically? It seems like it would just be parsing and validating.
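For a sense of what "parsing and validating" could involve, here is a minimal Python sketch. It is not part of gpt-subtrans; `Subtitle`, `parse_srt` and `validate` are hypothetical names. It splits SRT text into blocks (index, timing line, text lines) and flags a few common problems: inverted timings, overlaps, too many lines, and over-long lines. The 42-character and 2-line limits are assumed defaults (a commonly used subtitling limit), not values from this thread, and the checks cover only a small fraction of what SubtitleEdit's "Fix Common Errors" does.

```python
import re
from dataclasses import dataclass

# Matches a timing line such as "00:01:02,345 --> 00:01:04,000"
TIMING_RE = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Subtitle:
    index: int
    start_ms: int
    end_ms: int
    lines: list[str]


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(text: str) -> list[Subtitle]:
    """Parse SRT text into Subtitle records; raise ValueError on a malformed block."""
    subtitles = []
    # Blocks are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", text.strip()):
        rows = block.splitlines()
        match = TIMING_RE.fullmatch(rows[1].strip()) if len(rows) >= 2 else None
        if match is None or not rows[0].strip().isdigit():
            raise ValueError(f"Malformed block: {block!r}")
        times = match.groups()
        subtitles.append(
            Subtitle(int(rows[0]), _to_ms(*times[:4]), _to_ms(*times[4:]), rows[2:])
        )
    return subtitles


def validate(subtitles: list[Subtitle], max_line_length: int = 42, max_lines: int = 2) -> list[str]:
    """Return human-readable warnings for a few common subtitle problems."""
    warnings = []
    prev_end = 0
    for sub in subtitles:
        if sub.end_ms <= sub.start_ms:
            warnings.append(f"#{sub.index}: ends before it starts")
        if sub.start_ms < prev_end:
            warnings.append(f"#{sub.index}: overlaps previous subtitle")
        if len(sub.lines) > max_lines:
            warnings.append(f"#{sub.index}: {len(sub.lines)} lines (max {max_lines})")
        for line in sub.lines:
            if len(line) > max_line_length:
                warnings.append(f"#{sub.index}: line longer than {max_line_length} chars")
        prev_end = sub.end_ms
    return warnings
```

Parsing and validation are kept separate so that a malformed file fails loudly, while stylistic problems are only reported and could be passed on to an automated fix or a retranslation step.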

machinewrapped commented 3 months ago

The main reason I haven't attempted it yet is that SubtitleEdit does a good job and has years of development behind it, so it makes sense to leverage their efforts, as there's never a shortage of other things I want to work on.

I'm currently running a custom local build of SubtitleEdit that splits long lines at more natural breaks, though, so building that into gpt-subtrans would make sense. That's the main thing that tends to need fixing, since LLMs generally have perfect spelling, grammar and punctuation.
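A minimal sketch of the kind of line splitting described here, assuming a 42-character maximum and a simple heuristic of my own: break at the space closest to the middle, prefer a break right after punctuation, and prefer break points where both halves fit. This is not the logic of the custom SubtitleEdit build mentioned above (that code isn't shown in the thread), and `split_line` is a hypothetical helper.

```python
PUNCTUATION = ",.;:!?"


def split_line(text: str, max_length: int = 42) -> list[str]:
    """Split an over-long subtitle line in two, preferring a break after punctuation."""
    text = " ".join(text.split())  # collapse runs of whitespace
    if len(text) <= max_length:
        return [text]

    spaces = [i for i, ch in enumerate(text) if ch == " "]
    if not spaces:
        return [text]  # a single long word cannot be split at a space

    # Prefer break points where both halves fit within max_length
    fitting = [i for i in spaces if i <= max_length and len(text) - i - 1 <= max_length]
    candidates = fitting or spaces
    middle = len(text) / 2

    def penalty(i: int) -> float:
        # Distance from the middle, reduced when the break follows punctuation
        score = abs(i - middle)
        if text[i - 1] in PUNCTUATION:
            score -= len(text) / 4
        return score

    best = min(candidates, key=penalty)
    return [text[:best], text[best + 1:]]
```

For example, `split_line("We should go now, before anyone notices we are missing.")` breaks after "now," rather than at the exact middle. The `len(text) / 4` punctuation bonus is an arbitrary tuning choice; a fuller version would also handle more than two lines and avoid breaking straight after short words such as articles.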

The other common issue that needs fixing is splitting, merging or correcting dialogue based on who is speaking, but I can't yet think of a way to do that without actually watching the movie :-)

IlmariKu commented 2 months ago

I'll close this one, since it's not really relevant to anything.