pepri / subtitles-editor

Visual Studio Code extension for editing SubRip Text (SRT) files with subtitles.
https://marketplace.visualstudio.com/items?itemName=pepri.subtitles-editor

Ability to translate blocks of subtitles instead of translating line by line #19

Open mrnossiom opened 2 years ago

mrnossiom commented 2 years ago

Hi,

I ran into a problem when trying to translate subtitles. The program translates lines separately from each other, which makes the translation incorrect. Instead, it would be great to translate block by block to give Google Translate more context.

I can try to think of a better solution to enhance this function: https://github.com/pepri/subtitles-editor/blob/f878ab62967b7b4fbce9194f9fddc7c2fbff7810/src/extension.ts#L356-L366

Thanks for your answer.

snow212-cn commented 2 years ago

It's urgent to improve this ability. I need it badly.

klausbadelt commented 2 years ago

@MrNossiom @snow212-cn I agree this could radically improve translation quality. Could you volunteer a PR?

klausbadelt commented 2 years ago

After review, though, the code already seems to do what you're asking for. All text lines are collected, then translated in 8000-character blocks. (@pepri, correct me if I misunderstand the code.)
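
For illustration, collecting lines into size-limited blocks could look roughly like the sketch below. This is not the extension's actual code: `batchLines` and its `maxChars` parameter are hypothetical names, and the real implementation at the link that follows may count characters differently.

```typescript
// Hypothetical helper, NOT taken from the extension's source.
// Groups lines into batches whose combined length stays within maxChars.
// A single line longer than maxChars still gets a batch of its own.
function batchLines(lines: string[], maxChars: number): string[][] {
  const batches: string[][] = [];
  let current: string[] = [];
  let size = 0;

  for (const line of lines) {
    // Start a new batch when adding this line would exceed the limit.
    if (current.length > 0 && size + line.length > maxChars) {
      batches.push(current);
      current = [];
      size = 0;
    }
    current.push(line);
    size += line.length;
  }

  // Flush the last, possibly partial, batch.
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}
```

Note that batching like this only reduces the number of requests. Each line in a batch is still a separate unit for the translation service.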

https://github.com/pepri/subtitles-editor/blob/f878ab62967b7b4fbce9194f9fddc7c2fbff7810/src/extension.ts#L390-L408

I think this request could be closed.

pepri commented 2 years ago

I batch the lines when sending them for translation, but the translation service still translates each line independently. To improve this, I would need to join the lines so that they are translated together, and then split them again afterwards, since I want to keep the separate lines. I already tested this with the # character, which works as a separator for this purpose (the # character itself should be escaped so it is not lost in translation). I might implement it when I feel like it.
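
A minimal sketch of the join-and-split idea, under stated assumptions: the names `escapeLine`, `joinLines`, and `splitLines` are hypothetical, and the backslash escaping scheme is only one possible choice, not necessarily the one that was tested. Whether a translation service leaves `#` and `\` untouched is also an assumption that would need checking.

```typescript
// Hypothetical helpers illustrating a separator-based round trip.
const SEPARATOR = "#";

// Escape backslashes first, then the separator, so the two cannot be confused.
function escapeLine(line: string): string {
  return line.replace(/\\/g, "\\\\").replace(/#/g, "\\#");
}

// Join subtitle lines into one string that can be translated as a block.
function joinLines(lines: string[]): string {
  return lines.map(escapeLine).join(SEPARATOR);
}

// Split a joined string back into lines, undoing the escaping.
function splitLines(joined: string): string[] {
  const lines: string[] = [];
  let current = "";
  for (let i = 0; i < joined.length; i++) {
    const ch = joined.charAt(i);
    if (ch === "\\" && i + 1 < joined.length) {
      // An escaped character: keep the next character literally.
      current += joined.charAt(i + 1);
      i++;
    } else if (ch === SEPARATOR) {
      // An unescaped separator ends the current line.
      lines.push(current);
      current = "";
    } else {
      current += ch;
    }
  }
  lines.push(current);
  return lines;
}
```

With these helpers, `splitLines(joinLines(lines))` returns the original lines for any non-empty array, as long as the text passes through unchanged. A real translation step may add or drop separators, so the result count should be compared with the input count before the lines are written back.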

mrnossiom commented 2 years ago

Hello @pepri and @klausbadelt,

So I did a little research and found that the free version we are using isn't documented at all. I think it is an old endpoint kept for backward-compatibility reasons. I also found that the Google Cloud APIs for batch translation require an account with billing set up.

Furthermore, I have been using another translator called DeepL for a while. They have a fresh API with a free plan of 500k characters a month. The API supports batch translation by sending multiple sentences at once and adding context to the translation. The only drawback is that it needs an API key along with an account.

Maybe we could implement the functionality with the DeepL API but keep the wonky Google API translations as an alternative. Tell me what you think.
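
As a rough sketch of what a batched DeepL call might involve, the helper below only builds the request and parses the response; it does not send anything. The endpoint, the `DeepL-Auth-Key` header, and the `text` / `target_lang` / `translations` field names follow my reading of DeepL's public API documentation and should be verified against the current docs. `buildDeepLRequest` and `parseDeepLResponse` are hypothetical names, not part of the extension.

```typescript
// Shape of the request this sketch produces (hypothetical type).
interface DeepLRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Build one request that carries several subtitle lines at once.
// Assumption: the free-plan endpoint and header format match DeepL's docs.
function buildDeepLRequest(
  lines: string[],
  targetLang: string,
  apiKey: string
): DeepLRequest {
  return {
    url: "https://api-free.deepl.com/v2/translate",
    method: "POST",
    headers: {
      "Authorization": `DeepL-Auth-Key ${apiKey}`,
      "Content-Type": "application/json",
    },
    // All lines go into a single "text" array.
    body: JSON.stringify({ text: lines, target_lang: targetLang }),
  };
}

// Extract the translated strings from a response body.
// Assumption: the response has the form { translations: [{ text }] }.
function parseDeepLResponse(json: string): string[] {
  const data = JSON.parse(json) as { translations: { text: string }[] };
  return data.translations.map((t) => t.text);
}
```

The returned object could be passed to `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`, with the API key read from a user setting rather than hard-coded.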


mrnossiom commented 2 years ago

Hey @pepri, could you please answer my question above? I know it's more of a side project, but I would really like to improve it. Thanks.