suttacentral / bilara

Our Computer Aided Translation software

Adding DeepL column for translation #124

Open · DhammaCharts opened this issue 2 years ago

DhammaCharts commented 2 years ago

Hi,

I'm wondering if it would be possible to add an optional column that includes a DeepL automatic translation using its API. It would take the English input and give the desired-language output.

Glossaries are very helpful, as they consistently translate a specific input word into a specific output word chosen by the translator. https://www.deepl.com/fr/docs-api/managing-glossaries/creating-a-glossary/

API doc https://www.deepl.com/docs-api

There is a free and a paid API. Is SuttaCentral a charity, and could it therefore ask for a free license? Thanks for your feedback!
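
For reference, here is a minimal sketch of a single DeepL API call in Python, assuming the `requests` library. The auth key is a placeholder; the free tier uses the api-free.deepl.com host, the paid tier api.deepl.com:

```python
import requests

# Placeholder key; swap the host for api.deepl.com on the paid tier.
DEEPL_AUTH_KEY = "your-auth-key:fx"
API_URL = "https://api-free.deepl.com/v2/translate"

def translate_segment(text: str, target_lang: str = "FR") -> str:
    """Translate a single English segment with DeepL and return the translated text."""
    response = requests.post(
        API_URL,
        data={
            "auth_key": DEEPL_AUTH_KEY,
            "text": text,
            "source_lang": "EN",
            "target_lang": target_lang,
        },
    )
    response.raise_for_status()
    return response.json()["translations"][0]["text"]

print(translate_segment("Mendicants, I will teach you the all."))
```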

sujato commented 2 years ago

I'm definitely open to this, but there are a few considerations first.

So basically it comes down to (a) developer time and (b) proving the usefulness.

DhammaCharts commented 2 years ago

Thank you for your reply, Bhante!

this strange malady called "life",

;_)

so it would have to be from scratch

I have tried to build Bilara locally, not yet successful!

So basically it comes down to (a) developer time and (b) proving the usefulness.

(b) I've not yet met anyone in the French Dhamma community of translators who doesn't use DeepL. I've been reading, listening and talking in English for 10+ years, plus living in an English-speaking monastery for 4+ years now, and DeepL is better than me ;-) especially in English-to-French translation, in terms of grammar, vocabulary and sentence structure. It of course lacks the context and the meaning, but it is quite amazing in and of itself. Pronouns and verb tenses are difficult for DeepL without understanding the context, so every translation still needs very careful human checking and correction.

(a) I've started a quick app that fetches the JSON from bilara-data and creates a DeepL equivalent; here is a sketch of how it works:

[Diagram: "Bilara Assist" workflow]
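
As an illustration of that segment-by-segment approach, a rough Python sketch: it assumes a bilara-data translation file (a JSON object mapping segment IDs to English strings) and reuses the translate_segment helper from the sketch above. The file path is only an example and should be checked against the actual bilara-data layout:

```python
import json
import requests

# Example path only; verify against the real bilara-data repository layout.
SOURCE_URL = (
    "https://raw.githubusercontent.com/suttacentral/bilara-data/master/"
    "translation/en/sujato/sutta/sn/sn5/sn5.1_translation-en-sujato.json"
)

# A bilara-data translation file maps segment IDs to strings,
# e.g. {"sn5.1:0.1": "Linked Discourses 5.1", ...}.
segments = requests.get(SOURCE_URL).json()

# Translate each non-empty segment independently (the variant that loses context),
# using translate_segment() from the earlier sketch.
translated = {
    seg_id: translate_segment(text)
    for seg_id, text in segments.items()
    if text.strip()
}

with open("sn5.1_translation-fr-deepl.json", "w", encoding="utf-8") as f:
    json.dump(translated, f, ensure_ascii=False, indent=2)
```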

noeismet commented 2 years ago

Unfortunately, I do not have coding skills, but I would happily volunteer for testing and would definitely be an early adopter of a DeepL integration. I use DeepL extensively, primarily for speed purposes. I review everything I give it to translate, and I am very impressed with the results, which seem to improve over time. I guess that is the idea of AI and deep learning. I use it in two ways:

  1. For Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.
  2. For Site, I do it paragraph by paragraph, as the texts are generally more straightforward in their meaning and much less subject to various interpretations. So here no editing in DeepL is required; it's very quick.

cittadhammo commented 2 years ago
  1. for Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.

OK, that is interesting, and I was wondering about that. I'll give it a try both ways and see how much difference it makes. Thanks! The app I'm building will do it segment by segment and thus loses the context.

noeismet commented 2 years ago

I'll give it a try both ways and see how much difference it makes

In fact, both ways work, and I guess it's a matter of personal preference and workflow. And speaking of workflow, I think a DeepL integration would improve it.

blake-sc commented 2 years ago

From taking a glance at the DeepL API, this seems like it'd potentially be very quick to implement.

And as an aside, I have to admit to having noticed that machine learning is getting disgustingly good at doing things. Actually, one reason I'd say "do we need to bother?" is that machine translation is getting so good that sites hardly need to be localized any more, though I'm sure the quality of machine translation still varies heavily with the language pair and domain.

Honestly, the hard part with Bilara is usually the GUI, but I can propose two possible ways to implement machine translation (neither of which involves a new column):

  1. Hit a shortcut, say Ctrl-M, that fetches the machine translation and inserts it into the field. This would be the easiest way.
  2. Include the machine translation as a distinct translation memory result. It obviously wouldn't have a source text; perhaps put "Automatic translation by DeepL" where the source text goes.

The other way, of course, would be to do the entire translation with DeepL up front and then proofread it, though that doesn't do a good job of indicating progress.

for Suttas, I tend to take the entire sutta into DeepL because it makes a more coherent translation and in that way seems to understand the context pretty well. This approach requires a bit of editing work in DeepL but it's worth it.

I tried translating from Sabbamitta's German translation to English, and it's interesting how much difference this made. Translating the whole sutta (with segments separated by double-newlines) produced a substantially nicer result than translating segment by segment.

For example taking Mara's verses from sn5.1, when translated as a whole:

"There is no escape from the world, What shall your seclusion bring you? Enjoy the pleasures of the senses, so that you will not regret it later."

When translated segment-by-segment:

"There is no escape from the world, What good is your seclusion going to do you? Enjoy the pleasures of the senses, So you won't regret it later."

The difference is fairly small, but in the first case it clearly seems to have translated the verse as verse, for instance using "will not" instead of "won't" in the last line just to make it longer and fit better. In the second case it can't recognize it as verse, and so it translates it as prose.

So it'd definitely be worthwhile feeding the entire English translation into DeepL rather than going segment by segment. With respect to my earlier implementation suggestion, this would just mean that, in the background, the server, when requested for a machine translation, feeds the whole text into DeepL, caches the result, and returns it segment by segment.
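
A minimal sketch of that server-side flow, assuming the segments are joined with double newlines, translated in one request, cached in memory, and split back apart. Whether DeepL always preserves the blank-line separators should be verified; the function and cache here are hypothetical:

```python
import hashlib
import requests

DEEPL_AUTH_KEY = "your-auth-key:fx"  # placeholder
API_URL = "https://api-free.deepl.com/v2/translate"

# In-memory cache: hash of the joined source text -> list of translated segments.
_cache: dict[str, list[str]] = {}

def machine_translate(segments: dict[str, str], target_lang: str = "FR") -> dict[str, str]:
    """Translate a whole text in one request so DeepL sees the full context,
    then return the result keyed segment by segment."""
    ids = list(segments)
    joined = "\n\n".join(segments[i] for i in ids)
    key = hashlib.sha256((target_lang + joined).encode("utf-8")).hexdigest()

    if key not in _cache:
        response = requests.post(
            API_URL,
            data={
                "auth_key": DEEPL_AUTH_KEY,
                "text": joined,
                "source_lang": "EN",
                "target_lang": target_lang,
            },
        )
        response.raise_for_status()
        # Assumes DeepL keeps the blank-line separators intact; a real
        # implementation should verify the segment count before trusting the split.
        _cache[key] = response.json()["translations"][0]["text"].split("\n\n")

    translated = _cache[key]
    if len(translated) != len(ids):
        raise ValueError("Segment count changed in translation; cannot realign.")
    return dict(zip(ids, translated))
```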

cittadhammo commented 2 years ago

Hi @blake-sc

this would just mean that, in the background, the server, when requested for a machine translation, feeds the whole text into DeepL, caches the result, and returns it segment by segment.

This would be an amazing solution!

Also, if a personal DeepL glossary could be kept in the translator's user folder, that would be fantastic. You can create glossaries via the API, from what I've read.
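
A hedged sketch of what that could look like via the DeepL glossary endpoints. The glossary name, entries, and French renderings below are invented examples, and keeping them in a per-translator file is an assumption:

```python
import requests

DEEPL_AUTH_KEY = "your-auth-key:fx"  # placeholder
BASE_URL = "https://api-free.deepl.com/v2"
HEADERS = {"Authorization": f"DeepL-Auth-Key {DEEPL_AUTH_KEY}"}

# Hypothetical per-translator glossary, e.g. loaded from a file in the user's folder.
entries = {
    "mindfulness": "présence attentive",
    "seclusion": "isolement",
}

# Create the glossary; entries are sent as tab-separated values.
resp = requests.post(
    f"{BASE_URL}/glossaries",
    headers=HEADERS,
    data={
        "name": "example-en-fr",
        "source_lang": "EN",
        "target_lang": "FR",
        "entries": "\n".join(f"{en}\t{fr}" for en, fr in entries.items()),
        "entries_format": "tsv",
    },
)
resp.raise_for_status()
glossary_id = resp.json()["glossary_id"]

# Use the glossary in a translation request (source_lang is required with a glossary).
resp = requests.post(
    f"{BASE_URL}/translate",
    headers=HEADERS,
    data={
        "text": "Mindfulness immersed in the body is very fruitful.",
        "source_lang": "EN",
        "target_lang": "FR",
        "glossary_id": glossary_id,
    },
)
resp.raise_for_status()
print(resp.json()["translations"][0]["text"])
```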

If this could be done in a relatively short time, I will give up my small app. With what I've done so far, I can already create a DeepL-translated JSON segment by segment locally, and I can push it manually to the repo. So I will try again to build Bilara on my computer to see if I can help there.

Thanks for your very nice comment. Cittadhammo = DhammaCharts

blake-sc commented 2 years ago

So I will try again to build Bilara on my computer to see if I can help there.

I can help you with that; to get it fully functional you need some keys to connect to a repo. It's easiest to use the SuttaCentral Gitter to communicate. I've sent you an invite.

noeismet commented 2 years ago

This would be an amazing solution!

I concur!

And, I wonder if the suggestion feature would work this way too, integrated into Bilara? It's rather useful, I must admit.

[Screenshot: the DeepL suggestion feature]

sujato commented 2 years ago

@blake-sc as far as UI goes, I'd recommend the option of adding the ML to the ordinary TM results; just make sure it has a distinct class so we can color-code it or whatever. Also probably a good idea to include an "off" switch for those who distrust our robot overlords.

blake-sc commented 2 years ago

And, I wonder if the suggestion feature would work this way too, integrated into Bilara? It's rather useful, I must admit.

I can't really see a way of achieving that. The API can offer a "more" or "less" formal translation but I don't see any way to access suggest functionality via the API, and it'd be a huge pain to program. You'd be best off just copy-pasting into the DeepL web UI.

sabbamitta commented 2 years ago

Also probably a good idea to include an "off" switch for those who distrust our robot overlords.

Yes, please. :smile:

cittadhammo commented 2 years ago

And, I wonder if the suggestion feature would work this way too, integrated into Bilara? It's rather useful, I must admit.

In my opinion, this would not be possible to do in Bilara itself using the API, as far as I can tell.

A solution would be to "intercept" the string (text file) before it goes to the DeepL API, i.e. having an option to copy the whole sutta to the clipboard (with segments separated by double newlines), then paste it into the DeepL desktop app and have it side by side with the Pali and English in Bilara:

[Screenshot, 2022-06-27: Bilara side by side with the DeepL desktop app]

Then scroll down the two windows (apps) simultaneously until the end of the sutta, correcting the DeepL output in its own app. Once finished, copy-paste the resulting translation (with segments separated by double newlines) into a Bilara input window.

This would allow using DeepL suggestions but not Bilara suggestions directly... ;-( The nice thing, though, is that by suggesting changes to DeepL, it improves over time and keeps your suggestions in mind throughout the sutta. But I don't know how much of an advantage this would give overall compared to the previously mentioned API solution.

noeismet commented 2 years ago

I can't really see a way of achieving that. The API can offer a "more" or "less" formal translation but I don't see any way to access suggest functionality via the API, and it'd be a huge pain to program. You'd be best off just copy-pasting into the DeepL web UI.

In my opinion, this would not be possible to do in Bilara itself using the API, as far as I can tell.

Yes, I understand; thank you for looking into it. Having DeepL translations in Bilara on its own would already be a great advantage, and one can always have the DeepL app open on the side if s/he needs the suggestions.

cittadhammo commented 2 years ago

one can always have the DeepL app open on the side if s/he needs the suggestions.

Yes, this is what I thought I was gonna do ;-)