Rosetta and TMs (translation memory)

bittner commented 3 years ago

Hi there!

We're using Rosetta 0.9.4 on Django 2.2.17, and all is good. Apart from the skepticism of professional translators, of course. The main objection is, "The tool doesn't provide a TM, hence we can't use it."

I need some help to understand this topic better.

Note that my wife is a professional translator and project manager in the translation industry, so I am reasonably well informed about the concepts of "traditional translation" of documents (e.g. SDL Trados, Across, OmegaT) but also about the approach that emerged from the software development industry (e.g. Transifex, Crowdin), which I have hands-on experience with.

Where is Rosetta's TM?

From my understanding, Rosetta is more or less a nice front-end to manipulate .po files, extracted by Python's gettext module integrated in Django. There are no models, yet still Rosetta does "automatic translation", which is visible in the fuzzy matches (which I assume is also a feature coming from gettext, really).

So in essence, the .po files themselves are that TM already. There is no additional or separate component, but as the entire "document" is identical to all (successful) translations that have been done in the past, there is not even a need for a separate TM. It's all read into "Rosetta's memory" in its entirety. There is no disadvantage of having "no TM", given we only deal with our domain specific vocabulary.

Is this view correct?

External TMs?

A related question, after having clarified whether Rosetta has a TM or no, is there a way to

download Rosetta's TM and/or
attach (or upload) an external TM

to add, say, more flexibility to the translation process?

mbi commented 3 years ago

Hi Peter!

> From my understanding, Rosetta is more or less a nice front-end to manipulate .po files, extracted by Python's gettext module integrated in Django.

That's correct: Rosetta's main task is offering a user-friendly interface to interact with (read/write) .po catalogs produced by Django's makemessages and compiling them into .mo files that Django then reads at runtime.

> So in essence, the .po files themselves are that TM already.

Yes, but as every "project" only manages its own gettext catalog, if you manage different projects, then you cannot access translations you've already provided in other projects.

From my limited understanding of what a TM is, such a tool should maintain a database of the entire corpus a translator has ever produced, so that when a new string needs to be translated, the tool will provide suggestions based on some possibly fuzzy match against that database.

So if that's what you're expecting, then no, Rosetta doesn't provide that kind of feature at the moment, because again: the only datasource is the po catalog itself.
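To make the idea concrete, here is a minimal sketch of that kind of lookup using only Python's standard library. It is not something Rosetta does today; the `suggest` function, the `memory` dictionary and the 0.6 threshold are all made up for illustration, and a real TM would use a proper database and a better similarity measure.

```python
from difflib import SequenceMatcher


def suggest(memory, source, limit=3, min_score=0.6):
    """Return (score, past_source, past_translation) tuples, best match first."""
    scored = []
    for past_source, past_translation in memory.items():
        # Character-based similarity between 0.0 and 1.0, ignoring case.
        score = SequenceMatcher(None, source.lower(), past_source.lower()).ratio()
        if score >= min_score:
            scored.append((round(score, 2), past_source, past_translation))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:limit]


# Hypothetical corpus collected from earlier projects (English -> German).
memory = {
    "Save changes": "Änderungen speichern",
    "Discard changes": "Änderungen verwerfen",
    "Log out": "Abmelden",
}

for score, past_source, past_translation in suggest(memory, "Save all changes"):
    print(score, past_source, "->", past_translation)
```

The point is only that the corpus spans every project, whereas Rosetta sees a single catalog at a time.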

> There are no models, yet still Rosetta does "automatic translation"

This is provided through a series of interfaces to online translation services, such as Google Translate, Bing Translate, Yandex Translate and such.

But a professional translator will probably frown upon these services and rather prefer their own TM corpus. 🤷🏼‍♂️

> A related question, after having clarified whether Rosetta has a TM or no, is there a way to download Rosetta's TM and/or

No, you can download the PO catalog for the current project, but that's it.

> attach (or upload) an external TM.

Ah well, now: if any such thing exists and is well documented (if there is a catalog to upload and / or an API to query) then I don't see why that wouldn't be doable.

Hope this helps, further discussion and PRs welcome 😉

mondeja commented 3 years ago

Rosetta is simply a Django app that processes pofiles and compiles their corresponding mofiles. Rosetta does not impose any way of pre- or post-processing your files. You could emulate a translation memory using a pofile compendium. But keep in mind that this process of discovering new fuzzy matches is not managed by Rosetta, but by the scripts written by the developer of the project.
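For reference, GNU gettext itself can do this on the command line: `msgcat` concatenates catalogs into one file and `msgmerge --compendium` draws on that file when updating a catalog. The core idea is small enough to sketch in Python. In the sketch below catalogs are plain `msgid -> msgstr` dictionaries and `build_compendium` is a hypothetical helper, not part of Rosetta or gettext; loading real pofiles (for example with polib) is left out.

```python
def build_compendium(*catalogs):
    """Merge catalogs (msgid -> msgstr dicts); the first non-empty translation wins."""
    compendium = {}
    for catalog in catalogs:
        for msgid, msgstr in catalog.items():
            # Skip untranslated entries and keep the earliest translation seen.
            if msgstr and msgid not in compendium:
                compendium[msgid] = msgstr
    return compendium


# Two hypothetical project catalogs (English -> German).
shop = {"Save": "Speichern", "Open": ""}
blog = {"Save": "Sichern", "Open": "Öffnen"}

print(build_compendium(shop, blog))
```

Letting the first translation win is just one possible policy; a project script could equally keep all variants and let the translator choose.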

I understand the lack of use of Rosetta in the translation industry, because, for example, if you need to go back for translations removed from the files, these will not be found in a separate database.

If I'm not wrong, you are asking for a pofile compendium that could be added as another pofile of the project and a button (or whatever other system) that could discover new matches, then another button that could download pofiles in different formats. Is this correct?

bittner commented 3 years ago

Alright, so a PO file compendium, which is a concept of GNU gettext, corresponds to what other translation tools maintain as a TM?

> you are asking for a pofile compendium that could be added as another pofile of the project and a button (or whatever other system) that could discover new matches, then another button that could download pofiles in different formats.

Exactly. Basically, I want to satisfy the expectations of translation agencies. They can

attach a TM to (or create a TM with) a translation project
download the TM created or updated by the translation project

According to the Transifex docs, downloading a TMX is possible. I wouldn't be surprised if that was actually a PO file compendium converted to XML. (You need a paid plan to do this, from what I can see on Transifex.)

As far as Rosetta is concerned, in theory, the simplest approach (as a concept) might be

to use the existing translations from all INSTALLED_APPS in a Django-based project
and combine them into a PO file compendium.

That compendium could then be used to allow for automatic pre-translation or assisted translation (suggestions). It would be an automatic, fully integrated TM that doesn't need any separate management effort by the user. Allowing users to download a TMX could be an optional feature.
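As a rough sketch of the first step, collecting the catalogs, the snippet below relies only on Django's standard `locale/<language>/LC_MESSAGES/<domain>.po` layout. `find_catalogs` is a hypothetical helper, and the app directories are passed in by hand; inside a real project they could presumably be taken from the `path` attribute of each app config returned by `django.apps.apps.get_app_configs()`.

```python
import tempfile
from pathlib import Path


def find_catalogs(app_dirs, language, domains=("django", "djangojs")):
    """Return the existing .po files for `language` under the given app directories."""
    found = []
    for app_dir in app_dirs:
        messages_dir = Path(app_dir) / "locale" / language / "LC_MESSAGES"
        for domain in domains:
            po_path = messages_dir / f"{domain}.po"
            if po_path.is_file():
                found.append(po_path)
    return found


# Demo with a throwaway directory tree: "shop" has a German catalog, "blog" has none.
with tempfile.TemporaryDirectory() as root:
    po_file = Path(root) / "shop" / "locale" / "de" / "LC_MESSAGES" / "django.po"
    po_file.parent.mkdir(parents=True)
    po_file.write_text('msgid ""\nmsgstr ""\n', encoding="utf-8")
    (Path(root) / "blog").mkdir()
    catalogs = find_catalogs([Path(root) / "shop", Path(root) / "blog"], "de")
    print([path.name for path in catalogs])
```

The entries of the files found this way would then be merged into the compendium.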

Would that be realistic?

mbi commented 3 years ago

Having a compendium is only the first step. We'd also need an intelligent way of matching past translations from the compendium and producing fuzzy suggestions in the PO catalog being translated.
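A naive version of that second step could look like the sketch below: exact matches are copied over, near matches are copied and reported separately so they can be flagged as fuzzy for review. Everything here is an assumption for illustration (the `pretranslate` name, dictionaries standing in for catalogs, the 0.8 cutoff), and `difflib` is a stand-in for whatever "intelligent" matching would really be needed.

```python
from difflib import get_close_matches


def pretranslate(catalog, compendium, cutoff=0.8):
    """Fill empty msgstr values from the compendium.

    Returns (filled_catalog, fuzzy_msgids); the second item lists the entries
    that were filled from a near match and should be reviewed.
    """
    filled = dict(catalog)
    fuzzy = set()
    for msgid, msgstr in catalog.items():
        if msgstr:
            continue  # already translated, leave untouched
        if msgid in compendium:
            filled[msgid] = compendium[msgid]  # exact match
            continue
        close = get_close_matches(msgid, list(compendium), n=1, cutoff=cutoff)
        if close:
            filled[msgid] = compendium[close[0]]
            fuzzy.add(msgid)
    return filled, fuzzy


# Hypothetical data (English -> German).
compendium = {"Delete file": "Datei löschen", "Log out": "Abmelden"}
catalog = {"Log out": "", "Delete files": "", "Print": ""}

filled, fuzzy = pretranslate(catalog, compendium)
print(filled)
print(fuzzy)
```

Character similarity alone will produce poor suggestions for short strings, which is part of why this step is harder than building the compendium.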

bittner commented 3 years ago

True. If we had that, though, we could address one side of the criticism already: "It doesn't have a TM" would cease to be true. And converting a PO file compendium to TMX seems to be a thing that is already addressed by free projects. – Just saying.
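Since TMX is plain XML, the conversion itself can be sketched with the standard library. The snippet below is only an illustration under my own assumptions: `to_tmx` and the `creationtool` value are made up, the header attributes follow my reading of TMX 1.4, and an existing converter would be the safer choice in practice.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def to_tmx(pairs, source_lang, target_lang):
    """Serialize (source, translation) pairs as a TMX document string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "po-compendium-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "po",
        "adminlang": "en",
        "srclang": source_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")  # one translation unit per pair
        for lang, text in ((source_lang, source), (target_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.tostring(tmx, encoding="unicode")


# Hypothetical compendium entries (English -> German).
pairs = [("Save", "Speichern"), ("Log out", "Abmelden")]
print(to_tmx(pairs, "en", "de"))
```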

mbi commented 3 years ago

The Translate Toolkit seems to be a very good candidate to manage, import, export and possibly search TMX documents in Python.

mbi / django-rosetta

Rosetta and TMs (translation memory) #249
