kpister / oratio

Open Source Video Localization Pipeline
BSD 3-Clause "New" or "Revised" License

Integrate a segment-level quality score #16

Open · bittlingmayer opened this issue 3 years ago

bittlingmayer commented 3 years ago

Machine translation is integrated, but of course some of the translations are bad.

With an instant segment-level quality score, Oratio can implement various flavours of "hybrid" translation, for example shipping low-risk machine translations as-is and routing only the risky segments to human review (see the sketch at the end of this comment).

I'd suggest the ModelFront API.

Full-disclosure: I'm a co-founder of ModelFront.
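
For illustration, a minimal sketch of one such flavour: accept low-risk machine translations and queue the rest for review. The `risk_score` function and the threshold are hypothetical, not part of Oratio or the ModelFront API:

```python
# Hypothetical hybrid routing: ship low-risk MT output as-is and
# queue high-risk segments for human review. risk_score() and the
# 0.3 threshold are illustrative assumptions, not part of Oratio.
RISK_THRESHOLD = 0.3

def route_segments(segments, risk_score):
    """Split (source, translation) pairs by estimated risk."""
    approved, needs_review = [], []
    for source, translation in segments:
        if risk_score(source, translation) <= RISK_THRESHOLD:
            approved.append((source, translation))  # ship the machine translation
        else:
            needs_review.append((source, translation))  # send to a human reviewer
    return approved, needs_review
```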

bittlingmayer commented 3 years ago

This is also relevant: https://slator.com/machine-translation/good-bad-or-loose-amazons-new-subtitle-quality-estimation-system/

kpister commented 3 years ago

That seems pretty neat. Does the API include evaluation?

How do you run that process? It seems like Google/Amazon etc. would be training on all the available data, which makes building an evaluator difficult.

bittlingmayer commented 3 years ago

In the interest of openness:

No, currently the ModelFront API works like a machine translation API. It takes a batch of up to a few dozen sentence pairs and returns a risk score for each. So you should loop through and "stream" them up (from up to a few threads in parallel).
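
For illustration, a minimal sketch of that batch-and-stream pattern. The endpoint URL, payload shape, and response fields below are placeholders, not the documented ModelFront API:

```python
# Hypothetical sketch of batching sentence pairs and streaming them up
# from a few threads in parallel. The endpoint, payload, and response
# shape are assumptions, not the real ModelFront API.
from concurrent.futures import ThreadPoolExecutor

import requests

API_URL = "https://api.example.com/v1/predict"  # placeholder endpoint
BATCH_SIZE = 30  # "a few dozen sentence pairs" per request


def score_batch(pairs, source_lang, target_lang, token):
    """POST one batch of (source, translation) pairs; return risk scores."""
    payload = {
        "source_language": source_lang,
        "target_language": target_lang,
        "rows": [{"original": s, "translation": t} for s, t in pairs],
    }
    resp = requests.post(API_URL, json=payload, params={"token": token}, timeout=30)
    resp.raise_for_status()
    return [row["risk"] for row in resp.json()["rows"]]


def score_all(pairs, source_lang, target_lang, token, max_threads=4):
    """Score all pairs in batches, from a few threads, preserving order."""
    batches = [pairs[i:i + BATCH_SIZE] for i in range(0, len(pairs), BATCH_SIZE)]
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        results = pool.map(
            lambda b: score_batch(b, source_lang, target_lang, token), batches
        )
    return [score for batch in results for score in batch]
```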

(For evaluating a document or corpus and getting aggregate metrics, you should use the console. You can point it to a GCS URL. Support for asynchronously evaluating large files via the API is on the roadmap, but not soon.)

> It seems like Google/Amazon etc. would be training on all the available data, which makes building an evaluator difficult.

Contamination and correlation are definitely risks.

The generic ModelFront system is trained mostly on data that Google and Amazon do not have access to, or on which, for various reasons, their translation engines fail anyway. Most of our training data is curated in-house or generated synthetically.

For example, until recently, Google Translate was translating "Iran" into Italian as "Ho corso" (that is, as if the source were "I ran"). Of course, the correct translation occurs in their datasets millions of times.

In the case of custom models, the system is trained on private client datasets, usually from professional human post-editors. Those datasets also have problems, and we're actively working on ways to deal with them automatically.

kpister commented 3 years ago

Just added ModelFront to the project -- now you can evaluate a translation as part of the client call: `client.get_translation("Test", original_language="en", target_language="es", with_eval=True)`.
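
For anyone trying it, a hedged usage sketch, assuming an already-constructed client and that `with_eval=True` returns a `(translation, risk)` tuple where higher risk means a worse translation (the real return shape may differ):

```python
# Assumptions: `client` is an already-constructed oratio client, and
# with_eval=True returns a (translation, risk) tuple where higher risk
# means a worse translation. The real return shape may differ.
translation, risk = client.get_translation(
    "Test", original_language="en", target_language="es", with_eval=True
)
if risk > 0.3:  # illustrative threshold, not part of oratio
    print(f"Risky translation, consider human review: {translation}")
else:
    print(f"Accepted: {translation}")
```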

I'm really excited to see where these results lead.