I would say it can speed up decoding since the library is optimized for inference. The GPU benchmarks for float16 are interesting. It's 2x+ faster than Marian with only a slight decrease in BLEU. We do use half-precision decoding, but on 4-8 GPUs per machine.
How much time are we currently spending on translations in the pipeline?
Translation is split across several tasks. For Hungarian, the total time spent was around 443 hours. If we can speed that up by 2x, it'll be a nice cost saving (and the pipeline will also finish faster, even though we could achieve the same by splitting across even more tasks).
The command to convert a Marian model to CTranslate2 is `ct2-marian-converter` (`ct2-opus-mt-converter` is also interesting): https://opennmt.net/CTranslate2/guides/marian.html. As input, we need the `model.npz` file (we can use a teacher one from a `train-teacher` or `finetune-teacher` task, a student one from a `train-student` or `finetune-student` task, or a quantized one from a `quantize` task) and vocabulary files (we have an SPM vocab, but not in the format CTranslate2 is expecting).
To generate the vocab, we can use the `marian-vocab` command. Perhaps we can add a `marian-vocab` execution to the `train-vocab` task, so we have the necessary vocab files readily available.
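For reference, `marian-vocab` builds a vocabulary from text given on stdin and writes YAML on stdout, so the invocation would presumably look something like `marian-vocab < corpus.en.txt > vocab.en.yml` (the file names here are just placeholders).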
For now, we can only convert the teacher model. The student model uses an RNN-based decoder, which isn't supported by CTranslate2 yet.
I managed to convert our teacher model (the Hungarian one from our latest en->hu run) by changing the `load_vocab` function in https://github.com/OpenNMT/CTranslate2/blob/c6f7f3bcc61964ca787cadf796e237fa0025f483/python/ctranslate2/converters/marian.py#L118 to:
```python
def load_vocab(path):
    # Load the SentencePiece model and return its pieces in id order,
    # which is the token list the converter expects.
    import sentencepiece as spm

    sp = spm.SentencePieceProcessor(path)
    return [sp.id_to_piece(i) for i in range(sp.vocab_size())]
```
(NOTE: presumably we can convert the `vocab.spm` to a YAML file and load it without having to patch CTranslate2.)
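A minimal sketch of that idea, assuming the Marian-style vocab format is a YAML mapping of token to id (the output file name is just an example):

```python
import sentencepiece as spm
import yaml

# Dump the SentencePiece pieces, in id order, as a token -> id YAML mapping.
sp = spm.SentencePieceProcessor("vocab.spm")
vocab = {sp.id_to_piece(i): i for i in range(sp.vocab_size())}
with open("vocab.yml", "w", encoding="utf-8") as f:
    yaml.safe_dump(vocab, f, allow_unicode=True)
```

We could then pass `vocab.yml` to `--vocab_paths` instead of `vocab.spm`.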
Then running:

```
ct2-marian-converter --model_path model.npz.best-chrf.npz --vocab_paths vocab.spm --output_dir ct2_teacher_model
```
Then, to run the model:
```python
import ctranslate2
import sentencepiece as spm

# Load the converted model and the SentencePiece tokenizer.
translator = ctranslate2.Translator("ct2_teacher_model", device="cpu")
sp = spm.SentencePieceProcessor("vocab.spm")

# Tokenize into SentencePiece pieces, translate, then detokenize the
# best hypothesis.
input_text = "Hello, world!"
input_tokens = sp.encode(input_text, out_type=str)
results = translator.translate_batch([input_tokens])
output_tokens = results[0].hypotheses[0]
print(sp.decode(output_tokens))
```
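For the float16 GPU numbers mentioned above, the equivalent would presumably be constructing the translator with `ctranslate2.Translator("ct2_teacher_model", device="cuda", compute_type="float16")`.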
CTranslate2 does not support ensemble translation out of the box. I'm investigating further.
So it looks like Marian is doing something that CTranslate2 doesn't support. I believe we'd have to fork CTranslate2 and add support ourselves in order to use ensembles.
Boiled down, in Marian it does:
```cpp
// The log probability scores for each word in the vocab for the next token prediction.
Expr stepScores;
// Here scorers_ is our vector of teacher models.
for (size_t i = 0; i < scorers_.size(); ++i) {
  // The log probabilities over the vocab from the current scorer.
  Expr logProbs = states[i]->getLogProbs().getLogits();
  // Combine the scores from all scorers.
  if (i == 0)
    // The first model sets stepScores and applies its weight. In our case the
    // weight is 1.0.
    stepScores = scorers_[i]->getWeight() * logProbs;
  else
    // Each successive model adds its weighted log probabilities to the step to create
    // a combined prediction. CTranslate2 can't do this unless the source code is modified.
    stepScores = stepScores + scorers_[i]->getWeight() * logProbs;
}
```
Source: https://github.com/marian-nmt/marian-dev/blob/master/src/translator/beam_search.cpp#L456
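To make the combination concrete, here is a small self-contained sketch (plain numpy, with hard-coded toy distributions standing in for the teachers' decoder outputs) of the same weighted log-probability sum:

```python
import numpy as np

def ensemble_step_scores(per_model_log_probs, weights):
    # The ensemble score for each candidate token is the weighted sum of the
    # models' log probabilities (a weighted product in probability space).
    step_scores = np.zeros_like(per_model_log_probs[0])
    for log_probs, weight in zip(per_model_log_probs, weights):
        step_scores += weight * log_probs
    return step_scores

# Toy next-token distributions over a 3-word vocab for two teacher models.
log_probs_a = np.log([0.7, 0.2, 0.1])
log_probs_b = np.log([0.5, 0.4, 0.1])
combined = ensemble_step_scores([log_probs_a, log_probs_b], weights=[1.0, 1.0])
print(combined.argmax())  # beam search would instead keep the top-k candidates
```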
We could also investigate removing the teacher ensemble (#778), which would unblock us here.
If we test a single model with CTranslate2, we should also test the quality hit of using a quantized model for additional performance gains.
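For reference, the CTranslate2 converters accept a `--quantization` option that applies quantization at conversion time, e.g. `ct2-marian-converter --model_path model.npz.best-chrf.npz --vocab_paths vocab.spm --output_dir ct2_teacher_model --quantization int8`.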
Overall, this should speed up the training pipeline, since translation is one of its most expensive steps.