eu9ene opened this issue 4 months ago
It appears to be even lower for translate-corpus (GCP console).
@gregtatum FYI
Is it possible to dynamically determine this value? E.g., run N translations, measure GPU utilization and throughput, and adjust?
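Something along the lines of the sketch below could serve as a probe step: decode a fixed sample at a few candidate batch sizes, time each run, and keep the fastest setting. The paths, flags, and candidate values here are placeholders, not the pipeline's actual configuration.

```python
# Rough sketch of the "run N translations, measure and adjust" idea.
# Model/vocab paths and candidate values are assumptions, not real pipeline config.
import subprocess
import time

CANDIDATE_MINI_BATCH = [16, 32, 64, 128]  # hypothetical values to probe

def time_decode(mini_batch: int, sample_path: str = "sample.1000.txt") -> float:
    """Translate a fixed probe sample and return wall-clock seconds."""
    start = time.time()
    subprocess.run(
        [
            "marian-decoder",            # assumed to be on PATH
            "-m", "model.npz",           # placeholder model
            "-v", "vocab.spm", "vocab.spm",
            "--beam-size", "8",
            "--mini-batch", str(mini_batch),
            "--maxi-batch", "1000",
            "-i", sample_path,
            "-o", "/dev/null",
        ],
        check=True,
    )
    return time.time() - start

# Pick the fastest setting on the probe sample, then reuse it for the full corpus.
best = min(CANDIDATE_MINI_BATCH, key=time_decode)
print(f"best --mini-batch for this language/model: {best}")
```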
I've also noticed this, and it's always been the same. I think the bottleneck is decoding: n-best output with beam size 8 seems to use the GPU much less than decoding without n-best and a beam size of around 4-6.
This won't increase GPU utilization, but I've been using --fp16 during inference and training without any significant quality drop. I haven't compared n-best generation, though.
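To put numbers on the n-best/beam and --fp16 observations, GPU utilization can be sampled with nvidia-smi while a decode is running. A minimal sketch; the sampling window, interval, and single-GPU assumption are arbitrary choices:

```python
# Sample GPU utilization with nvidia-smi while a decoding run is in progress.
import statistics
import subprocess
import time

def sample_gpu_utilization(seconds: int = 60, interval: float = 1.0) -> float:
    """Return the mean GPU utilization (%) over the sampling window."""
    samples = []
    end = time.time() + seconds
    while time.time() < end:
        out = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=utilization.gpu",
             "--format=csv,noheader,nounits"],
            text=True,
        )
        samples.append(float(out.splitlines()[0]))  # first GPU only
        time.sleep(interval)
    return statistics.mean(samples)

# Run this once during an n-best beam-8 decode and once during a plain
# beam-4/6 decode to quantify the difference.
print(f"mean utilization: {sample_gpu_utilization():.1f}%")
```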
Another alternative would be to compare with CTranslate2, which has faster inference than Marian.
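For reference, a minimal sketch of what the CTranslate2 side could look like, assuming the Marian teacher has already been converted with CTranslate2's Marian converter; all paths and decoding settings below are placeholders:

```python
# Hedged sketch of CTranslate2 decoding with n-best output.
import ctranslate2
import sentencepiece as spm

sp = spm.SentencePieceProcessor(model_file="vocab.spm")   # placeholder vocab
translator = ctranslate2.Translator(
    "ct2_model_dir",            # converter output, placeholder path
    device="cuda",
    compute_type="float16",     # roughly comparable to Marian's --fp16
)

lines = ["This is a test.", "Another sentence."]
tokens = [sp.encode(line, out_type=str) for line in lines]

results = translator.translate_batch(
    tokens,
    beam_size=8,
    num_hypotheses=8,           # n-best, to mirror the Marian setup
    max_batch_size=64,
)
for result in results:
    for hyp in result.hypotheses:
        print(sp.decode_pieces(hyp))
```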
Related to #165
Training uses dynamic batch sizes: it changes the batch size over time to find the best value, so there's not really a need to adjust it. It starts out somewhat inefficient but quickly dials in the number to be as efficient as it can.
Translate tasks, however, don't use dynamic batch sizing. I played with them in #931 and, by adjusting the batching behavior, got them to be about as efficient as training. I think this ~70% is just the cap on Marian's ability to utilize the GPUs. CTranslate2 was able to reach ~96% utilization and was much faster given the same beam size.
It'll take a bit more time to get COMET scores for CTranslate2 so we can cross-compare. CTranslate2 doesn't support ensemble decoding, so we'll have to compare against Marian's single-teacher decoding.
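A rough sketch of that cross-scoring step with unbabel-comet; the checkpoint name and file layout are assumptions, not what the pipeline actually uses:

```python
# Score both decoders' output with the same COMET checkpoint.
from comet import download_model, load_from_checkpoint

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))

def comet_score(src_file: str, mt_file: str, ref_file: str) -> float:
    """System-level COMET score for one set of translations."""
    with open(src_file) as s, open(mt_file) as m, open(ref_file) as r:
        data = [{"src": a.strip(), "mt": b.strip(), "ref": c.strip()}
                for a, b, c in zip(s, m, r)]
    return model.predict(data, batch_size=32, gpus=1).system_score

print("Marian single teacher:", comet_score("test.src", "marian.out", "test.ref"))
print("CTranslate2:          ", comet_score("test.src", "ct2.out", "test.ref"))
```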
Currently, it's ~70%. We could try using a bigger batch size, but it also depends on the language.
(GCP console for the translate-mono task)