Okohedeki opened this issue 2 years ago
That largely depends on which device and which checkpoint you use. The smallest checkpoint has 600M parameters, which is already quite large compared to many commonly used pretrained models, so inference with a model of this size is expected to take some time.
Is there some way to reduce the time, for example by keeping the model loaded, or some other approach, @pluiez? I noticed there is no Docker version and I could probably help with that, but I would like to know how to keep the model preloaded so it translates in a matter of 1-2 seconds rather than the 30+ seconds it currently takes.
If you want to preload it, you could host it on a server and call it from there. But due to its size, a lot of cloud providers won't let you use the largest version. I just downgraded to M2M-100 until Meta optimizes it.
I am calling it locally from the server, but the problem is that for each translation I have to call translate.sh, which reloads the model into memory every time.
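One way around that is to keep the model resident in a single long-running process and call a translate function repeatedly, instead of invoking translate.sh per request. Below is a minimal sketch, assuming the Hugging Face transformers port of the distilled 600M checkpoint (`facebook/nllb-200-distilled-600M`); the fairseq-based translate.sh in this repo would need a different loader, so treat this only as an illustration of the preloading idea.

```python
# Sketch: load the checkpoint once, then serve repeated translations
# from the same process so only the first call pays the loading cost.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

MODEL_NAME = "facebook/nllb-200-distilled-600M"  # smallest NLLB checkpoint

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
model.eval()  # inference only

def translate(text: str, src_lang: str = "eng_Latn", tgt_lang: str = "fra_Latn") -> str:
    """Translate one sentence; the model stays loaded between calls."""
    tokenizer.src_lang = src_lang
    inputs = tokenizer(text, return_tensors="pt")
    generated = model.generate(
        **inputs,
        # Force the decoder to start with the target-language token.
        forced_bos_token_id=tokenizer.convert_tokens_to_ids(tgt_lang),
        max_length=256,
    )
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]

if __name__ == "__main__":
    # Subsequent calls skip checkpoint loading entirely, so short inputs
    # should come back in roughly a second or two on CPU.
    print(translate("The model stays loaded between calls."))
```

Wrapping the same `translate` function in a small HTTP server (Flask, FastAPI, etc.) would give you the hosted setup mentioned above while keeping the weights in memory.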
Hello,
I was curious how long it took for your prediction to run. It took a couple of seconds for me, so I was wondering if that was just due to the NLLB model itself or if you experienced something different, which would lead me to believe my setup is messed up somewhere.