Closed eisenjulian closed 5 years ago
Awesome. I think that's enough for the paper. Just out of curiosity, would it be hard to get those numbers for the multilingual BERT?
It's a little bit more complex to make a fair comparison. I can run it on a P100 (16GB), while it didn't run on 12GB.
The number of tokens in BERT Multilingual's vocabulary is 110k, and we obtain an average per-batch time of 1582ms, also with a batch size of 32. This is for the classification task.
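For what it's worth, an average per-batch time like the one quoted above can be measured with a simple wall-clock helper. This is only a generic sketch (the function name, the warmup handling, and the dummy `step_fn` are illustrative; it is not the benchmark script in the linked repo):

```python
import time

def avg_batch_time_ms(step_fn, batches, warmup=2):
    """Average wall-clock time per batch, in milliseconds.

    step_fn: callable taking one batch (e.g. a forward/backward pass).
    batches: iterable of batches; the first `warmup` batches are timed
    but excluded from the average, to discard one-off setup costs.
    """
    timings = []
    for i, batch in enumerate(batches):
        start = time.perf_counter()
        step_fn(batch)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        if i >= warmup:
            timings.append(elapsed_ms)
    return sum(timings) / len(timings)

# Illustrative usage with a dummy workload standing in for a model step:
avg_ms = avg_batch_time_ms(lambda batch: sum(range(10_000)), [None] * 10)
```

For GPU timing one would additionally need to synchronize the device (e.g. `torch.cuda.synchronize()`) before reading the clock, since kernel launches are asynchronous.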
Here are the results I'm getting; we can control for other variables if we want: https://github.com/n-waves/ulmfit-multilingual/blob/qrnn_perf/results/time_benchmark/logs.md