n-waves / multifit

Code to reproduce the results from the paper "MultiFiT: Efficient Multi-lingual Language Model Fine-tuning" (https://arxiv.org/abs/1909.04761).
MIT License

QRNN Time Performance Benchmark #37

Closed. eisenjulian closed this issue 5 years ago.

eisenjulian commented 5 years ago

Here are the results I'm getting; we can control for other variables if we want: https://github.com/n-waves/ulmfit-multilingual/blob/qrnn_perf/results/time_benchmark/logs.md

sebastianruder commented 5 years ago

Awesome. I think that's enough for the paper. Just out of curiosity, would it be hard to get those numbers for the multilingual BERT?

eisenjulian commented 5 years ago

It's a little more complex to make a fair comparison. I can run it on a P100 (16GB), but it didn't run on 12GB.

The number of tokens in BERT Multilingual is 110k, and we get an average per-batch time of 1582 ms, also using a batch size of 32. This is for the classification task.
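
For context, here is a minimal sketch of how a per-batch timing like this could be measured in PyTorch. The helper `avg_batch_time_ms` is hypothetical (not the exact script behind the linked logs), and it assumes a classification model that returns logits and a dataloader yielding `(inputs, labels)` pairs:

```python
# Hypothetical timing helper -- not the exact script behind the linked logs.
import time
import torch
import torch.nn.functional as F

def avg_batch_time_ms(model, dataloader, device="cuda", warmup=5, n_batches=50):
    """Mean forward+backward time per batch, in milliseconds."""
    model.to(device).train()
    timings = []
    for i, (xb, yb) in enumerate(dataloader):
        if i >= warmup + n_batches:
            break
        xb, yb = xb.to(device), yb.to(device)
        torch.cuda.synchronize()              # flush any queued GPU work first
        t0 = time.perf_counter()
        model.zero_grad()
        logits = model(xb)                    # assumes the model returns logits
        loss = F.cross_entropy(logits, yb)
        loss.backward()
        torch.cuda.synchronize()              # wait for this batch to finish
        if i >= warmup:                       # discard warm-up iterations
            timings.append((time.perf_counter() - t0) * 1000)
    return sum(timings) / len(timings)
```

The `torch.cuda.synchronize()` calls matter here: CUDA kernels launch asynchronously, so without them the wall-clock measurement would only capture kernel launch overhead rather than actual GPU execution time.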