tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT and deep learning architectures
Apache License 2.0

compute the multilingual BERT model (512 sequence length) with a GPU and compare computation time and costs to the CPU run #29

Closed: vluechinger closed this issue 4 years ago

vluechinger commented 4 years ago

Data: IMDb dataset, 25'000 examples for training and 1'000 for validation
Batch sizes: 32 for training, 64 for validation
Sequence length: 512
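
For reference, a minimal sketch of how such an input pipeline could look with the Hugging Face tokenizer and TensorFlow Datasets. This is not the repository's actual loading code; the checkpoint name `bert-base-multilingual-cased` and the use of `test[:1000]` as the validation split are assumptions.

```python
import tensorflow as tf
import tensorflow_datasets as tfds
from transformers import BertTokenizerFast

MAX_LEN = 512                     # sequence length used in this run
TRAIN_BATCH, VAL_BATCH = 32, 64   # batch sizes used in this run

# Multilingual BERT tokenizer (checkpoint name is an assumption)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")

# 25'000 training reviews and 1'000 validation reviews from IMDb
train_raw = tfds.load("imdb_reviews", split="train[:25000]", as_supervised=True)
val_raw = tfds.load("imdb_reviews", split="test[:1000]", as_supervised=True)

def to_dataset(raw, batch_size):
    # Tokenize the raw text once, then wrap it in a batched tf.data.Dataset
    texts, labels = [], []
    for text, label in tfds.as_numpy(raw):
        texts.append(text.decode("utf-8"))
        labels.append(int(label))
    enc = tokenizer(texts, max_length=MAX_LEN, truncation=True,
                    padding="max_length", return_tensors="tf")
    return tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(batch_size)

train_ds = to_dataset(train_raw, TRAIN_BATCH)
val_ds = to_dataset(val_raw, VAL_BATCH)
```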

Training specifications: 782 training steps, 1 epoch, 16 validation steps, history recorded every 50 steps
Optimizer: Adam, learning rate 3e-5, epsilon 1e-8
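
A hedged sketch (not the repository's actual training script) of how these hyperparameters map onto a Keras training loop with the Transformers TensorFlow model; `train_ds` and `val_ds` are assumed to be the batched datasets from the sketch above.

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Multilingual BERT with a 2-class head (checkpoint name is an assumption)
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

# Optimizer settings from this run: Adam, learning rate 3e-5, epsilon 1e-8
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-8)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

# 1 epoch of 782 training steps, validated over 16 steps
model.fit(train_ds,
          epochs=1,
          steps_per_epoch=782,
          validation_data=val_ds,
          validation_steps=16)
```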

CPU setup: 48 CPUs (65-96% utilization) and 60 GB memory (35-40 GB utilization)
Result: ~6.5 hours, ~€34

GPU setup: scale tier BASIC_GPU, which is an n1-standard-8 with one K80 GPU
Result: ~1 hour, 1.63 ML units, which is approximately €1.35 (see the comparison below)
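
To make the comparison explicit, a quick back-of-the-envelope calculation from the numbers above:

```python
# Rough speedup and cost ratio between the CPU and GPU runs above
cpu_hours, cpu_cost_eur = 6.5, 34.0
gpu_hours, gpu_cost_eur = 1.0, 1.35

print(f"speedup:    ~{cpu_hours / gpu_hours:.1f}x faster on the GPU")   # ~6.5x
print(f"cost ratio: ~{cpu_cost_eur / gpu_cost_eur:.0f}x cheaper")       # ~25x
```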

Conclusion: the GPU run is much faster and much cheaper. When training on the cloud, prefer GPUs, but be vigilant when using them in a notebook (hidden running costs if you forget to shut it down).