tarrade / proj_multilingual_text_classification

Explore multilingual text classification using embeddings, BERT and deep learning architectures
Apache License 2.0

compute the multilingual BERT model (512 sequence length) with a GPU and compare computation time and costs to the CPU run #29

Closed: vluechinger closed this issue 4 years ago

vluechinger commented 4 years ago

Data: IMDb dataset, 25'000 examples for training and 1'000 for validation
Batch sizes: 32 for training, 64 for validation
Sequence length: 512
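
For reference, a minimal sketch of how such an input pipeline could look with the Hugging Face tokenizer and TensorFlow Datasets. This is not the repository's actual loading code; the checkpoint name `bert-base-multilingual-cased` and the use of `test[:1000]` as the validation split are assumptions.

```python
import tensorflow as tf
import tensorflow_datasets as tfds
from transformers import BertTokenizerFast

MAX_LEN = 512                     # sequence length used in this run
TRAIN_BATCH, VAL_BATCH = 32, 64   # batch sizes used in this run

# Multilingual BERT tokenizer (checkpoint name is an assumption)
tokenizer = BertTokenizerFast.from_pretrained("bert-base-multilingual-cased")

# 25'000 training reviews and 1'000 validation reviews from IMDb
train_raw = tfds.load("imdb_reviews", split="train[:25000]", as_supervised=True)
val_raw = tfds.load("imdb_reviews", split="test[:1000]", as_supervised=True)

def to_dataset(raw, batch_size):
    # Tokenize the raw text once, then wrap it in a batched tf.data.Dataset
    texts, labels = [], []
    for text, label in tfds.as_numpy(raw):
        texts.append(text.decode("utf-8"))
        labels.append(int(label))
    enc = tokenizer(texts, max_length=MAX_LEN, truncation=True,
                    padding="max_length", return_tensors="tf")
    return tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(batch_size)

train_ds = to_dataset(train_raw, TRAIN_BATCH)
val_ds = to_dataset(val_raw, VAL_BATCH)
```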

Training specifications: 782 training steps, 1 epoch, 16 validation steps, history recorded every 50 steps
Optimizer: Adam, learning rate 3e-5, epsilon 1e-8
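
A hedged sketch (not the repository's actual training script) of how these hyperparameters map onto a Keras training loop with the Transformers TensorFlow model; `train_ds` and `val_ds` are assumed to be the batched datasets from the sketch above.

```python
import tensorflow as tf
from transformers import TFBertForSequenceClassification

# Multilingual BERT with a 2-class head (checkpoint name is an assumption)
model = TFBertForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=2)

# Optimizer settings from this run: Adam, learning rate 3e-5, epsilon 1e-8
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-8)
loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])

# 1 epoch of 782 training steps, validated over 16 steps
model.fit(train_ds,
          epochs=1,
          steps_per_epoch=782,
          validation_data=val_ds,
          validation_steps=16)
```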

CPU setup: 48 CPUs (65-96% utilization) and 60 GB memory (35-40 GB utilization)
Result: ~6.5 hours, ~€34

GPU setup: scale tier BASIC_GPU, which is an n1-standard-8 with one K80 GPU
Result: ~1 hour, 1.63 ML units, which is approximately €1.35 (see the comparison below)
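
To make the comparison explicit, a quick back-of-the-envelope calculation from the numbers above:

```python
# Rough speedup and cost ratio between the CPU and GPU runs above
cpu_hours, cpu_cost_eur = 6.5, 34.0
gpu_hours, gpu_cost_eur = 1.0, 1.35

print(f"speedup:    ~{cpu_hours / gpu_hours:.1f}x faster on the GPU")   # ~6.5x
print(f"cost ratio: ~{cpu_cost_eur / gpu_cost_eur:.0f}x cheaper")       # ~25x
```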

Conclusion: the GPU run is much faster and much cheaper. When training on the cloud, prefer GPUs, but be vigilant when using them in a notebook (hidden running costs if you forget to shut it down).