Data:
IMDb dataset: 25,000 reviews for training, 1,000 for validation
Batch sizes: 32 for training, 64 for validation
Sequence length: 512
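A minimal sketch of this input pipeline, assuming TensorFlow and the tensorflow_datasets "imdb_reviews" split; the tokenization that pads/truncates each review to 512 tokens is model-specific and omitted here.

```python
import tensorflow as tf
import tensorflow_datasets as tfds

MAX_SEQ_LEN = 512   # sequence length from the setup above
TRAIN_BATCH = 32    # training batch size
VAL_BATCH = 64      # validation batch size

# 25,000 reviews for training; the first 1,000 test reviews serve as validation
train_ds, val_ds = tfds.load(
    "imdb_reviews",
    split=["train", "test[:1000]"],
    as_supervised=True,  # yields (text, label) pairs
)

train_ds = train_ds.shuffle(10_000).batch(TRAIN_BATCH).prefetch(tf.data.AUTOTUNE)
val_ds = val_ds.batch(VAL_BATCH).prefetch(tf.data.AUTOTUNE)
```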
Training specifications:
Training steps: 782, epochs: 1, validation steps: 16, history logged every 50 steps
Optimizer: Adam, learning rate=3e-5, epsilon=1e-8
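The numbers above fit together: 782 = ceil(25,000 / 32) training steps and 16 × 64 ≈ 1,000 validation examples. A minimal sketch of the training call, assuming a Keras classifier named `model` (its definition is not shown) and an assumed loss; logging history every 50 steps would need a custom callback, also omitted.

```python
import tensorflow as tf

# Optimizer settings from the specification above
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-8)

model.compile(
    optimizer=optimizer,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # assumed loss
    metrics=["accuracy"],
)

# 782 steps x batch 32 covers the 25,000 training reviews once (1 epoch);
# 16 steps x batch 64 covers the 1,000 validation reviews
history = model.fit(
    train_ds,
    epochs=1,
    steps_per_epoch=782,
    validation_data=val_ds,
    validation_steps=16,
)
```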
CPU setup:
48 CPUs (65-96% utilization) and 60 GB of memory (35-40 GB in use)
~6.5 hours, ~€34
GPU setup:
Scale tier: BASIC_GPU, which is an n1-standard-8 machine with a single NVIDIA Tesla K80 GPU
~1 hour, 1.63 ML units (approximately €1.35)
Conclusion: the GPU run was roughly 6.5× faster and about 25× cheaper. When training in the cloud, prefer GPUs, but be vigilant when using one from a notebook: an instance you forget to shut down keeps running and keeps billing.