Open · nahidalam opened 7 months ago
Our experiments used 8x A100 80G. To train on GPUs with less memory, you may consider setting `--model.finetune_per_device_batch_size` lower without changing the global batch size; the trainer makes up the difference with gradient accumulation.
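For reference, a minimal sketch of what this looks like on a single 8-GPU node. The launch command and the `--model.finetune_global_batch_size` flag are assumptions based on the repo's draccus-style CLI (only `--model.finetune_per_device_batch_size` appears above), so check them against your config:

```sh
# Sketch only: the script path and --model.finetune_global_batch_size are
# assumptions, not verified against the repo. The idea: keep the global batch
# size fixed and shrink the per-device batch size, so the trainer compensates
# with gradient accumulation and peak per-GPU memory drops while the effective
# optimization batch stays the same.
torchrun --standalone --nnodes 1 --nproc-per-node 8 scripts/pretrain.py \
  --model.finetune_global_batch_size 128 \
  --model.finetune_per_device_batch_size 8
# 128 global = 8 GPUs x 8 per device x 2 gradient-accumulation steps
```

Halving the per-device batch size again (to 4) doubles the accumulation steps to 4, trading more steps per update for lower memory.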
If using 8x A100 40G, how should the batch size be set for optimal performance? @h-zhao1997
How many GPUs do I need for training, at a minimum? 8x A100 (40GB) or 8x A100 (80GB)?