leondgarse / Keras_insightface

Insightface Keras implementation
MIT License
230 stars · 56 forks

Question About batch size in distributed training #119

Closed · PR451 closed this issue 1 year ago

PR451 commented 1 year ago

I am training with 7 GPUs and keeping the batch size at 256, but in the logs I see the following lines:

```
L2 regularizer value from basic_model: 0
num_replicas_in_sync: 7, batch_size: 6272
Init type by loss function name...
Train arcface...
Init softmax dataset...
```

How is batch_size calculated here?
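For context: under TensorFlow's `tf.distribute` convention, the batch size fed to a distributed dataset is the *global* batch, i.e. the per-replica batch multiplied by `strategy.num_replicas_in_sync`. A minimal arithmetic sketch (variable names are illustrative, not from this repo):

```python
# Global batch size under tf.distribute:
#   global_batch = per_replica_batch * num_replicas_in_sync
per_replica_batch = 256       # batch size quoted in the question
num_replicas_in_sync = 7      # one replica per GPU

global_batch = per_replica_batch * num_replicas_in_sync
print(global_batch)  # 1792
```

Note that a single multiplication of 256 by 7 gives 1792, not the 6272 that appears in the log, which suggests an extra multiplication is happening somewhere in the pipeline.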

Also, from my understanding, the learning rate should be multiplied by the number of GPUs. Do you have any suggestions for the learning rate?
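The rule referred to here is the linear scaling heuristic: when the global batch grows by a factor of k, scale the base learning rate by k (often combined with a warmup phase). A sketch with an assumed, hypothetical single-GPU learning rate:

```python
# Linear scaling rule (a common heuristic, not specific to this repo):
# if the global batch is k times larger, use k times the base learning rate.
base_lr = 0.1        # hypothetical single-GPU learning rate, not from the repo
num_replicas = 7     # one replica per GPU

scaled_lr = base_lr * num_replicas
```

Whether this helps in practice depends on the loss and batch size; large scaled rates usually need warmup to train stably.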

leondgarse commented 1 year ago
PR451 commented 1 year ago

Thanks a lot. I was manually multiplying batch_size by strategy.num_replicas_in_sync to increase the batch size.
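The doubled scaling described above can be reproduced with simple arithmetic. Assuming a per-replica batch of 128 (an assumption — the exact base value is not shown in the thread), multiplying by the replica count both manually and again inside the library yields exactly the 6272 seen in the log:

```python
replicas = 7
per_replica_batch = 128                    # assumed base value, not from the thread

manual = per_replica_batch * replicas      # manual multiplication by the user -> 896
global_batch = manual * replicas           # library multiplies again -> 6272
print(global_batch)  # 6272
```

The fix is to pass only the intended per-replica (or intended global) batch size and let the framework apply the replica scaling once.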