I am using 7 GPUs for training and keeping the batch size at 256, but in the logs I am seeing the following lines:
```
L2 regularizer value from basic_model: 0
num_replicas_in_sync: 7, batch_size: 6272
Init type by loss function name...
Train arcface...
Init softmax dataset...
```
How is batch_size calculated here?
Also, from my understanding, the lr should be multiplied by the number of GPUs. Do you have any suggestions for the lr?
It's calculated and printed at train.py#L116-L120: it's just `batch_size * strategy.num_replicas_in_sync`, with no other modification, so it shouldn't be wrong.
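The computation amounts to the sketch below, assuming a tf.distribute.MirroredStrategy with one replica per visible GPU; the variable names are illustrative, not copied from train.py:

```python
import tensorflow as tf

# Sketch of how the logged value is derived: the batch_size argument
# is per replica, and the printed batch_size is the global one.
strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU

per_replica_batch_size = 256  # the value passed as the batch_size argument
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync
print("num_replicas_in_sync: %d, batch_size: %d"
      % (strategy.num_replicas_in_sync, global_batch_size))
```

So the logged batch_size is the global batch size across all replicas, not the per-GPU one.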
Yeah, basically the lr should be adjusted according to the batch_size, so in distributed training it should be multiplied by the number of GPUs. My experience with distributed training is very limited though, just 2 GPUs...
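For reference, the usual heuristic is the linear scaling rule from Goyal et al., 2017 ("Accurate, Large Minibatch SGD"): scale the base lr by the same factor the global batch size grew, typically together with a warmup period. A rough sketch with made-up numbers; none of these names come from train.py:

```python
# Linear scaling rule sketch. All values are illustrative assumptions,
# not defaults from train.py.
base_lr = 0.1          # lr tuned for the single-GPU batch size
base_batch_size = 256  # batch size that base_lr was tuned for
num_gpus = 7           # replicas in sync

global_batch_size = base_batch_size * num_gpus
scaled_lr = base_lr * global_batch_size / base_batch_size
print(scaled_lr)  # 0.7, i.e. base_lr * num_gpus
```

In practice the scaled lr is usually ramped up from a small value over the first few epochs (warmup), since starting directly at the full scaled lr can be unstable early in training.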