haoyu94 / RoITr

Rotation-Invariant Transformer for Point Cloud Matching
MIT License

Batch size for training #4

Closed Parskatt closed 11 months ago

Parskatt commented 1 year ago

Am I interpreting correctly that the effective batch size is 4, since it seems you used 4 GPUs with a batch size of 1 each?

haoyu94 commented 1 year ago

Yes, the batch size is fixed at 1 and 4 GPUs are used for training.

Parskatt commented 12 months ago

And the learning rate is not multiplied by the number of GPUs, i.e., you use 1e-4, not NUM_GPUS * 1e-4. So if I have one GPU and a batch size of 1, should I use 1e-4 / 4?

I know the scaling laws for the learning rate are not so well established at tiny batch sizes, but still.

haoyu94 commented 12 months ago

I think you can still try starting from 1e-4; it is a good initial learning rate for many tasks.

As you said, I also think the scaling law is probably not applicable at such a tiny batch size.

Parskatt commented 11 months ago

From some experimenting, I think dividing the learning rate by 4 is the correct thing to do.
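
For reference, a minimal sketch of the linear scaling rule being applied here (the function and argument names are illustrative, not taken from the RoITr code base):

```python
def scaled_lr(base_lr: float, reference_batch_size: int, effective_batch_size: int) -> float:
    """Scale the learning rate linearly with the effective batch size."""
    return base_lr * effective_batch_size / reference_batch_size


# Paper setup: 4 GPUs x batch size 1 -> effective batch size 4, lr = 1e-4.
# Single GPU with batch size 1 -> effective batch size 1, so the rule gives 1e-4 / 4.
print(scaled_lr(base_lr=1e-4, reference_batch_size=4, effective_batch_size=1))  # 2.5e-05
```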