InfiniteLife: For example, we have 2 GPUs for training and each fits a batch of 2, making the effective batch size 4. With gradient aggregation we could process several batches on each GPU and aggregate the gradients, thereby increasing the effective batch size. Is there a possibility to use such a feature?
Is gradient aggregation required for a small number of GPUs? In my experience, bs=8 vs. bs=16 makes no difference with the linear lr rule.
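The linear lr rule mentioned above scales the learning rate proportionally with the effective batch size. A toy illustration of the arithmetic (the base values here are hypothetical, not simpledet defaults):

```python
base_lr, base_bs = 0.00125, 1  # hypothetical per-image base learning rate
for bs in (8, 16):
    # linear scaling rule: lr grows proportionally with batch size
    print(f"bs={bs} -> lr={base_lr * bs / base_bs}")
# bs=8 -> lr=0.01, bs=16 -> lr=0.02
```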
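For what it's worth, MXNet (which simpledet builds on) supports this kind of gradient aggregation via `grad_req='add'`: `backward()` then sums into the gradient buffers across several batches before the optimizer step. Below is a minimal single-GPU Gluon sketch of the idea, not simpledet's own API; the model, loss, data, and `accum_steps` are all placeholders.

```python
import mxnet as mx
from mxnet import autograd, gluon

# Placeholder model and loss; simpledet's real models differ.
net = gluon.nn.Dense(10)
net.initialize()
loss_fn = gluon.loss.SoftmaxCrossEntropyLoss()

# 'add' makes backward() accumulate into the gradient buffers
# instead of overwriting them on every call.
net.collect_params().setattr('grad_req', 'add')
trainer = gluon.Trainer(net.collect_params(), 'sgd', {'learning_rate': 0.01})

# Synthetic stand-in data: 8 batches of 2 samples, 16 features each.
data_iter = [(mx.nd.random.normal(shape=(2, 16)), mx.nd.array([0, 1]))
             for _ in range(8)]

accum_steps = 4  # aggregate over 4 batches -> effective batch size 2 * 4 = 8
for step, (data, label) in enumerate(data_iter):
    with autograd.record():
        loss = loss_fn(net(data), label)
    loss.backward()  # gradients sum across calls because grad_req='add'
    if (step + 1) % accum_steps == 0:
        # step() divides the accumulated gradient by its argument,
        # so pass the effective batch size.
        trainer.step(data.shape[0] * accum_steps)
        for p in net.collect_params().values():
            p.zero_grad()  # reset the accumulated gradients
```

With multiple GPUs, the same accumulation would happen per device before the cross-device gradient reduction, multiplying the effective batch size by `accum_steps`.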