tusen-ai / simpledet

A Simple and Versatile Framework for Object Detection and Instance Recognition
Apache License 2.0

Is it possible to use gradient aggregation for big models with a small number of GPUs? #230

Closed InfiniteLife closed 5 years ago

InfiniteLife commented 5 years ago

For example, suppose we have 2 GPUs for training and each fits a batch of 2, giving an effective batch size of 4. With gradient aggregation, we could process several batches on each GPU and combine their gradients before each weight update, thereby increasing the effective batch size. Is it possible to use such a feature?
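To make the idea concrete, here is a minimal sketch in plain Python rather than simpledet or MXNet code. The toy model `y_hat = w * x`, the data, and the helper names `grad_mse` and `accumulated_grad` are all assumptions made for illustration. It only shows that averaging gradients over two micro-batches of 2 gives the same gradient as a single batch of 4, which is the property gradient aggregation relies on.

```python
# Hypothetical toy model: y_hat = w * x, loss = mean((w * x - y) ** 2).
# None of these names come from simpledet; they are illustrative only.

def grad_mse(w, batch):
    """Gradient of the mean squared error w.r.t. w over one batch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_grad(w, micro_batches):
    """Average the gradients of several equally sized micro-batches."""
    total = 0.0
    for micro_batch in micro_batches:
        # One forward/backward pass per micro-batch; no weight update yet.
        total += grad_mse(w, micro_batch)
    return total / len(micro_batches)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0), (4.0, 9.0)]
w = 0.5

full = grad_mse(w, data)                            # one batch of 4
accum = accumulated_grad(w, [data[:2], data[2:]])   # two micro-batches of 2

print(full, accum)
```

The equivalence holds for losses that average over samples. Layers that compute statistics per batch, such as batch normalization, still see only the micro-batch, so accumulation does not reproduce a true large batch for them.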

RogerChern commented 5 years ago

Is gradient aggregation really required with a small number of GPUs? In my experience, bs=8 and bs=16 make no difference when the lr is scaled linearly with the batch size.
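Assuming "linear lr" here refers to the common linear scaling rule (the learning rate is scaled in proportion to the total batch size), the adjustment can be sketched as follows. The function name `scaled_lr` and the reference values (lr 0.02 at batch size 16) are hypothetical and are not taken from any simpledet config.

```python
def scaled_lr(base_lr, base_batch_size, batch_size):
    """Scale the learning rate in proportion to the total batch size."""
    return base_lr * batch_size / base_batch_size


# Hypothetical reference setting: lr 0.02 tuned for a total batch size of 16.
lr_bs8 = scaled_lr(0.02, 16, 8)   # half the batch size, half the lr
lr_bs4 = scaled_lr(0.02, 16, 4)   # 2 GPUs x 2 images, as in the question

print(lr_bs8, lr_bs4)
```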

