transmissions11 / bistro

Opinionated GPT implementation and finetuning harness.
Apache License 2.0

Does gradient accumulation need to be hand-written? #16

Closed. charlesfrye closed this issue 11 months ago

charlesfrye commented 1 year ago

Surprised to see this logic hand-written here, rather than using the Trainer's `accumulate_grad_batches` flag.

https://github.com/transmissions11/bistro/blob/dcec8e4233e0203f6d65a63d9ca0105e9c22eacd/lit_model.py#L95-L96C11
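For reference, Lightning's built-in accumulation is a Trainer argument rather than module-level code. A minimal sketch with automatic optimization (the module and dataloader names are placeholders, and 8 is an arbitrary example value):

```python
import pytorch_lightning as pl

# With automatic optimization, the Trainer accumulates for you:
# loss.backward() runs on every batch, but optimizer.step() and
# zero_grad() only run every `accumulate_grad_batches` batches.
trainer = pl.Trainer(accumulate_grad_batches=8, max_epochs=1)
# trainer.fit(lit_model, train_dataloader)  # placeholders for the project's module/dataloader
```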

charlesfrye commented 1 year ago

Notes from sync discussion: because the hard-prompt update isn't just a gradient step added to the parameters, it doesn't fit well with Lightning's optimization abstractions.
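A rough sketch (not the repo's actual code) of why that pushes toward hand-written accumulation: once a LightningModule sets `automatic_optimization = False`, the Trainer's `accumulate_grad_batches` no longer applies, so the module has to count micro-batches and step the optimizer itself, which also leaves a natural place for a non-gradient hard-prompt update. The `update_hard_prompt` helper and the HF-style `.loss` output below are assumptions for illustration.

```python
import pytorch_lightning as pl
import torch


class HardPromptModule(pl.LightningModule):
    """Illustrative only: manual optimization with hand-written accumulation."""

    def __init__(self, model, accumulate_grad_batches: int = 4):
        super().__init__()
        self.model = model
        self.accumulate_grad_batches = accumulate_grad_batches
        # Manual optimization: Trainer(accumulate_grad_batches=...) is ignored,
        # so accumulation has to be written by hand in training_step.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        # Assumes an HF-style model whose forward returns an object with .loss.
        loss = self.model(**batch).loss / self.accumulate_grad_batches
        self.manual_backward(loss)  # gradients accumulate across micro-batches

        if (batch_idx + 1) % self.accumulate_grad_batches == 0:
            opt.step()
            opt.zero_grad()
            # The hard-prompt update is not an optimizer step on ordinary
            # parameters (e.g. it picks new discrete prompt tokens), which is
            # why it doesn't map cleanly onto Lightning's optimizer hooks.
            self.update_hard_prompt()  # hypothetical helper

        return loss

    def update_hard_prompt(self):
        ...  # placeholder: choose new prompt token ids from accumulated signal

    def configure_optimizers(self):
        return torch.optim.AdamW(self.model.parameters(), lr=1e-4)
```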