Closed by LWprogramming 11 months ago
I ran into this in a different training run where I was using longer audio clips. Large batch sizes caused OOM errors, so I reduced the batch size and increased grad_accum_every to 32, which produced this graph:
Compare to after including this change:
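For readers unfamiliar with gradient accumulation, here is a minimal pure-Python sketch of the general idea. It is not this PR's diff, and it does not use the library's trainer; `grad_mse` and `accumulated_grad` are hypothetical helpers made up for illustration. It assumes equal-sized micro-batches and a mean-reduced loss, in which case scaling each micro-batch gradient by 1 / grad_accum_every reproduces the gradient of one large batch.

```python
# Illustrative only: a toy scalar model standing in for a real training loop.
# All function names here are hypothetical and not part of any library.

def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    n = len(xs)
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / n


def accumulated_grad(w, xs, ys, grad_accum_every):
    """Accumulate gradients over equal-sized micro-batches.

    Each micro-batch gradient is scaled by 1 / grad_accum_every so that the
    accumulated total matches the gradient of the full batch.
    """
    assert len(xs) % grad_accum_every == 0, "micro-batches must be equal-sized"
    micro = len(xs) // grad_accum_every
    total = 0.0
    for i in range(grad_accum_every):
        lo, hi = i * micro, (i + 1) * micro
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / grad_accum_every
    return total


# Toy data: 64 samples split into 32 micro-batches of 2 samples each.
xs = [float(i) for i in range(64)]
ys = [3.0 * x + 1.0 for x in xs]
w = 0.5

full = grad_mse(w, xs, ys)
accum = accumulated_grad(w, xs, ys, grad_accum_every=32)
```

Under these assumptions `full` and `accum` agree up to floating-point rounding, which is why trading batch size for more accumulation steps should, in principle, leave the effective gradient unchanged while lowering peak memory.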
accept! 💯