slerman12 closed this issue 2 years ago
Haha, no it doesn't. It divides the original batch into smaller batches and accumulates gradients accordingly, so it should effectively be the same as computing on the original batch. There does seem to be a bug (#16); however, I'm a little busy this week to really look into what happened.
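For intuition, here's a minimal sketch (not the wrapper's actual code; the model, batch sizes, and function names are all illustrative) of why accumulating gradients over micro-batches is equivalent to one pass over the full batch, as long as each micro-batch gradient is weighted by its size:

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of the mean squared error 0.5*mean((Xw - y)^2) w.r.t. w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))   # full batch of 8 samples
y = rng.normal(size=8)
w = rng.normal(size=3)

# Gradient computed on the whole batch at once.
full = grad_mse(w, X, y)

# Same gradient, accumulated over micro-batches of size 2,
# each weighted by its sample count, then normalized.
acc = np.zeros_like(w)
for i in range(0, len(y), 2):
    Xb, yb = X[i:i+2], y[i:i+2]
    acc += grad_mse(w, Xb, yb) * len(yb)
acc /= len(y)

assert np.allclose(full, acc)  # identical up to float rounding
```

In a framework like PyTorch, the same effect comes from calling `backward()` on each micro-batch (gradients accumulate in `.grad`) and stepping the optimizer once per full batch; forgetting the size weighting or the normalization is the kind of thing a bug like the one mentioned above could involve.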
As for devices, it currently only works on a single GPU (as a proof of concept); I'm still figuring out a way to make it work across multiple GPUs.
Just making sure: this lazy wrapper somehow divvies up the computations per GPU budget, right? It doesn't just... sub-sample a smaller batch and ignore the remainder, right?