Balandat opened this issue 5 years ago
cc @izmailovpavel
Hi @Balandat, looking into this right now. I believe the issue is that the L-BFGS optimizer requires all parameters to be on the same GPU (see the second warning here: https://pytorch.org/docs/stable/optim.html#torch.optim.LBFGS). I tried replacing the SWA wrapper with the following simple wrapper, which does nothing but mimic the SWA interface, and it fails on the same test:
```python
class LBFGSWrapper:
    """Pass-through wrapper that mimics the SWA optimizer interface
    without doing any weight averaging."""

    def __init__(self, lbfgs):
        self.optimizer = lbfgs

    def step(self, *args):
        return self.optimizer.step(*args)

    def zero_grad(self, *args):
        return self.optimizer.zero_grad(*args)

    def swap_swa_sgd(self):
        pass  # SWA interface no-op

    def update_swa(self):
        pass  # SWA interface no-op

    def state_dict(self):
        return self.optimizer.state_dict()

    def load_state_dict(self, *args):
        return self.optimizer.load_state_dict(*args)
```
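For reference, here is a minimal sketch of how such a wrapper would be exercised (the model, data, and closure here are hypothetical, just to illustrate the closure-based L-BFGS step):

```python
import torch

# Hypothetical usage: wrap a plain L-BFGS optimizer and drive it the
# same way the SWA tests drive the SWA wrapper.
model = torch.nn.Linear(10, 1)
opt = LBFGSWrapper(torch.optim.LBFGS(model.parameters(), lr=0.1))

x, y = torch.randn(32, 10), torch.randn(32, 1)

def closure():
    # L-BFGS may re-evaluate the loss several times per step,
    # so it needs a closure that recomputes the forward pass.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

opt.step(closure)
opt.swap_swa_sgd()  # no-op here, but matches the SWA interface
```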
To fix this for now, we can replace the lines https://github.com/pytorch/contrib/blob/master/test/test_swa.py#L312-L313 with
```python
ignore_multidevice = constructor == lbfgs_constructor
self._test_basic_cases(
    lambda weight, bias: constructor([weight, bias]),
    ignore_multidevice=ignore_multidevice)
```
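To make the restriction concrete, here is a minimal sketch of the failure mode (assuming a machine with at least two CUDA devices): L-BFGS internally flattens the gradients of all parameters into one tensor, which fails when the parameters live on different GPUs:

```python
import torch

# Sketch of the multi-device failure (assumes >= 2 GPUs).
w0 = torch.randn(4, requires_grad=True, device="cuda:0")
w1 = torch.randn(4, requires_grad=True, device="cuda:1")
opt = torch.optim.LBFGS([w0, w1], lr=0.1)

def closure():
    opt.zero_grad()
    loss = (w0 ** 2).sum() + (w1 ** 2).sum().to("cuda:0")
    loss.backward()
    return loss

# Expected to raise a RuntimeError: L-BFGS concatenates the flattened
# gradients of all parameters, which must be on a single device.
opt.step(closure)
```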
Running `test_swa.py` on a device with multiple GPUs results in the following: