mit-han-lab / gan-compression

[CVPR 2020] GAN Compression: Efficient Architectures for Interactive Conditional GANs

Question about distillers #6

Closed: LGYoung closed this issue 4 years ago

LGYoung commented 4 years ago

Hello, the following is the "once-for-all" training strategy you mentioned in your paper: "At each training step, we randomly sample a sub-network with a certain channel number configuration, compute the output and gradients, and update the extracted weights using our learning objective (Equation 4)." Where can I find this strategy in your code?

lmxyy commented 4 years ago

We implement a PyTorch module for the "once-for-all" network in super_mobile_resnet_generator.py. For the "once-for-all" training strategy, you can refer to supernets/resnet_supernet.py:

def optimize_parameters(self):
    # Clear gradients of both the discriminator and the supernet generator.
    self.optimizer_D.zero_grad()
    self.optimizer_G.zero_grad()
    # Randomly sample a channel configuration (i.e. a sub-network) for this step.
    config = self.configs.sample()
    self.forward(config=config)
    # Update the discriminator first, then freeze it and update the generator.
    util.set_requires_grad(self.netD, True)
    self.backward_D()
    util.set_requires_grad(self.netD, False)
    self.backward_G()
    self.optimizer_D.step()
    self.optimizer_G.step()

When we optimize the parameters, we sample a configuration from the configuration set self.configs and forward the network with that configuration.
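
To make the per-step sampling concrete, here is a minimal, self-contained sketch of the idea (not the repository's actual SuperMobileResnetGenerator or resnet_supernet.py code; ToySuperNet, CONFIGS, and the toy loss below are hypothetical stand-ins): each training step draws one channel configuration at random, and only the corresponding slice of the shared weights is forwarded and updated.

    import random
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # Hypothetical set of channel configurations for a two-layer toy generator.
    CONFIGS = [[16, 32], [12, 24], [8, 16]]

    class ToySuperNet(nn.Module):
        # Full-width layers whose weights are sliced according to the sampled
        # configuration, so every sub-network shares the same parameters.
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(3, 16, 3, padding=1)
            self.conv2 = nn.Conv2d(16, 32, 3, padding=1)

        def forward(self, x, config):
            c1, c2 = config
            # Use only the first c1 / c2 output channels of the shared weights.
            x = F.relu(F.conv2d(x, self.conv1.weight[:c1], self.conv1.bias[:c1], padding=1))
            return F.conv2d(x, self.conv2.weight[:c2, :c1], self.conv2.bias[:c2], padding=1)

    net = ToySuperNet()
    optimizer = torch.optim.Adam(net.parameters(), lr=2e-4)

    for step in range(3):
        config = random.choice(CONFIGS)   # sample one sub-network per step
        out = net(torch.randn(1, 3, 32, 32), config)
        loss = out.abs().mean()           # placeholder for the real learning objective
        optimizer.zero_grad()
        loss.backward()                   # gradients reach only the sliced weights
        optimizer.step()

In the real supernet the sampled configuration also controls intermediate ResNet blocks and the distillation losses, but the control flow is the same: one random configuration per optimization step.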

LGYoung commented 4 years ago

Thanks for the quick reply!