microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

ProxylessTrainer missing gradients when first layer is nn.LayerChoice #5080

Closed: AL3708 closed this issue 2 years ago

AL3708 commented 2 years ago

If the first layer inside the model is an nn.LayerChoice, then RuntimeError: One of the differentiated Tensors does not require grad is thrown.

Example model:

import nni.retiarii.nn.pytorch as nn
from torch import Tensor

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # First layer is a LayerChoice -- this is what triggers the error
        self.stem_conv = nn.LayerChoice([
            # Conv-bn-act blocks (ConvBlock and Stage are my own modules, defined elsewhere)
            ConvBlock(in_channels=3, out_channels=32, kernel_size=3, stride=2),
            ConvBlock(in_channels=3, out_channels=32, kernel_size=5, stride=2),
        ])

        self.stage_1 = Stage(channels=32)
        # ...

    def forward(self, x: Tensor):
        x = self.stem_conv(x)
        x = self.stage_1(x)
        # ...
        return x
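For reference, a plausible stand-in for the ConvBlock used above (my reconstruction from the Conv-bn-act comment; the original definition is not shown here):

import torch

class ConvBlock(torch.nn.Module):
    # Assumed Conv-BN-ReLU block matching the "Conv-bn-act" comment above.
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int, stride: int):
        super().__init__()
        self.conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size,
                                    stride=stride, padding=kernel_size // 2, bias=False)
        self.bn = torch.nn.BatchNorm2d(out_channels)
        self.act = torch.nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))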

Exception:

Traceback (most recent call last):
  File "C:\Users\...\proxylessnas.py", line 374, in <module>
    main()
  File "C:\Users\...\proxylessnas.py", line 360, in main
    trainer.fit()
  File "C:\Users\...\lib\site-packages\nni\retiarii\oneshot\pytorch\proxyless.py", line 363, in fit
    self._train_one_epoch(i)
  File "C:\Users\...\proxylessnas.py", line 297, in _train_one_epoch
    loss.backward()
  File "C:\Users\...\lib\site-packages\torch\_tensor.py", line 396, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)
  File "C:\Users\...\lib\site-packages\torch\autograd\__init__.py", line 173, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "C:\Users\...\lib\site-packages\torch\autograd\function.py", line 253, in apply
    return user_fn(self, *args)
  File "C:\Users\...\lib\site-packages\nni\retiarii\oneshot\pytorch\proxyless.py", line 37, in backward
    grad_x = torch.autograd.grad(output, detached_x, grad_output, only_inputs=True)
  File "C:\Users\...\lib\site-packages\torch\autograd\__init__.py", line 276, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: One of the differentiated Tensors does not require grad
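For context, my reading of the traceback (not an official explanation): proxyless.py line 37 detaches the module input and then calls torch.autograd.grad with respect to that detached tensor. When the LayerChoice is the first layer, its input is the raw data batch, whose requires_grad is False, and PyTorch refuses to differentiate with respect to such a tensor. A minimal sketch of the underlying PyTorch behavior, independent of NNI:

import torch

w = torch.randn(3, requires_grad=True)
x = torch.randn(3)                  # like a raw input batch: requires_grad is False
y = (w * x).sum()                   # y itself requires grad (through w)
try:
    torch.autograd.grad(y, x)       # differentiating w.r.t. x, which does not require grad
except RuntimeError as e:
    print(e)                        # "One of the differentiated Tensors does not require grad"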

If I switch to a standard convolution inside the stem:

self.stem_conv = ConvBlock(in_channels=3, out_channels=32, kernel_size=3, stride=2)

It works fine.
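A possible stopgap while waiting for a fix (an untested assumption on my part, not an official workaround) is to make the batch differentiable before it enters the first LayerChoice:

    def forward(self, x: Tensor):
        # Hypothetical workaround: mark the raw input as requiring grad so the
        # detached copy taken inside proxyless.py can be differentiated.
        x = x.requires_grad_()
        x = self.stem_conv(x)
        x = self.stage_1(x)
        # ...
        return x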

Environment:

ultmaster commented 2 years ago

I think this has been fixed in the latest implementation of Proxyless. You can try our alpha prerelease of v2.9:

pip install --extra-index-url https://test.pypi.org/simple/ nni==2.9a1
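To confirm the prerelease was picked up (a generic check, not from the thread):

python -c "import nni; print(nni.__version__)"   # expect something like 2.9a1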
ultmaster commented 2 years ago

Closing as the problem seems to have been resolved in v2.9.

Please reopen if there are further inquiries.