proxyless example can't train because the grad is none

miaott1234 commented 1 year ago

Describe the issue:

The ProxylessNAS example can't train because the gradients are None:

name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.conv.weight -->grad_requirs: True --weight tensor(0.0037, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.conv.weight -->grad_requirs: True --weight tensor(-0.0017, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.conv.weight -->grad_requirs: True --weight tensor(-0.0035, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.conv.weight -->grad_requirs: True --weight tensor(0.0092, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.conv.weight -->grad_requirs: True --weight tensor(-0.0001, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.conv.weight -->grad_requirs: True --weight tensor(0.0023, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None

Layers built by LayerChoice have a grad_value of None, while ordinary layers such as the first layer do receive gradients.
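For readers who want to reproduce this kind of check, here is a minimal PyTorch sketch of how parameters with a missing gradient can be listed after `backward()`. It does not use NNI and makes no claim about the actual cause of the bug reported here: `TwoBranch` and `report_missing_grads` are hypothetical names, and the model only imitates a layer with several candidate ops where one candidate never takes part in the forward pass, which is one common way a trainable parameter ends up with `grad` equal to `None`.

```python
from typing import List

import torch
import torch.nn as nn


def report_missing_grads(model: nn.Module) -> List[str]:
    """Return names of trainable parameters whose .grad is still None."""
    return [
        name
        for name, param in model.named_parameters()
        if param.requires_grad and param.grad is None
    ]


class TwoBranch(nn.Module):
    """Hypothetical stand-in for a layer with candidate ops (not NNI's LayerChoice)."""

    def __init__(self) -> None:
        super().__init__()
        self.first = nn.Linear(4, 4)
        self.ops = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Only ops[0] participates in the graph, so ops[1] never gets a gradient.
        return self.ops[0](self.first(x))


model = TwoBranch()
loss = model(torch.randn(2, 4)).sum()
loss.backward()

# Parameters that require grad but were never reached by backward().
missing = report_missing_grads(model)
print(missing)
```

Running the same helper on the real model after one training step would show whether every LayerChoice candidate is affected or only the ones that were not sampled in that step.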

Please take a look. Thanks.

Environment:

NNI version: 2.7
Training service (local|remote|pai|aml|etc): local
Client OS: Ubuntu
Server OS (for remote mode only): Ubuntu
Python version: 3.8
PyTorch/TensorFlow version: PyTorch
Is conda/virtualenv/venv used?: conda
Is running in Docker?: no

matluster commented 1 year ago

The legacy implementation of ProxylessNAS is known to have gradient bugs like this one. Fortunately, a large portion of these problems have been fixed in the latest implementation. Unfortunately, that version hasn't been released yet. Please hang tight for a while...

miaott1234 commented 1 year ago

As far as I know, the official implementation of ProxylessNAS does not have gradient problems.


matluster commented 1 year ago

The official implementation of ProxylessNAS is also known to suffer from other problems. For example,

No support for multiple positional arguments.
No support for nested LayerChoice.
No support for InputChoice.

miaott1234 commented 1 year ago

Thank you, I see. I am very much looking forward to the release of the latest version. Is it possible to get it before April?

matluster commented 1 year ago

I'm not sure whether we can get a stable release out before April.

If you are interested in a preview, you can find it here (a working ProxylessNAS example): https://github.com/ultmaster/nni/blob/nas-nn-refactor/examples/nas/hub/proxyless_search.py

miaott1234 commented 1 year ago

That's great. Thank you very much!

microsoft / nni

proxyless example can't train because the grad is none #5394