microsoft / nni

An open-source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression, and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

Proxyless example can't train because the grad is None #5394

Closed (miaott1234 closed this issue 1 year ago)

miaott1234 commented 1 year ago

Describe the issue:

proxyless example can't train because the grad is none

name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.conv.weight -->grad_requirs: True --weight tensor(0.0037, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.inverted_bottleneck.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.conv.weight -->grad_requirs: True --weight tensor(-0.0017, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.depth_conv.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.conv.weight -->grad_requirs: True --weight tensor(-0.0035, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.0.point_linear.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.conv.weight -->grad_requirs: True --weight tensor(0.0092, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.inverted_bottleneck.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.conv.weight -->grad_requirs: True --weight tensor(-0.0001, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.depth_conv.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.conv.weight -->grad_requirs: True --weight tensor(0.0023, device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.bn.weight -->grad_requirs: True --weight tensor(1., device='cuda:0') -->grad_value: None
-->name: module.blocks.1.mobile_inverted_conv.ops.1.point_linear.bn.bias -->grad_requirs: True --weight tensor(0., device='cuda:0') -->grad_value: None

The layers built by LayerChoice have a grad_value of None, but ordinary layers, such as the first layer, do have gradients.
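For anyone who wants to run the same kind of check, here is a minimal sketch in plain PyTorch. It assumes a toy model: `ToyChoice` and `grad_report` are hypothetical names made up for illustration, and `ToyChoice` is not NNI's `LayerChoice` and does not reproduce the ProxylessNAS trainer. It only shows that a parameter which never takes part in the forward pass still has `grad` equal to `None` after `backward()`, even though `requires_grad` is `True`.

```python
import torch
import torch.nn as nn


class ToyChoice(nn.Module):
    """Hypothetical stand-in for a choice layer; NOT NNI's LayerChoice."""

    def __init__(self):
        super().__init__()
        self.ops = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])
        self.active = 0  # index of the candidate used in forward

    def forward(self, x):
        # Only the active candidate takes part in the autograd graph.
        return self.ops[self.active](x)


def grad_report(model):
    """Map each parameter name to True if it currently holds a gradient."""
    return {name: p.grad is not None for name, p in model.named_parameters()}


torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 4), ToyChoice())

# One forward/backward pass on random input.
loss = model(torch.randn(2, 4)).sum()
loss.backward()

report = grad_report(model)
for name, has_grad in report.items():
    print(f"{name}: has_grad={has_grad}")
```

In this toy case only the unused candidate (`1.ops.1.*`) reports no gradient. If every candidate reports `None`, as in the log above, it may be worth confirming that the check runs after `loss.backward()` and before `optimizer.zero_grad()`, since recent PyTorch versions reset gradients to `None` by default.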

Please take a look. Thanks.

Environment:

matluster commented 1 year ago

The legacy implementation of ProxylessNAS is known to have gradient bugs like this one. Fortunately, a large portion of these problems have been fixed in the latest implementation. Unfortunately, that version hasn't been released yet, so please hang tight for a while...

miaott1234 commented 1 year ago

As far as I know, the official implementation of ProxylessNAS does not have gradient problems.


matluster commented 1 year ago

The official implementation of ProxylessNAS is also known to suffer from other problems. For example,

miaott1234 commented 1 year ago

Thank you, I see. I'm very much looking forward to the release of the latest version. Is it possible to get it before April?

matluster commented 1 year ago

I'm not sure whether we can get a stable release out before April.

If you are interested in a preview, you can find it here (a working ProxylessNAS example): https://github.com/ultmaster/nni/blob/nas-nn-refactor/examples/nas/hub/proxyless_search.py

miaott1234 commented 1 year ago

That's great. Thank you very much.