An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
Describe the bug:
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
When I use NNI to prune my custom transformer, it first looks fine, printing:
[2024-03-08 11:11:12] Update indirect mask for call_function: truediv,
[2024-03-08 11:11:12] Update indirect mask for call_function: sqrt,
[2024-03-08 11:11:12] Update indirect mask for call_function: getitem_13,
[2024-03-08 11:11:12] Update indirect mask for call_function: getattr_3,
[2024-03-08 11:11:12] Update indirect mask for call_method: transpose_2, output mask: 0.0000
[2024-03-08 11:11:12] Update indirect mask for call_method: view_2, output mask: 0.0000
[2024-03-08 11:11:12] Update indirect mask for call_module: encoder_encoder_layers_0_attention_value_projection, weight: 0.0000 bias: 0.0000 , output mask: 0.0000
until it throws this error:
Traceback (most recent call last):
File "F:\研究生学习文件\研二\时序预测算法\transformer\pythonProject2\0305\4.py", line 219, in
ModelSpeedup(model, dummy_input, masks).speedup_model()
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\model_speedup.py", line 435, in speedup_model
self.update_indirect_sparsity()
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\model_speedup.py", line 306, in update_indirect_sparsity
self.node_infos[node].mask_updater.indirect_update_process(self, node)
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\nni\compression\speedup\mask_updater.py", line 160, in indirect_update_process
output = getattr(model_speedup, node.op)(node.target, args_cloned, kwargs_cloned)
File "E:\ANACONDA\Anaconda\envs\torch\lib\site-packages\torch\fx\interpreter.py", line 289, in call_method
return getattr(self_obj, target)(*args_tail, **kwargs)
RuntimeError: you can only change requires_grad flags of leaf variables. If you want to use a computed variable in a subgraph that doesn't require differentiation use var_no_grad = var.detach().
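For context on the error itself (this is plain PyTorch behavior, independent of NNI): `requires_grad` can only be toggled on leaf tensors, so when the speedup pass tries to flip the flag on a tensor produced by an operation, the interpreter raises exactly this `RuntimeError`. A minimal sketch reproducing the message and the `detach()` workaround it suggests:

```python
import torch

# A leaf tensor: toggling requires_grad is allowed.
leaf = torch.randn(3, requires_grad=True)

# A non-leaf tensor: it was produced by an operation,
# so changing its requires_grad flag raises the error from the traceback.
non_leaf = leaf * 2
try:
    non_leaf.requires_grad_(False)
except RuntimeError as e:
    print(e)  # "you can only change requires_grad flags of leaf variables..."

# Workaround named in the error message: detach() returns a leaf view,
# whose flag can be changed freely.
detached = non_leaf.detach()
detached.requires_grad_(False)
```

This does not say where in the NNI speedup pass the non-leaf tensor comes from, only why the call fails once it gets there.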
This error is raised when I use NNI to prune my self-defined transformer model. I tried both L1NormPruner and MovementPruner, and followed NNI's official transformer pruning example (without the knowledge distillation part), but in every case the error above is thrown during speedup. I cannot tell whether my NNI configuration is wrong or whether my custom transformer model does not meet NNI's requirements, so I am asking for help.
Environment:
Reproduce the problem
Model pruning
```python
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer

print(model)
config_list = [{
    'op_types': ['Linear'],
    'op_names_re': ['encoder.encoder_layers.0.attention.*'],
    'sparse_threshold': 0.1,
    'granularity': [4, 4]
}]
pruner = MovementPruner(model, config_list, evaluator, warmup_step=10,
                        cooldown_begin_step=20, regular_scale=20)
pruner.compress(40, 4)
print(model)
pruner.unwrap_model()
masks = pruner.get_masks()
dummy_input = (torch.randint(0, 1, (32, 16, 1)).to(device).float(),
               torch.randint(0, 1, (32, 16, 1)).to(device).float())
replacer = TransformersAttentionReplacer(model)
ModelSpeedup(model, dummy_input, masks).speedup_model()
```
```
CustomTransformer(
  (embedding): Linear(in_features=1, out_features=64, bias=True)
  (positional_encoding): PositionalEncoding(
    (dropout): Dropout(p=0, inplace=False)
  )
  (encoder): Encoder(
    (encoder_layers): ModuleList(
      (0): Encoderlayer(
        (attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (linear): Linear(in_features=64, out_features=64, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (linear_layers): ModuleList(
      (0): Linear(in_features=64, out_features=64, bias=True)
    )
  )
  (decoder): Decoder(
    (decoder_layers): ModuleList(
      (0): Decoderlayer(
        (self_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (cross_attention): AttentionLayer(
          (inner_attention): FullAttention(
            (dropout): Dropout(p=0.1, inplace=False)
          )
          (query_projection): Linear(in_features=64, out_features=64, bias=True)
          (key_projection): Linear(in_features=64, out_features=64, bias=True)
          (value_projection): Linear(in_features=64, out_features=64, bias=True)
          (out_projection): Linear(in_features=64, out_features=64, bias=True)
        )
        (norm1): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm2): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (norm3): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
        (linear1): Linear(in_features=64, out_features=256, bias=True)
        (linear2): Linear(in_features=256, out_features=64, bias=True)
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
    (norm): LayerNorm((64,), eps=1e-05, elementwise_affine=True)
  )
  (fc_in): Linear(in_features=64, out_features=64, bias=True)
  (relu): ReLU()
  (dropout): Dropout(p=0.1, inplace=False)
  (fc_out): Linear(in_features=64, out_features=1, bias=True)
)
```