microsoft / nni

An open source AutoML toolkit to automate the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.
https://nni.readthedocs.io
MIT License

pruning_bert_glue example pruning error #5706

Open sukritij29 opened 11 months ago

sukritij29 commented 11 months ago

Describe the issue:

Environment:

Configuration:

Log message:

How to reproduce it?: Just run the pruning_bert_glue tutorial.

```python
from nni.compression.pruning import MovementPruner
from nni.compression.speedup import ModelSpeedup
from nni.compression.utils.external.external_replacer import TransformersAttentionReplacer


def pruning_attn():
    Path('./output/bert_finetuned/').mkdir(parents=True, exist_ok=True)
    model = build_finetuning_model(task_name, f'./output/bert_finetuned/{task_name}.bin')
    trainer = prepare_traced_trainer(model, task_name)
    evaluator = TransformersEvaluator(trainer)

    config_list = [{
        'op_types': ['Linear'],
        'op_names_re': ['bert\.encoder\.layer\.[0-9]*\.attention\.*'],
        'sparse_threshold': 0.1,
        'granularity': [64, 64]
    }]

    pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000,
                            cooldown_begin_step=36000, regular_scale=10)
    pruner.compress(None, 4)
    pruner.unwrap_model()

    masks = pruner.get_masks()
    Path('./output/pruning/').mkdir(parents=True, exist_ok=True)
    torch.save(masks, './output/pruning/attn_masks.pth')
    torch.save(model, './output/pruning/attn_masked_model.pth')


if not skip_exec:
    pruning_attn()
```
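For context on the warning that appears below: `pruner.compress(None, 4)` uses 4 epochs as the training duration, so the total number of optimizer steps depends on the dataset size and batch size, and it must exceed `cooldown_begin_step=36000` for the pruning schedule to complete. A quick sanity check (the dataset and batch-size numbers here are hypothetical, not taken from the tutorial):

```python
import math

def total_training_steps(num_examples, batch_size, epochs):
    """Approximate optimizer steps: steps per epoch times number of epochs."""
    return math.ceil(num_examples / batch_size) * epochs

# Hypothetical MNLI-like numbers: ~392,702 train examples, batch size 32, 4 epochs.
steps = total_training_steps(392702, 32, 4)

# MovementPruner's cooldown_begin_step=36000 must be smaller than this total,
# otherwise the cooldown phase is never reached.
assert steps > 36000
```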


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
/tmp/ipykernel_56471/3847997197.py:3: FutureWarning: load_metric is deprecated and will be removed in the next major version of datasets. Use 'evaluate.load' instead, from the new library 🤗 Evaluate: https://huggingface.co/docs/evaluate
  metric = load_metric('glue', task_name)
[2023-11-06 02:54:46] WARNING: trainer.optimzer is not wrapped by nni.trace, or trainer.optimzer is None, will using huggingface default optimizer.
[2023-11-06 02:54:46] WARNING: trainer.lr_scheduler is not wrapped by nni.trace, or trainer.lr_scheduler is None, will using huggingface default lr_scheduler.
[2023-11-06 02:54:46] WARNING: Using epochs number as training duration, please make sure the total training steps larger than `cooldown_begin_step`.
You are adding a <class 'nni.compression.utils.evaluator.PatchCallback'> to the callbacks of this Trainer, but there is already one. The current list of callbacks is:
DefaultFlowCallback
PrinterCallback
PatchCallback
/home/ubuntu/.local/lib/python3.8/site-packages/torch/utils/data/dataloader.py:557: UserWarning: This DataLoader will create 12 worker processes in total. Our suggested max number of worker in current system is 4, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
  warnings.warn(_create_warning_msg(
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/home/ubuntu/NAS_exps/new_pruning_bert_glue.ipynb Cell 23 line 3
     26     torch.save(model, './output/pruning/attn_masked_model.pth')
     29 if not skip_exec:
---> 30     pruning_attn()

/home/ubuntu/NAS_exps/new_pruning_bert_glue.ipynb Cell 23 line 2
     12 config_list = [{
     13     'op_types': ['Linear'],
     14     'op_names_re': ['bert\.encoder\.layer\.[0-9]*\.attention\.*'],
     15     'sparse_threshold': 0.1,
     16     'granularity': [64, 64]
     17 }]
     19 pruner = MovementPruner(model, config_list, evaluator, warmup_step=9000, cooldown_begin_step=36000, regular_scale=10)
---> 20 pruner.compress(None, 4)
     21 pruner.unwrap_model()
     23 masks = pruner.get_masks()

File ~/.local/lib/python3.8/site-packages/nni/compression/pruning/movement_pruner.py:228, in MovementPruner.compress(self, max_steps, max_epochs)
    225     warn_msg = \
    226         f'Using epochs number as training duration, please make sure the total training steps larger than `cooldown_begin_step`.'
    227     _logger.warning(warn_msg)
--> 228 return super().compress(max_steps, max_epochs)
...
-> 1080     assert optimizer is not None
   1081     old_step = optimizer.step
   1083     def patched_step(_, *args, **kwargs):

AssertionError: 
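The warnings above about `trainer.optimzer` not being wrapped by `nni.trace` line up with the failing `assert optimizer is not None`: `TransformersEvaluator` can only patch an optimizer it can re-create from a traced constructor, so an untraced (default HuggingFace) optimizer leaves it with nothing to patch. Conceptually, `nni.trace` records the class and constructor arguments so the object can be rebuilt later. A toy illustration of that record-and-rebuild idea (this is NOT NNI's actual implementation, and `DummyOptimizer` is a made-up stand-in):

```python
class TracedFactory:
    """Toy stand-in for the idea behind nni.trace: remember the class and
    its constructor arguments so the object can be re-created later."""
    def __init__(self, cls):
        self.cls = cls

    def __call__(self, *args, **kwargs):
        obj = self.cls(*args, **kwargs)
        # Record enough information to rebuild the object from scratch.
        obj.trace_symbol = self.cls
        obj.trace_args = args
        obj.trace_kwargs = kwargs
        return obj


class DummyOptimizer:
    """Hypothetical optimizer, only here to demonstrate the pattern."""
    def __init__(self, lr):
        self.lr = lr


# Constructing through the traced factory keeps the recipe around...
opt = TracedFactory(DummyOptimizer)(lr=0.01)

# ...so the evaluator-like consumer can rebuild an identical instance.
rebuilt = opt.trace_symbol(*opt.trace_args, **opt.trace_kwargs)
assert rebuilt.lr == 0.01
```

In the real tutorial, `prepare_traced_trainer` is supposed to construct the HuggingFace `Trainer`'s optimizer through `nni.trace`; if that step is skipped or silently fails, the evaluator falls through to this `AssertionError`.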
zephyr06 commented 10 months ago

I'm hitting the same issue; did you find a way to solve it?

BILL-CS commented 7 months ago

I'm hitting the same issue... Has anyone solved it?

Brundaraj commented 2 months ago

I met the same issue; have you solved it? I'm also getting this error at the same pruner step: TypeError: TransformersEvaluator._init_optimizer_helpers..patched_get_optimizer_cls_and_kwargs() takes 1 positional argument but 2 were given