kast424 opened 6 months ago
It seems you are using the Mistral model. For a model with a different definition file (https://github.com/huggingface/transformers/blob/v4.40.0/src/transformers/models/mistral/modeling_mistral.py), the codebase needs to be adapted.
I could get magnitude pruning to work on the same Mistral model.
Same here. So what should we change to accommodate models with different structures?
Hi @Shinning-Zhou and @kast424,
Did you figure out how to work around this issue? I am facing a similar error when working with llama-2-7b-chat.
(#67)
I removed the attention_mask variable because LLaMA doesn't have it (it comes back as None).
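For anyone hitting the same thing: the workaround above boils down to never calling `.to()` on a value that can be None. A minimal sketch of a None-safe helper (hypothetical, not part of the repo's actual code):

```python
def maybe_to(tensor, dev):
    # attention_mask (and sometimes position_ids) can come back as None
    # for Mistral/LLaMA, so guard before calling .to(dev).
    return tensor.to(dev) if tensor is not None else None
```

Wrapping each tensor move with something like this avoids the AttributeError without touching the rest of the pruning loop.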
Thanks, that resolved the issue :)
Hi @Shinning-Zhou, @kast424
I am trying to run this repo on Mistral but am facing a different error. Is there a fix for this yet, i.e. can we prune Mistral with SparseGPT and Wanda?
Here's the issue in more detail - #68
I am trying to prune with

python main.py \
    --model mistralai/Mistral-7B-Instruct-v0.2 \
    --prune_method wanda \
    --sparsity_ratio 0.5 \
    --sparsity_type unstructured \
    --save out/mistral_7b/unstructured/wanda/

and the output is as below.

torch 2.3.0
transformers 4.41.0.dev0
accelerate 0.31.0.dev0
# of gpus: 2
loading llm model mistralai/Mistral-7B-Instruct-v0.2
Loading checkpoint shards: ... (progress-bar output trimmed)
use device cuda:0
pruning starts
loading calibdation data
dataset loading complete
Traceback (most recent call last):
  File "/mnt/parscratch/users/acq22stk/teamproject/wanda/main.py", line 110, in <module>
    main()
  File "/mnt/parscratch/users/acq22stk/teamproject/wanda/main.py", line 69, in main
    prune_wanda(args, model, tokenizer, device, prune_n=prune_n, prune_m=prune_m)
  File "/mnt/parscratch/users/acq22stk/teamproject/wanda/lib/prune.py", line 144, in prune_wanda
    inps, outs, attention_mask, position_ids = inps.to(dev), outs.to(dev), attention_mask.to(dev), position_ids.to(dev)
AttributeError: 'NoneType' object has no attribute 'to'
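The traceback shows attention_mask arriving as None at prune.py line 144. A hedged sketch of what "removing the attention_mask variable" amounts to: only include it in the layer-forward kwargs when it is not None (function and argument names here are illustrative, not the repo's actual code):

```python
def layer_kwargs(attention_mask, position_ids):
    # Build kwargs for a decoder-layer forward pass, omitting
    # attention_mask entirely when the model returned None for it.
    kwargs = {"position_ids": position_ids}
    if attention_mask is not None:
        kwargs["attention_mask"] = attention_mask
    return kwargs
```

Calling the layer as `layer(hidden_states, **layer_kwargs(attention_mask, position_ids))` then works whether or not the model produced a mask.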