locuslab / wanda

A simple and effective LLM pruning approach.
https://arxiv.org/abs/2306.11695
MIT License

Error when trying to run Llama2-7b: attention_mask and position_ids are None #60

Open Philippe-Guyard opened 2 months ago

Philippe-Guyard commented 2 months ago

From what I see in the Llama-2 code on Hugging Face, the attention_mask and position_ids variables are never set by the model. As a result, cache['attention_mask'] and cache['position_ids'] are None, and the script fails at lib/prune.py line 144:

    if f"model.layers.{i}" in model.hf_device_map:   ## handle the case for llama-30B and llama-65B, when the device map has multiple GPUs;
            dev = model.hf_device_map[f"model.layers.{i}"]
            inps, outs, attention_mask, position_ids = inps.to(dev), outs.to(dev), attention_mask.to(dev), position_ids.to(dev)
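
For context, the calibration code in lib/prune.py captures these values by wrapping the first decoder layer, roughly like the sketch below (paraphrased, not verbatim; `inps` and `cache` live in the enclosing function's scope). If the Hugging Face forward does not pass attention_mask/position_ids as kwargs, or passes them as None, the cache entries stay None:

    import torch.nn as nn

    # Paraphrased sketch of the calibration wrapper; exact repo code may differ.
    class Catcher(nn.Module):
        def __init__(self, module):
            super().__init__()
            self.module = module

        def forward(self, inp, **kwargs):
            # Record the hidden states fed into the first decoder layer.
            inps[cache['i']] = inp
            cache['i'] += 1
            # If the Llama-2 forward never supplies these kwargs (or supplies
            # None), these entries are None and the later .to(dev) calls fail.
            cache['attention_mask'] = kwargs.get('attention_mask')
            cache['position_ids'] = kwargs.get('position_ids')
            raise ValueError  # abort the forward pass after capturing inputs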

Please note that I do not have access to GPUs with more than 40GB of VRAM, and the 7B model does not fit in 40GB for me, so I have to use a device map even for the 7B model, which is what triggers the error above.
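
A possible workaround (an untested sketch, not a confirmed fix) is to guard the device moves against None. Recent transformers versions build a default causal mask and default position ids internally when these are None, so skipping the move should be safe:

    # Untested sketch: only move tensors that are not None.
    if f"model.layers.{i}" in model.hf_device_map:   ## handle the case for llama-30B and llama-65B, when the device map has multiple GPUs;
        dev = model.hf_device_map[f"model.layers.{i}"]
        inps, outs = inps.to(dev), outs.to(dev)
        if attention_mask is not None:
            attention_mask = attention_mask.to(dev)
        if position_ids is not None:
            position_ids = position_ids.to(dev)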

Logan-007L commented 2 weeks ago

Hello, have you solved this?