From what I can see in the Llama2 code on Hugging Face, the `attention_mask` and `position_ids` variables are never set by the model. As a result, `cache['attention_mask']` and `cache['position_ids']` are `None`, and the script fails on `lib/prune.py` line 144:
if f"model.layers.{i}" in model.hf_device_map: ## handle the case for llama-30B and llama-65B, when the device map has multiple GPUs;
dev = model.hf_device_map[f"model.layers.{i}"]
inps, outs, attention_mask, position_ids = inps.to(dev), outs.to(dev), attention_mask.to(dev), position_ids.to(dev)
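To make the failure mode concrete, here is a standalone repro of what happens at that line when a cache entry was never populated (illustrative code, not from the repo):

```python
# Illustrative repro, not repo code: calling .to() on a cache entry
# that the model never populated.
attention_mask = None   # what cache['attention_mask'] ends up as
dev = "cuda:0"          # any device works; the error is the same
attention_mask.to(dev)  # AttributeError: 'NoneType' object has no attribute 'to'
```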
Please note that I do not have access to GPUs with more than 40 GB of VRAM, and the 7B model does not fit in 40 GB for me, so I have to use a device map even for the 7B model, which is how I run into the failure described above.
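As a possible workaround, the unconditional `.to(dev)` calls could be made None-safe. A minimal sketch (the helper name `to_device` is mine, and I have not tested this against the repo):

```python
def to_device(t, dev):
    # None-safe version of tensor.to(dev): Llama2 leaves attention_mask and
    # position_ids unset, so these cache entries can legitimately be None.
    return t.to(dev) if t is not None else None

# The failing line in lib/prune.py could then become (sketch):
# inps, outs = inps.to(dev), outs.to(dev)
# attention_mask = to_device(attention_mask, dev)
# position_ids = to_device(position_ids, dev)
```

Of course, a proper fix might instead make sure the cache is populated correctly for Llama2 in the first place.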