lancopku / label-words-are-anchors

Repository for Label Words are Anchors: An Information Flow Perspective for Understanding In-Context Learning

attention_adapter.params.grad is None #28

Open CoverZhao opened 2 months ago

CoverZhao commented 2 months ago

Hello, author! I get the following error when running the source file attention_attr.py:

```
File "/aiarena/gpfs/label-words-are-anchors/attention_attr.py", line 144, in
    saliency = attentionermanger.grad(use_abs=True)[i]
File "/aiarena/gpfs/label-words-are-anchors/icl/analysis/attentioner_for_attribution.py", line 104, in grad
    grads.append(self.grad_process(attention_adapter.params.grad, *args, **kwargs))
AttributeError: 'NoneType' object has no attribute 'grad'
```

What could be causing this?
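(A hedged diagnostic sketch, in case it helps pin down where the chain breaks: after loss.backward(), each adapter's params.grad should be a tensor rather than None. The attribute names attention_adapters and params follow the repo's attentioner_for_attribution.py but are assumptions here, not verbatim repo code.)

```python
# Check whether the backward pass ever reached the attention adapters.
# `attentionermanger` is the manager built in attention_attr.py; the
# `attention_adapters` attribute name is assumed from the repo's code.
for i, attention_adapter in enumerate(attentionermanger.attention_adapters):
    params = attention_adapter.params
    grad = None if params is None else params.grad
    print(f"layer {i}: params created: {params is not None}, grad present: {grad is not None}")
```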

leanwang326 commented 2 months ago

The code should run without problems on a single GPU with the default settings; the issue might be caused by using pipeline parallelism.

CoverZhao commented 2 months ago

> The code should run without problems on a single GPU with the default settings; the issue might be caused by using pipeline parallelism.

I am using a single GPU. When I first ran the code directly, it failed with:

```
Traceback (most recent call last):
  File "/aiarena/gpfs/label-words-are-anchors/attention_attr.py", line 143, in
    loss.backward()
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
    torch.autograd.backward(
  File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/__init__.py", line 251, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```

I then noticed that the loss computed by loss = F.cross_entropy(output['logits'], label) has requires_grad set to False, so I added the line loss.requires_grad = True. On the next run I got the attention_adapter.params.grad is None error described above.
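(For reference, a minimal illustration, not the repo's code, of why forcing loss.requires_grad = True cannot fix this: if the logits were produced without an autograd graph, the loss is a leaf tensor, so backward() runs but no gradient ever reaches attention_adapter.params.)

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 5)   # stand-in for model output detached from any graph
label = torch.tensor([2])

loss = F.cross_entropy(logits, label)
print(loss.requires_grad)    # False: nothing upstream requires grad
loss.requires_grad = True    # makes backward() legal...
loss.backward()              # ...but no gradients flow to any upstream parameters
```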

leanwang326 commented 2 months ago

Maybe torch.no_grad / inference_mode is set somewhere. You could also try setting params.requires_grad=True at the point where params is multiplied with the attention weights. Alternatively, if you are using flash attention, that could be the problem (the current code does not support it; the latest flash attention seems to support multiplying by a mask and running backward, so you could adapt it yourself if you need it).
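(For context, a simplified sketch of the pattern this reply describes: a learnable all-ones tensor multiplied onto the materialized attention weights, with requires_grad=True so its .grad carries the saliency signal. The names AttentionAdapter and params mirror the repo, but this is an illustration under those assumptions, not the repo's exact code.)

```python
import torch
import torch.nn as nn

class AttentionAdapter(nn.Module):
    """Simplified adapter: a learnable all-ones mask over the attention weights."""

    def __init__(self):
        super().__init__()
        self.params = None

    def forward(self, attn_weights: torch.Tensor) -> torch.Tensor:
        if self.params is None:
            # requires_grad=True is essential: without it, or under
            # torch.no_grad()/inference_mode, or with a fused flash-attention
            # kernel that never materializes attn_weights, params.grad stays None.
            self.params = torch.ones_like(attn_weights, requires_grad=True)
        return attn_weights * self.params
```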

CoverZhao commented 2 months ago

> Maybe torch.no_grad / inference_mode is set somewhere. You could also try setting params.requires_grad=True at the point where params is multiplied with the attention weights. Alternatively, if you are using flash attention, that could be the problem (the current code does not support it; the latest flash attention seems to support multiplying by a mask and running backward, so you could adapt it yourself if you need it).

It was indeed a flash attention problem; after I rebuilt the environment it was solved. Thanks!

lilhongxy commented 1 month ago

Could you share the configuration of your new environment? I'm running into a similar problem now.

CoverZhao commented 1 month ago

> Could you share the configuration of your new environment? I'm running into a similar problem now.

I followed requirements.txt with a few small changes:

```
datasets
ipython==8.11.0
matplotlib==3.7.1
numpy
seaborn==0.12.2
tqdm==4.65.0
transformers==4.37.0
```

and installed torch directly from the official PyTorch website.
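(A further hedged option, if rebuilding the environment alone is not enough: explicitly request the eager attention implementation when loading the model, so the attention weights are materialized and the adapter's mask can receive gradients. The attn_implementation argument exists in transformers 4.37; the model name below is a placeholder, not necessarily the one used in the repo.)

```python
from transformers import AutoModelForCausalLM

# Force the eager attention path so attention weights are materialized,
# avoiding flash/SDPA kernels that bypass the per-weight mask.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl",                    # placeholder; substitute the model you are analyzing
    attn_implementation="eager",
)
```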