orion-orion / FedDCSR

🔬 [SDM'24] This is the source code and baselines of our paper FedDCSR: Federated Cross-domain Sequential Recommendation via Disentangled Representation Learning.
Apache License 2.0

In-place operation in SelfAttention causing backward pass error #3

Closed languangduan closed 3 months ago

languangduan commented 3 months ago

Hello,

I encountered an issue while using the SelfAttention module in your code. The problem was related to an in-place operation that was causing errors during the backward pass. Specifically, the error occurred in the NativeLayerNormBackward0 operation.

Problem

In the forward method of the SelfAttention class, in models/vgsan/modules.py, there are two in-place multiplication operations of the form:

seqs *= ~timeline_mask.unsqueeze(-1)

These in-place operations modify a tensor that a preceding LayerNorm has already saved for its backward pass, which breaks the computational graph and causes the error during backpropagation.
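The failure mode can be reproduced outside the repository. The sketch below is a minimal, self-contained example (the shapes, mask values, and variable names are hypothetical, not taken from the FedDCSR code): LayerNorm saves its input tensor for the backward pass, and the subsequent in-place multiply bumps that tensor's version counter, so autograd raises a RuntimeError when backward tries to use it.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ln = nn.LayerNorm(8)
x = torch.randn(2, 5, 8, requires_grad=True)
timeline_mask = torch.zeros(2, 5, dtype=torch.bool)  # True = padding position

seqs = x * 1.0        # non-leaf copy, so in-place ops on it are permitted
Q = ln(seqs)          # LayerNorm saves `seqs` for its backward pass
seqs *= ~timeline_mask.unsqueeze(-1)  # in-place: mutates the saved tensor

try:
    Q.sum().backward()
except RuntimeError as e:
    # "one of the variables needed for gradient computation has been
    # modified by an inplace operation"
    print("backward failed:", type(e).__name__)
```

Wrapping the forward pass in `torch.autograd.set_detect_anomaly(True)` is what produces the "Error detected in NativeLayerNormBackward0" traceback shown below, pointing at the forward call that saved the mutated tensor.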

Error Message

The error manifested as:

Error detected in NativeLayerNormBackward0. Traceback of forward call that caused the error:
  ...
  File "...\models\vgsan\modules.py", line 97, in forward
    Q = self.attention_layernorms[i](seqs)
  ...

Solution

I resolved the issue by changing the in-place operations to regular (out-of-place) ones:

seqs = seqs * (~timeline_mask.unsqueeze(-1))

After making this change, the error disappeared and the model trained successfully.
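The same toy setup as above illustrates why the fix works (again with hypothetical shapes and names, not the actual FedDCSR code): the out-of-place multiply rebinds `seqs` to a freshly allocated tensor, so the tensor LayerNorm saved is never mutated and backward completes cleanly.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
ln = nn.LayerNorm(8)
x = torch.randn(2, 5, 8, requires_grad=True)
timeline_mask = torch.zeros(2, 5, dtype=torch.bool)  # True = padding position

seqs = x * 1.0
Q = ln(seqs)                                   # LayerNorm saves `seqs`
seqs = seqs * (~timeline_mask.unsqueeze(-1))   # new tensor; saved one intact

(Q.sum() + seqs.sum()).backward()              # no RuntimeError
print(x.grad is not None)                      # gradients flow back to x
```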

Suggestion

It might be beneficial to update the code in the repository to use these out-of-place operations. This would prevent other users from encountering the same backward-pass error during training.

Thank you for your great work on this project, and I hope this information is helpful!

orion-orion commented 3 months ago

Thank you for your suggestion. To be honest, I noticed this problem a long time ago, which is why I wrote in the README that the PyTorch version needs to be <= 1.7.1; otherwise this error is reported. The reason I did not change it to an out-of-place operation is that I found the problem close to the deadline for the paper submission, and after the change the experimental results differ from those reported in the paper (I don't know why...)

languangduan commented 3 months ago

I understand. I sometimes encounter similar issues too. Thank you for your great work, regardless.

orion-orion commented 3 months ago

(#^.^#)