raphaelreinauer opened this issue 2 years ago
Should it be this, according to the paper?
min_loss.backward(retain_graph=True)
max_loss.backward()
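For context, here is a rough sketch of the training step I have in mind. The tiny linear model and the loss terms are placeholders chosen only to make the snippet runnable; they are not the repo's actual model or losses:

```python
import torch
import torch.nn as nn

# Rough sketch of one training step using the two backward calls above.
# The model and loss terms are illustrative stand-ins, not the repo's code.
model = nn.Linear(16, 16)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(8, 16)   # dummy batch
x_hat = model(x)

# stand-ins for the repo's min_loss / max_loss (each would carry its own detach)
min_loss = (x_hat - x).pow(2).mean()
max_loss = (x_hat - x).abs().mean()

optimizer.zero_grad()
# retain_graph=True keeps the saved buffers alive so the second backward can
# reuse the same graph; both calls accumulate into the parameters' .grad.
min_loss.backward(retain_graph=True)
max_loss.backward()
optimizer.step()
```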
Hi @raphaelreinauer, thanks for the suggestion; the training script is certainly incomplete. Your comment makes sense, and I agree my loss scheme is incorrect.
With regards to the stop-gradient, I believe this is handled within the loss functions, since I detach the S tensors in min_loss and the P tensors in max_loss respectively. Please let me know if this is what you had in mind. https://github.com/spencerbraun/anomaly_transformer_pytorch/blob/6d15200911260eee910a3664d70f07886c47708b/model.py#L128-L138
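Roughly, this is what I mean by handling the stop-gradient via detach inside the two losses (a simplified, self-contained sketch; the mean absolute difference here only stands in for the real association discrepancy):

```python
import torch

def min_loss_term(series: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    # S is detached: gradients from this term flow only into the prior association P.
    return (series.detach() - prior).abs().mean()

def max_loss_term(series: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
    # P is detached: gradients from this term flow only into the series association S.
    return (series - prior.detach()).abs().mean()
```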
@sappersapper Thank you for the suggestion. I believe that will work, though I might need to think about whether the memory overhead can be avoided.
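One possible way around the overhead of retain_graph: since backward() accumulates into .grad and gradients are linear, a single backward over the summed losses should give the same parameter gradients as the two separate calls (assuming one optimizer step per batch, as in the sketch above):

```python
# Replaces the two backward calls from the earlier sketch with one pass;
# the stop-gradients are already baked into each loss via detach().
optimizer.zero_grad()
(min_loss + max_loss).backward()
optimizer.step()
```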
When I tried the following:
min_loss.backward(retain_graph=True)
max_loss.backward()
it worked well. However, on the second input, I got the following error.
one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [8, 1]], which is output 0 of AsStridedBackward0, is at version 2; expected version 1 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).
I think it is mainly because self.P_list and self.S_list in AnomalyAttention are declared in the constructor.
Should they be declared in forward instead?
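As the hint in the error suggests, anomaly detection can be enabled to locate the offending operation (standard PyTorch API):

```python
import torch

# Records forward-pass stack traces so the backward-pass error points at the
# operation that produced the tensor modified in place.
torch.autograd.set_detect_anomaly(True)
```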
@syoungbaak I am facing the same problem (one of the variables needed for gradient computation has been modified by an inplace operation). Did you solve this problem?
It is solved if self.P_list and self.S_list in AnomalyAttention are defined in forward rather than in the initialization.
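Roughly like this (a much simplified sketch of the idea, not the repo's actual AnomalyAttention code):

```python
import torch
import torch.nn as nn

class AnomalyAttentionSketch(nn.Module):
    """Illustration only: the P/S lists are rebuilt on every forward call
    instead of being created once in __init__, so no tensors from a previous
    batch's graph are kept around and modified in place."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(d_model, d_model)
        # note: no self.P_list / self.S_list here

    def forward(self, x: torch.Tensor):
        P_list, S_list = [], []              # fresh lists for this batch only
        scores = self.proj(x)                # stand-in for the real attention math
        P_list.append(torch.softmax(scores, dim=-1))
        S_list.append(torch.sigmoid(scores))
        return scores, P_list, S_list
```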
@syoungbaak OK, thank you!
There is an issue with the loss at https://github.com/spencerbraun/anomaly_transformer_pytorch/blob/6d15200911260eee910a3664d70f07886c47708b/train.py#L53: the reconstruction loss cancels out, and we end up with only -2 * the discrepancy loss. Additionally, the stop-gradient described in the paper is not implemented.
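Concretely, assuming the two losses have the form below (my reading of the linked code, not a verbatim copy):

$$
\mathcal{L}_{\min} = \lVert X - \hat{X} \rVert_F^2 - \lambda \cdot \mathrm{Dis},
\qquad
\mathcal{L}_{\max} = \lVert X - \hat{X} \rVert_F^2 + \lambda \cdot \mathrm{Dis}
$$

then combining them as min_loss - max_loss gives

$$
\mathcal{L}_{\min} - \mathcal{L}_{\max} = -2\lambda \cdot \mathrm{Dis},
$$

so the reconstruction term drops out entirely and the model is never trained to reconstruct the input. The paper instead keeps the reconstruction term in both phases of the minimax scheme and applies the stop-gradient (detaching S in one phase and P in the other) so that each phase only updates one side of the association.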