Closed terencenwz closed 3 years ago
Hi @terencenwz!
I see that you have opened an issue about a similar problem here: https://github.com/lucidrains/linear-attention-transformer/issues/6. This might be related to the way you are running your tests; I am not sure. The loss should not go down to 0 right away...
But on another note, there are some caveats that you should know when using the linformer for causal lm. Check out #15 and #16 for some more information about this.
Hi, thanks for the reply. I believe the linear-attention-transformer issue is a slightly different problem, since there the loss goes to infinity rather than 0. I have run quite a number of different transformer variants, including the original model, and got comparable losses.
I think the problem might be explained by https://github.com/tatp22/linformer-pytorch/issues/16#issuecomment-733525011, where there is some leakage of future information. The loss did not go down to 0 right away; it took slightly more than 1 epoch (around 30k update steps).
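For anyone landing here later, the leakage mechanism referenced above can be shown with a small, library-free sketch (plain numpy, with a hypothetical projection matrix `E`): a Linformer-style projection compresses keys/values *along the sequence axis*, so every projected key is a mixture of all positions, including future ones. Once that mixing has happened, no mask over the projected slots can restore strict causality.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 4, 2  # sequence length, model dim, projected length

# Linformer-style learned projection over the sequence dimension (hypothetical values)
E = rng.normal(size=(k, n))
X = rng.normal(size=(n, d))  # token representations (keys, for simplicity)

K_proj = E @ X  # shape (k, d): each projected key mixes ALL n positions

# Perturb only the LAST token -- a "future" position for every earlier query
X_future = X.copy()
X_future[-1] += 1.0
K_proj_future = E @ X_future

# The projected keys that even the FIRST query attends over have changed,
# so information about position n-1 leaks into predictions at position 0.
leaked = not np.allclose(K_proj, K_proj_future)
print(leaked)  # True: future information leaked through the projection
```

This is why a causal flag that only masks the attention scores is not enough for this architecture; the projection itself has to be restricted (or computed cumulatively) to keep future tokens out.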
Hi, I used the LinformerLM class with causal=True to do some language modelling. However, there seems to be some leakage, as the loss goes to 0 after 1 epoch. Or am I using it wrongly? Thank you.
These are my settings