Hi,

Thanks for the excellent work!

I found some issues in my humble trials (I didn't change anything in the code):
Using softmax attention on Text4k, I got ~63.7 accuracy instead of the 65.02 you reported in your paper.
With linear attention on Text4k, I got ~64 accuracy, which is even higher than the vanilla transformer. Did you get the same result on your side?
The attention types linformer-256 and nystrom-64 don't run; the errors are either dimension mismatches or config key errors. It seems that not all attention types run successfully with the released code. I haven't tried all of the choices, though.
Thank you for your time, I look forward to your reply~

Ziwei
Are you using the code from LRA? This config file is an example. To run LRA with other attention types, you can modify "attn_type" (see the available attention methods in the code) and add the settings specific to that attention.
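As a minimal sketch of what such a modification might look like (only "attn_type" is mentioned above; the other key names here, such as "num_landmarks" and "linformer_k", are illustrative assumptions and should be checked against the actual config file in the repository):

```python
# Hypothetical excerpt of an LRA config entry. Key names other than
# "attn_type" are illustrative assumptions, not taken verbatim from the repo.
attention_config = {
    "attn_type": "nystrom-64",   # e.g. "softmax", "linear", "linformer-256", "nystrom-64"
    # Attention-specific settings must be added alongside the type;
    # missing keys can lead to config key errors or dimension mismatches
    # like the ones reported above.
    "num_landmarks": 64,         # assumed Nystrom-specific setting
    # "linformer_k": 256,        # assumed Linformer-specific setting
}
```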