mlpen / Nystromformer

score of softmax on Text4k; linformer-256 & nystrom-64 don't work #15

Open ZiweiHe opened 2 years ago

ZiweiHe commented 2 years ago

Hi,

Thanks for the excellent work!

I ran into some issues in my trials (I didn't change anything in the code):

  1. Using softmax attention on Text4k, I got ~63.7 accuracy instead of the 65.02 reported in your paper.
  2. I also tried linear attention on Text4k and got ~64 accuracy, which is even higher than the vanilla transformer. Did you get the same result on your side?
  3. The attention types linformer-256 and nystrom-64 don't work; the errors are either dimension mismatches or config key errors. It seems that not all the attention types run successfully in the released code. (I didn't try all the choices, though.)

Thank you for your time; I look forward to your reply~

Ziwei

mlpen commented 2 years ago

Are you using the code from LRA? This config file is an example. To run LRA with other attention types, you can modify "attn_type" (see the possible attention methods in the code) and add the corresponding attention-specific settings.
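
For concreteness, here is a minimal sketch of that kind of edit, assuming a config dict in the style of the LRA configs in this repo; the key names below (attn_type, num_landmarks, linformer_k) are illustrative guesses from reading the code and may not match your checkout exactly:

```python
# Minimal sketch of switching the LRA attention variant (key names are
# illustrative assumptions, not verified against the released configs).
config = {
    "model": {
        "attn_type": "nystrom-64",   # e.g. "softmax", "linformer-256", "nystrom-64"
        "embedding_dim": 64,
        "transformer_dim": 64,
        "num_head": 2,
        "num_layers": 2,
        # Attention-specific settings go alongside attn_type; a missing key
        # here would explain the config KeyError reported above.
        "num_landmarks": 64,         # assumed key for nystrom-* variants
        "linformer_k": 256,          # assumed key for linformer-* variants
    },
}
```

If a variant fails with a config key error, checking that its attention-specific setting is present (as sketched above) is a reasonable first step before digging into dimension mismatches.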