Closed: ZiweiHe closed this issue 9 months ago.

Hi,
The results you gave here are problematic: under the same directory, the results of the 3 different attention types are identical.

Hello!
They do indeed start out equivalent for the first 1024 tokens, but they differ after that. This is because the windowed and attention_sinks approaches in the benchmarks use a window size of 1024, and up to that point the three approaches are identical. See, for example, token 4000 in the .csv files for evidence that the results are not identical.
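If you want to check this yourself, a quick comparison of the benchmark CSVs at a position past the window should show the divergence. This is only a minimal sketch: the output directory, file names, and row/column layout below are assumptions on my part, so adjust them to the actual benchmark output.

```python
import pandas as pd

# Assumed layout: one CSV per attention type in the same output directory,
# with one row per token position. Paths here are hypothetical.
paths = {
    "transformers": "outputs/transformers.csv",
    "windowed": "outputs/windowed.csv",
    "attention_sinks": "outputs/attention_sinks.csv",
}

for name, path in paths.items():
    df = pd.read_csv(path)
    # Token 4000 is well past the shared 1024-token prefix, so the three
    # approaches should differ here.
    print(f"{name}: {df.iloc[4000].to_dict()}")
```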
Also, the figures in the README are direct plots of these .csv files, and as you can see there, they're not identical.
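Those plots can also be regenerated straight from the files. Again just a sketch: the column names "input_length" and "perplexity" are assumptions, so substitute whatever columns the benchmark actually logs.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Plot each CSV's perplexity curve; the curves should overlap up to the
# 1024-token window and separate afterwards.
for name in ("transformers", "windowed", "attention_sinks"):
    df = pd.read_csv(f"outputs/{name}.csv")  # hypothetical path
    plt.plot(df["input_length"], df["perplexity"], label=name)

plt.axvline(1024, linestyle="--", color="gray", label="window size (1024)")
plt.xlabel("Input length (tokens)")
plt.ylabel("Perplexity")
plt.legend()
plt.show()
```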
I hope that clears it up!
Edit: If there is indeed a model for which they are exactly identical, then please let me know and I'll resolve it! I may have made a mistake at some point.
Oh, my mistake. Thank you for your reply. Please feel free to delete this issue!
No worries!