Hi,

I was just going through the code in architectures.py and the paper side by side, and I can't find the query*key operation in the code. As I understand it, it should happen in AttentionNetwork. From what I see, that class is the "attention SNN" from Figure 2 in the paper, followed by a linear layer that computes attention weights directly from the keys.

Please let me know if I misunderstood something here; from the paper I assumed that a query*key operation must be performed there.
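To make the distinction concrete, here is a minimal sketch of the two variants I mean. This is illustrative code, not taken from architectures.py; the function names and shapes are my own assumptions, and I'm using ordinary dense tensors rather than the paper's SNN outputs.

```python
import torch

def dot_product_attention_weights(queries, keys):
    # What I expected from the paper: scores come from query*key
    # (standard scaled dot-product attention).
    d_k = keys.size(-1)
    scores = queries @ keys.transpose(-2, -1) / d_k ** 0.5  # (batch, n_q, n_k)
    return torch.softmax(scores, dim=-1)

def key_only_attention_weights(keys, proj):
    # What the code appears to do instead: a linear layer maps each
    # key directly to a scalar score, with no query involved.
    scores = proj(keys).squeeze(-1)  # (batch, n_k)
    return torch.softmax(scores, dim=-1)
```

In the first case the weights depend on the query; in the second they are a function of the keys alone, which is why I'm asking where (if anywhere) the query enters.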