Closed Xinpeng-Wang closed 2 years ago
Hi @Xinpeng-Wang,
thanks for your interest in our work! The `xi` in
https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L419
is the `p` in equations (3), or (442). The `xi`, as described in our paper, is ultimately computed as
https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L439
and termed `attn_output` to be in line with the official PyTorch repository v1.6.0 (see Disclaimer for more details).
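To make the distinction concrete, here is a minimal, dependency-free sketch of a single update step. It is not the hflayers implementation: the names `softmax`, `hopfield_step`, `stored`, `state`, and `beta` are illustrative assumptions, and the learned projections of the actual layer are left out. It returns both the association weights (`p`) and the retrieved pattern, which plays the role of what the code calls `attn_output`.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def hopfield_step(state, stored, beta=1.0):
    """One update step: p = softmax(beta * stored @ state), xi_new = p @ stored.

    `stored` is a list of stored patterns (one per row), `state` is the query.
    Both names are hypothetical and only serve this illustration.
    """
    # Similarity of the state pattern to every stored pattern, scaled by beta.
    scores = [beta * sum(s * q for s, q in zip(pattern, state)) for pattern in stored]
    # p: the softmax over the associations (what the code at L419 names xi).
    p = softmax(scores)
    # xi_new: softmax weights multiplied with the stored patterns.
    xi_new = [
        sum(p[i] * stored[i][d] for i in range(len(stored)))
        for d in range(len(state))
    ]
    return p, xi_new


stored = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
state = [0.9, 0.1]
p, xi_new = hopfield_step(state, stored, beta=8.0)
print("p      =", p)
print("xi_new =", xi_new)
```

With these toy values the weights concentrate on the first stored pattern, so `xi_new` lands close to it, while `p` itself is only a probability vector over the stored patterns.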
Yes, but in the case of multiple updates, as described in the paper, a threshold is applied to `xi_new` and `xi_old`. In the code, this threshold is also applied to the `xi`: https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L429
Yes, strictly speaking, the threshold in the implementation is applied directly to `p`. But if `p` does not change between multiple updates, `xi` does not change either. The naming in the implementation is a little bit misleading, as `p` is used as a proxy for `xi` in this case.
A small note on multiple updates in general: one update step is already enough, as stated in Theorem 4 of the paper.
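The argument that thresholding `p` is equivalent to thresholding `xi` can be sketched as follows. This is an illustration under assumptions, not the library's code: `retrieve`, `eps`, and `max_steps` are hypothetical names, and the stopping rule (largest absolute change in `p`) is one possible choice that may differ from the norm used in hflayers. Since `xi` is computed from `p` and the fixed stored patterns alone, an unchanged `p` implies an unchanged `xi`.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(state, stored, beta=1.0, eps=1e-4, max_steps=10):
    """Repeat the update until p stops changing (hypothetical stopping rule)."""
    xi = list(state)
    p_old = None
    steps = 0
    for steps in range(1, max_steps + 1):
        scores = [beta * sum(s * q for s, q in zip(pattern, xi)) for pattern in stored]
        p = softmax(scores)
        # xi depends only on p and the (fixed) stored patterns.
        xi = [
            sum(p[i] * stored[i][d] for i in range(len(stored)))
            for d in range(len(state))
        ]
        # Threshold on p, used as a proxy for a threshold on xi.
        if p_old is not None and max(abs(a - b) for a, b in zip(p, p_old)) < eps:
            break
        p_old = p
    return xi, p, steps


stored = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
xi, p, steps = retrieve([0.9, 0.1], stored, beta=8.0)
print("steps:", steps)
print("xi   :", xi)
```

In this toy setting the first update already moves the state close to the nearest stored pattern, and the later iterations only refine it slightly.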
Hi, nice work! As I read the code, I found that the xi in the code is actually the softmax output of the key-query association matrix. https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L419
But the paper says it is the product of the softmax and the stored patterns.
Can you explain that?