xi in the code is different from the paper?

ml-jku / hopfield-layers

Hopfield Networks is All You Need

https://ml-jku.github.io/hopfield-layers/


xi in the code is different from the paper? #22

Closed Xinpeng-Wang closed 2 years ago

Xinpeng-Wang commented 2 years ago

Hi, nice work! While reading the code, I found that xi in the code is actually the softmax output of the key-query association matrix: https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L419

But the paper says xi is the product of the softmax output and the stored patterns.

Can you explain that?

bschaefl commented 2 years ago

Hi @Xinpeng-Wang,

Thanks for your interest in our work! The xi in https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L419 is the p in equations (3), or (442). The xi as described in our paper is ultimately computed at https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L439 and is named attn_output to be in line with the official PyTorch repository v1.6.0 (see Disclaimer for more details).
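To make the distinction between p and xi concrete, here is a minimal pure-Python sketch of a single update for one query. It is not the repository's code: the names `softmax` and `hopfield_update` are hypothetical, and it assumes no learned projections, so the stored patterns serve as both keys and values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def hopfield_update(query, patterns, beta=1.0):
    """One update step for a single query (illustrative, hypothetical helper).

    patterns: list of stored patterns, each a list of floats.
    Returns (p, xi_new):
      p      -- softmax over the scaled query-pattern similarities
                (what the variable named xi holds in the linked code line)
      xi_new -- p-weighted sum of the stored patterns
                (what the paper calls xi, attn_output in the code)
    """
    scores = [beta * sum(q * x for q, x in zip(query, pattern))
              for pattern in patterns]
    p = softmax(scores)
    dim = len(query)
    xi_new = [sum(p_i * pattern[d] for p_i, pattern in zip(p, patterns))
              for d in range(dim)]
    return p, xi_new

# Two orthogonal stored patterns and a query close to the first one.
p, xi_new = hopfield_update([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], beta=1.0)
```

With unit-vector patterns as in this toy example, xi_new happens to equal p numerically; in general they differ and need not even have the same length, since p has one entry per stored pattern and xi_new has one entry per feature dimension.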

Xinpeng-Wang commented 2 years ago

Yes, but in the case of multiple updates, as described in the paper, a threshold is applied to xi_new and xi_old. In the code, this threshold is also applied to xi: https://github.com/ml-jku/hopfield-layers/blob/f56f929c95b77a070ae675ea4f56b6d54d36e730/hflayers/functional.py#L429

bschaefl commented 2 years ago

Yes, strictly speaking, the threshold in the implementation is applied directly on the basis of p. But if p does not change between updates, xi does not change either. The naming in the implementation is a little misleading, as p is used as a proxy for xi in this case.
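The proxy argument can be checked with a small sketch. For fixed stored patterns, xi is a linear function of p (a p-weighted sum of the patterns), so the change in xi between two updates is bounded by the change in p times the Frobenius norm of the pattern matrix. The code below is an illustration under assumptions, not the repository's implementation: `retrieve`, `eps`, and `max_steps` are hypothetical names, there are no projections, and the Euclidean norm is assumed for the stopping criterion.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def l2(a, b):
    """Euclidean distance between two equal-length lists."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(query, patterns, beta=1.0, eps=1e-4, max_steps=10):
    """Repeat the update until p changes by at most eps (hypothetical helper).

    Returns the final xi and a history of (delta_p, delta_xi) per step;
    delta_p is None on the first step because there is no previous p.
    """
    xi = list(query)
    p_old = None
    history = []
    for _ in range(max_steps):
        scores = [beta * sum(a * b for a, b in zip(xi, x)) for x in patterns]
        p = softmax(scores)
        xi_new = [sum(p_i * x[d] for p_i, x in zip(p, patterns))
                  for d in range(len(xi))]
        delta_p = l2(p, p_old) if p_old is not None else None
        delta_xi = l2(xi_new, xi)
        history.append((delta_p, delta_xi))
        xi, p_old = xi_new, p
        # Stopping criterion on p, used here as a proxy for xi.
        if delta_p is not None and delta_p <= eps:
            break
    return xi, history

patterns = [[1.0, 0.0], [0.0, 1.0]]
xi, history = retrieve([0.9, 0.1], patterns, beta=8.0)

# Frobenius norm of the pattern matrix bounds how much xi can move per unit of p.
frob = math.sqrt(sum(v * v for x in patterns for v in x))
for delta_p, delta_xi in history:
    if delta_p is not None:
        assert delta_xi <= frob * delta_p + 1e-12
```

In particular, whenever delta_p is zero, delta_xi is zero as well, which is why thresholding p stops the iteration no later than xi has stopped changing.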

A small note on multiple updates in general: one update step is already enough, as stated in Theorem 4 of the paper.