markpwoodward closed this issue 7 years ago
Hey Mark, I'm glad the code helped you! Indeed I may have taken some liberties with the paper:

- Each `wr_tm1` has a corresponding `ww_t`. Similarly, I considered that `wlu_tm1` was not a single vector with `nb_reads` 1's but `nb_reads` one-hot vectors, to ensure that each `ww_t` remains a proper distribution (sums to 1).
- `sigma_t` has `nb_reads` elements as well, one for each write head.
- I kept `k_t` and `a_t` separate to match the NTM paper, which distinguishes the key used to query the memory from what is added to the memory. Again, these have `nb_reads` elements as well, one for each read/write head.

Hope it makes more sense!
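Here's a minimal NumPy sketch of what I mean, with hypothetical names and shapes (this is just the idea, not the code from the repo; `usage` stands in for the decayed usage weights w^u from the paper):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

nb_reads, memory_size = 4, 128

# Previous read weightings: one distribution per read head (rows sum to 1).
wr_tm1 = np.random.dirichlet(np.ones(memory_size), size=nb_reads)

# Least-used weightings: nb_reads one-hot vectors (one least-used slot
# per head) rather than a single vector containing nb_reads 1's.
usage = np.random.rand(memory_size)
wlu_tm1 = np.zeros((nb_reads, memory_size))
wlu_tm1[np.arange(nb_reads), np.argsort(usage)[:nb_reads]] = 1.0

# One interpolation gate per write head.
alpha = np.random.randn(nb_reads)
sigma_t = sigmoid(alpha)[:, None]

# Per-head convex combination: every row of ww_t is still a proper
# distribution (sums to 1).
ww_t = sigma_t * wr_tm1 + (1.0 - sigma_t) * wlu_tm1
assert np.allclose(ww_t.sum(axis=1), 1.0)
```

The per-head one-hot `wlu_tm1` is what keeps each row of `ww_t` a valid distribution; with a single vector containing `nb_reads` 1's, the combination would sum to more than 1.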
Hi Tristan, Thanks again. That all makes sense. Your choices seem like the best fit to the paper to me.
Hi @tristandeleu. I'm trying to understand Google's work on one-shot learning and found your project to be nice learning material. But I'm a little confused by the usage of omniglot.py and test_model.py. It seems the controller of the MANN first has to be trained the hard way. Does the training in omniglot.py correspond to this pre-training stage? I also don't understand how test_batch_size() and test_shape() relate to the description in the paper. Would you please give some instructions on that, or add more comments to the code? Thanks in advance!
Hi @markpwoodward, were you able to run this program? I ran into two errors (see my issue); could you offer me some help? Thank you so much!
Thank you @tristandeleu for this library; it has helped me better understand the paper. I am implementing a TensorFlow version of the NTM-LRUA model, and I have a question about your implementation.
Why do `W_add`, `b_add`, `a_t`, `sigma_t`, and `ww_t` all have `nb_reads` elements? The paper seems to have only one "write head", as I gathered from the text and Figure 7.

The paper does explicitly talk about `wlu_tm1` containing `nb_reads` 1's, which would mean we write the single `a_t` identically to `nb_reads` locations. That doesn't seem to make sense.

Any thoughts would be greatly appreciated. Thank you!
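To make the point concrete, here is a minimal NumPy sketch of that literal reading (names and the `usage` proxy are hypothetical; summing the read weightings into a single `wr_tm1` is just my guess at how a single write head would use them):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

nb_reads, memory_size, word_size = 4, 128, 40

# Combine the nb_reads read weightings into one vector for the single
# write head -- summing them is a guess, which is part of my confusion.
wr_tm1 = np.random.dirichlet(np.ones(memory_size), size=nb_reads).sum(axis=0)

# Literal reading: wlu_tm1 is a single vector with nb_reads 1's at the
# least-used locations ("usage" stands in for the paper's w^u).
usage = np.random.rand(memory_size)
wlu_tm1 = np.zeros(memory_size)
wlu_tm1[np.argsort(usage)[:nb_reads]] = 1.0

# Single gate, single write weighting.
alpha = np.random.randn()
ww_t = sigmoid(alpha) * wr_tm1 + (1.0 - sigmoid(alpha)) * wlu_tm1

# The same a_t gets written (scaled by ww_t) to all nb_reads least-used
# slots -- the behaviour that doesn't seem to make sense to me.
a_t = np.random.randn(word_size)
M_tm1 = np.zeros((memory_size, word_size))
M_t = M_tm1 + np.outer(ww_t, a_t)
```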