Add dropout for weight update operations. At each time step, the Head randomly skips either the Content addressing update or the Convolutional shift update of the read and write weights. This can speed up training, since the model tends to explore more combinations of the two addressing operations.
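A minimal sketch of the idea in plain Python, not the code in this change. The names `pick_skipped_update`, `circular_shift`, `update_weights`, and the `drop_prob` parameter are hypothetical, and the interpolation and sharpening steps of the addressing pipeline are left out for brevity. It assumes the shift kernel covers shifts of -1, 0, and +1.

```python
import random


def pick_skipped_update(rng, drop_prob):
    """Decide which update to skip at one time step.

    Returns 'content', 'shift', or None (run both updates).
    drop_prob is a hypothetical knob: the chance of skipping anything at all.
    """
    if rng.random() >= drop_prob:
        return None
    return rng.choice(["content", "shift"])


def circular_shift(weights, kernel):
    """Circular convolution; kernel holds the weights for shifts -1, 0, +1."""
    n = len(weights)
    return [
        sum(weights[(i - s) % n] * kernel[s + 1] for s in (-1, 0, 1))
        for i in range(n)
    ]


def update_weights(prev_w, content_w, kernel, skipped):
    """Apply content addressing and convolutional shift, minus the skipped one."""
    # Skipping content addressing falls back to the previous step's weights.
    w = prev_w if skipped == "content" else content_w
    # Skipping the shift leaves the weights where they are.
    if skipped != "shift":
        w = circular_shift(w, kernel)
    return w


rng = random.Random(0)
prev_w = [1.0, 0.0, 0.0]
content_w = [0.0, 0.0, 1.0]
kernel = [0.0, 0.0, 1.0]  # shift every weight one slot forward

for t in range(3):
    skipped = pick_skipped_update(rng, drop_prob=0.5)
    prev_w = update_weights(prev_w, content_w, kernel, skipped)
```

Because `pick_skipped_update` is called inside the loop, the decision is redrawn at every time step, which matches the behaviour described above. Drawing it once before the loop would give the per-sequence variant.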
TO-DO
[ ] Make the operation dropout sequence-specific rather than time-step-specific (choose the skipped update once per sequence instead of at every time step)