Question about sampling in stochastic attention

jazzsaxmafia commented 9 years ago

Hello, while analyzing the source code, I found the process of getting alpha_sample by stochastic hard attention quite unclear, mainly because of the variable 'h_sampling_mask'.

The sampling part of the code (in capgen.py, line 409) is:

alpha_sample = h_sampling_mask * trng.multinomial(pvals=alpha,dtype=theano.config.floatX) + (1.-h_sampling_mask) * alpha

When h_sampling_mask is 1, alpha_sample is a sample drawn from the multinomial distribution. When h_sampling_mask is 0, however, alpha_sample is simply alpha.
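For readers without Theano, the expression above can be mirrored in NumPy. This is a minimal sketch, not the arctic-captions code: the function name `mix_attention` is hypothetical, `alpha` is assumed to be a `(batch, locations)` array whose rows sum to 1, and `h_sampling_mask` is assumed to be a single 0/1 scalar shared by the whole batch. NumPy's `Generator.multinomial(1, row)` stands in for `trng.multinomial`, drawing one location per row as a one-hot vector.

```python
import numpy as np


def mix_attention(alpha, h_sampling_mask, rng):
    """Hypothetical NumPy mirror of the capgen.py expression.

    alpha: (batch, locations) attention weights; each row sums to 1.
    h_sampling_mask: assumed scalar, 1.0 to sample, 0.0 to keep alpha.
    """
    # Draw one location per row; each resulting row is a one-hot vector.
    sample = np.stack([rng.multinomial(1, row) for row in alpha]).astype(alpha.dtype)
    # mask == 1 keeps only the hard sample; mask == 0 keeps only alpha.
    return h_sampling_mask * sample + (1.0 - h_sampling_mask) * alpha


rng = np.random.default_rng(0)
alpha = np.array([[0.2, 0.5, 0.3],
                  [0.6, 0.1, 0.3]])

soft = mix_attention(alpha, 0.0, rng)  # identical to alpha
hard = mix_attention(alpha, 1.0, rng)  # one-hot rows
print(soft)
print(hard)
```

The mask acts as a switch rather than a blend: because it is exactly 0 or 1, one of the two terms always vanishes.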

I thought that, according to the paper, alpha_sample should simply be alpha_sample = trng.multinomial(pvals=alpha,dtype=theano.config.floatX), which is equivalent to setting h_sampling_mask to 1.

Why is "h_sampling_mask" needed?

kelvinxu commented 9 years ago

We only sample half the time. The rest of the time, we simply use the expected value, as explained in the bottom half of page 5 of the arXiv paper.
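The "expected value" here is the mean of the one-hot sample: if a location is drawn with probabilities alpha, the average one-hot vector converges to alpha, so substituting alpha keeps the attention vector unbiased. The simulation below is a hedged sketch of that reasoning, not code from the repository: `hard_attention_step` is a hypothetical helper, the mask is assumed to be a fresh Bernoulli(0.5) draw per step, and the step count is arbitrary. It checks that the mask is on about half the time and that the average attention vector approaches alpha.

```python
import numpy as np


def hard_attention_step(alpha, rng, p_sample=0.5):
    """One hypothetical training step's attention vector over locations."""
    # Bernoulli(p_sample) coin: 1.0 -> stochastic branch, 0.0 -> expected value.
    mask = float(rng.random() < p_sample)
    # Draw a single location and encode it as a one-hot vector.
    one_hot = np.zeros_like(alpha)
    one_hot[rng.choice(len(alpha), p=alpha)] = 1.0
    return mask, mask * one_hot + (1.0 - mask) * alpha


rng = np.random.default_rng(42)
alpha = np.array([0.1, 0.6, 0.3])
steps = 20000

masks, outputs = zip(*(hard_attention_step(alpha, rng) for _ in range(steps)))
mask_rate = np.mean(masks)                 # fraction of steps that sampled
mean_attention = np.mean(outputs, axis=0)  # average attention vector
print(mask_rate, mean_attention)
```

Both branches have the same mean, so mixing them changes only the variance of the attention vector, not its expectation.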

jazzsaxmafia commented 9 years ago

Ah, so it samples half of the time and uses deterministic attention for the rest of the time. Now it is clear. Thank you.

kelvinxu commented 9 years ago

No problem.

kelvinxu / arctic-captions

Question about sampling in stochastic attention #9