snakeztc / NeuralDialog-CVAE

TensorFlow implementation of Knowledge-Guided CVAE for dialog generation (ACL 2017). Released by Tiancheng Zhao (Tony) from the Dialog Research Center, LTI, CMU.
https://www.cs.cmu.edu/~tianchez/
Apache License 2.0

Questions about the sampling strategy for the baseline model #9

kelvinleen closed this issue 6 years ago

kelvinleen commented 6 years ago

In your paper, you state: "Also, to compare the diversity introduced by the stochasticity in the proposed latent variable versus the softmax of RNN at each decoding step, we generate N responses from the baseline by sampling from the softmax. For CVAE/kgCVAE, we sample N times from the latent z and only use greedy decoders so that the randomness comes entirely from the latent variable z."

Traditional beam search with beam size B has two steps: first, for each beam, take the top-B words from the vocabulary softmax; then keep the top-B beams out of the B*B candidate sequences, scored by average probability. Does the sampling described above mean two multinomial steps, one for the inner vocabulary softmax and one for the outer average-probability selection? And is the inner sampling with or without replacement? Is the outer sampling with or without replacement?
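
For concreteness, here is a minimal sketch of the two-stage beam step I mean, with a hypothetical `step_fn(prev_word, state) -> (log_probs, new_state)` wrapper standing in for the decoder; none of these names come from this repo:

```python
import numpy as np

def beam_step(beams, step_fn, B):
    """One two-stage beam-search step as described above.

    beams:   list of (tokens, sum_log_prob, state) tuples
    step_fn: hypothetical decoder wrapper, step_fn(prev_word, state)
             -> (log_probs over vocab, new_state); not from this repo
    """
    candidates = []
    for tokens, score, state in beams:
        log_probs, new_state = step_fn(tokens[-1], state)
        # Step 1: top-B words from this beam's vocabulary softmax.
        for w in np.argsort(log_probs)[-B:]:
            candidates.append((tokens + [int(w)], score + float(log_probs[w]), new_state))
    # Step 2: top-B of the B*B candidates by average (log-)probability.
    candidates.sort(key=lambda c: c[1] / len(c[0]), reverse=True)
    return candidates[:B]
```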

snakeztc commented 6 years ago

"we generate N responses from the baseline by sampling from the softmax." means at each decoding step, we sample a word from the softmax, and we feed the word into to the next decoding step. We repeat this until we hit EOS token. No beam search is involved/