leekum2018 opened this issue 2 years ago
Hi, I am not one of the original authors, but I had a similar problem when I first read the paper, so I reached out to the original author a few months ago with the same question.
The top-ρ share of tokens are actually the most certain ones: their argmax is computed and they are fixed. The remaining tokens have their logits computed again. In their words, "We could think of this procedure as using more compute for uncertain tokens". I agree that part of the method is a bit confusing to read.
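Just to make that concrete, here is a rough sketch of one decoding step as I understand it. This is not the authors' code: the function name and the model interface (`model(tokens)` returning per-position logits) are hypothetical, and I'm assuming a PyTorch-style setup.

```python
import torch

@torch.no_grad()
def argmax_unrolled_step(model, tokens, rho=0.5):
    """One denoising step with argmax-unrolled decoding (my reading of it).

    `model(tokens)` is assumed to return per-position logits of shape
    [batch, length, vocab]; `rho` is the share of positions treated as
    "certain" and fixed via argmax.
    """
    logits = model(tokens)                               # [B, L, V]
    log_probs = logits.log_softmax(dim=-1)
    conf, argmax_tok = log_probs.max(dim=-1)             # [B, L] each

    # Sort positions by confidence; the top-rho share are the most
    # certain ones and are simply fixed to their argmax token.
    num_certain = int(rho * tokens.shape[1])
    certain_idx = conf.argsort(dim=-1, descending=True)[:, :num_certain]
    certain_mask = torch.zeros_like(tokens, dtype=torch.bool)
    certain_mask.scatter_(1, certain_idx, torch.ones_like(certain_idx, dtype=torch.bool))

    new_tokens = torch.where(certain_mask, argmax_tok, tokens)

    # The remaining (uncertain) positions get an extra unrolled pass:
    # logits are recomputed on the partially updated sequence and those
    # positions are re-sampled, i.e. more compute goes to uncertain tokens.
    unrolled_logits = model(new_tokens)
    resampled = torch.distributions.Categorical(logits=unrolled_logits).sample()
    return torch.where(certain_mask, new_tokens, resampled)
```

Again, treat this as a sketch of the idea rather than the actual implementation.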
I should say, I don't think my argmax-unrolled implementation is correct. I started working on it, but they said "In practice, this may not save that much computation as subtensor operation is not efficient", which somewhat killed my motivation to finish that part (so I should probably add a warning or remove it).
Let me know if you have any more questions. Thanks!
Thank you for your answer! Also, do you plan to provide the checkpoints you have trained?
I've only trained on text8 so far, which can easily be done even on a small GPU. I'm currently working on EN-DE translation in the mt branch; when that is done I'll release checkpoints.
I wonder whether any non-autoregressive model checkpoints are available on GitHub. My idea needs to be evaluated with a trained non-autoregressive LM for English, but I don't have sufficient hardware to train one. Thanks! I want to run some text in-painting experiments like those in SUNDAE, so SUNDAE is the most suitable model for me, but it is a pity that SUNDAE is not open-sourced.
First, thanks for bringing up STEP-UNROLLED DENOISING AUTOENCODERS FOR TEXT GENERATION, which is very nice work. I have a question regarding the argmax-unrolled decoding. As stated in the Decoding part of Section 3.1, "the method finds top-ρ share of the tokens that are sorted by the log-probability in descending order (i.e. uncertain tokens) from λ_t−1". I wonder why the top share of tokens is viewed as the uncertain tokens and then replaced by performing unrolling. In my understanding, the higher the log-probability of a token, the higher the confidence that it is the true token. Looking forward to your reply! Thanks!