teelinsan / parallel-decoding

Repository for the paper "Accelerating Transformer Inference for Translation via Parallel Decoding"
https://gladia.di.uniroma1.it/publication/ipi/
Apache License 2.0

Question about the paper #1

Closed: MeWannaSleep closed this issue 11 months ago

MeWannaSleep commented 1 year ago

[image] In Algorithm 1, you state that [image]. I mean, [image] is supposed to be exactly 1. Am I missing or misunderstanding something here?

teelinsan commented 1 year ago

Hi,

The notation in line 5 is a shorthand for "we are sampling all the tokens in parallel". Technically it is not a single probability distribution: for each i-th token you have a distribution `p_i` over the vocabulary, conditioned on all the past tokens. Also, the conditioning `y_{1:m}` is shifted right. As explained at the end of Appendix A, it is pseudocode and some details are omitted.
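As an illustration (a minimal sketch, not the repository's code): a single decoder forward pass over `m` positions produces `m` separate distributions over the vocabulary, one per position, and each of them sums to 1 on its own.

```python
import torch

# Hypothetical logits from one decoder forward pass over m = 5 positions:
# shape (batch, m, vocab_size) -- one row of logits per target position.
logits = torch.randn(1, 5, 32000)

# Softmax over the vocabulary dimension gives m independent distributions
# p_1, ..., p_m; each position's probabilities sum to 1 separately.
probs = torch.softmax(logits, dim=-1)
print(probs.sum(dim=-1))  # tensor of ones, shape (1, 5)
```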

If you look at the implementation, it basically consists of sampling from the model https://github.com/teelinsan/parallel-decoding/blob/19769bf5aa2a41f02d6b0344aa1ee88e3d59cfa4/src/ipi/decoders/gs_jacobi.py#L68-L75 and then taking the argmax over dimension -1 https://github.com/teelinsan/parallel-decoding/blob/19769bf5aa2a41f02d6b0344aa1ee88e3d59cfa4/src/ipi/decoders/gs_jacobi.py#L82
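In rough pseudocode, one parallel (Jacobi-style) iteration then looks like the sketch below. This assumes a Hugging Face-style encoder-decoder `model` that returns `.logits`; the names and call signature are illustrative, not the repository's exact API.

```python
import torch

def jacobi_step(model, input_ids, draft_ids):
    # Forward pass conditioned on the source and on the *current* draft of
    # the target tokens (the conditioning is shifted right, as in the paper).
    with torch.no_grad():
        out = model(input_ids=input_ids, decoder_input_ids=draft_ids)
    # Greedy "sampling": argmax over the vocabulary dimension (-1),
    # which updates every target position in parallel.
    return out.logits.argmax(dim=-1)
```

Repeating this step until the draft tokens stop changing is the fixed-point iteration described in the paper.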

Hope this helps.

Andrea