Closed ggluo closed 4 years ago
I have the same question — I'm unclear on where the masking is implemented.
For the masking in the attention, it occurs here: https://github.com/sahajgarg/image_transformer/blob/d33b8d007299b434c62e068e1dad35b8a2688212/image_transformer.py#L303 This applies an upper-triangular mask to the attention logits, preventing information from future pixels from reaching the current pixel. As long as this masking is applied, the training code can evaluate the conditional probability of each pixel given all previous pixels simultaneously.
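To illustrate the idea (this is a minimal NumPy sketch, not the repo's exact code), masking works by adding a large negative value to the logits at all positions strictly above the diagonal, so that after the softmax each pixel places effectively zero weight on future pixels:

```python
import numpy as np

def causal_attention_weights(logits):
    """logits: (seq_len, seq_len) array where logits[i, j] is the attention
    score of query pixel i on key pixel j. Returns masked attention weights."""
    seq_len = logits.shape[0]
    # Boolean mask of "future" positions: True strictly above the diagonal.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    # Replace future logits with a large negative number so softmax ~ 0 there.
    masked = np.where(future, -1e9, logits)
    # Row-wise softmax (numerically stabilized).
    e = np.exp(masked - masked.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)
```

Because each row of the resulting weight matrix is zero for all later positions, applying these weights to the value vectors lets every pixel's conditional distribution be computed in a single parallel pass during training, while still respecting the autoregressive ordering.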
Hi, Sahaj. Maybe a dumb question: can I ask how the masked attention weight is implemented in your script?