patdflynn closed this issue 3 years ago
The attention masking happens here: https://github.com/sahajgarg/image_transformer/blob/d33b8d007299b434c62e068e1dad35b8a2688212/image_transformer.py#L303 It generates an upper-triangular mask on the attention logits, preventing any information from future pixels from reaching the current pixel. As long as this masking is applied, the training code can evaluate the conditional probability of every pixel given all previous pixels simultaneously.
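For reference, here is a minimal sketch of this kind of causal masking (the function name, tensor shapes, and scaling are illustrative, not this repo's exact code):

```python
import torch

def causal_attention_logits(q, k):
    """Scaled dot-product attention logits with a causal mask.

    q, k: tensors of shape (batch, heads, seq_len, head_dim).
    For query position i, every key position j > i (a future pixel)
    is masked out.
    """
    seq_len = q.size(-2)
    logits = torch.matmul(q, k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    # Upper-triangular mask (diagonal excluded) marks future positions.
    future = torch.triu(
        torch.ones(seq_len, seq_len, dtype=torch.bool, device=q.device),
        diagonal=1,
    )
    # -inf logits get zero weight after softmax, so no information
    # flows from future pixels to the current pixel.
    return logits.masked_fill(future, float("-inf"))
```

Because the mask depends only on position, a single forward pass can score every pixel's conditional distribution at once during training.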
Thank you for the quick response!
Hi there, I essentially have the same question as in this issue. Would you mind clarifying how the masking is implemented?
I want to modify the masking so that inference can run in a different pixel order.
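Something like the following is what I have in mind (just a sketch; the helper `ordered_attention_mask` and the `order` encoding are my own illustration, not this repo's code):

```python
import torch

def ordered_attention_mask(order):
    """Boolean attention mask for an arbitrary pixel generation order.

    order: 1-D LongTensor, a permutation of 0..seq_len-1, where
    order[p] is the step at which pixel p is generated. Entry (i, j)
    is True where pixel j is generated after pixel i, i.e., where
    pixel i must not attend to pixel j.
    """
    mask = order.unsqueeze(1) < order.unsqueeze(0)
    # Sanity check: with the raster order (order = arange(seq_len))
    # this reduces to the usual upper-triangular mask, diagonal=1.
    return mask  # apply via logits.masked_fill(mask, float("-inf"))
```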