Closed liujiaheng closed 3 years ago
In other words, the process is shown in the image above, but I could not find the MaskedTransformerBlock in your code. Could you give me some advice?
Thanks.
@liujiaheng Hi Jiaheng! I don't offer the code for the autoregressive scheme with all the clever masking. Most of my use cases for axial attention have been for enabling attention over large images without incurring the quadratic cost.
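To make the cost argument concrete, here is a minimal numpy sketch (not code from this repo) of the axial idea: attend along each row, then along each column, so an h×w image needs two attentions over sequences of length w and h instead of one attention over the flattened length-h·w sequence.

```python
import numpy as np

def attend(x):
    """Plain softmax self-attention over a (n, d) sequence."""
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)                      # (n, n) logits
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def axial_attention(img):
    """Attend along rows, then columns, of an (h, w, d) feature map."""
    # row attention: each of the h rows is a sequence of length w
    img = np.stack([attend(row) for row in img])
    # column attention: each of the w columns is a sequence of length h
    img = np.stack([attend(col) for col in img.transpose(1, 0, 2)])
    return img.transpose(1, 0, 2)

img = np.random.default_rng(1).normal(size=(32, 32, 16))
out = axial_attention(img)
print(out.shape)  # (32, 32, 16)
```

For a 32×32 image this builds 32+32 attention matrices of size 32×32 rather than one 1024×1024 matrix, which is where the savings over full attention come from.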
Thanks for your kind reply. I will try my best to reimplement the autoregressive scheme.
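For anyone else attempting the same reimplementation: the core of any autoregressive masked block is a causal (lower-triangular) mask on the attention logits, so position i can only attend to positions ≤ i. This is a minimal numpy sketch of that masking, not the paper's MaskedTransformerBlock:

```python
import numpy as np

def causal_attention(q, k, v):
    """Scaled dot-product attention with a causal mask: position i
    may only attend to positions j <= i."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                 # (n, n) logits
    mask = np.tril(np.ones_like(scores))          # 1 where attending is allowed
    scores = np.where(mask == 1, scores, -1e9)    # block future positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
q, k, v = rng.normal(size=(3, 4, 8))
out = causal_attention(q, k, v)
print(np.allclose(out[0], v[0]))  # position 0 attends only to itself: True
```

In an axial autoregressive model the row attention additionally has to be masked so the current row only sees previously generated rows, but the masking mechanism itself is the same triangular trick shown here.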
Hi, this is a nice paper. How can I use your released code to reimplement the image modeling task on ImageNet 32x32?
Thanks. Looking forward to your reply.