lucidrains / axial-attention

Implementation of Axial attention - attending to multi-dimensional data efficiently
MIT License

Reimplementation of image modeling results in AXIAL ATTENTION IN MULTIDIMENSIONAL TRANSFORMERS. #3

Closed liujiaheng closed 3 years ago

liujiaheng commented 3 years ago

Hi, this is a nice paper. How can I use your shared code to reimplement the image modeling task on ImageNet 32x32?

Thanks. Looking forward to your reply.

liujiaheng commented 3 years ago

[image] In other words, the process is shown in the image above. But I have not found the MaskedTransformerBlock in your code. Could you give me some advice?

Thanks.
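(For readers following along: the repo does not include the paper's MaskedTransformerBlock, but the core idea behind any "masked" autoregressive attention step is a causal mask on the attention scores. A minimal NumPy sketch, assuming a single head, an `(L, d)` input, and no learned projections, which a real block would of course have:)

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def causal_attend(x):
    # x: (L, d) sequence. The upper-triangular mask forbids position i
    # from attending to positions > i -- the core of autoregressive
    # ("masked") attention. Illustrative only; not the paper's block.
    L, d = x.shape
    scores = x @ x.T / np.sqrt(d)                      # (L, L) similarity
    mask = np.triu(np.ones((L, L), dtype=bool), k=1)   # True above diagonal
    scores = np.where(mask, -np.inf, scores)           # block future positions
    return softmax(scores) @ x

x = np.random.randn(5, 3)
out = causal_attend(x)
# the first output row can only have attended to the first input row
assert np.allclose(out[0], x[0])
```

In the paper's scheme this mask is applied along one axis at a time, combined with the unmasked/shifted blocks shown in the figure, so that each pixel only sees pixels before it in raster order.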

lucidrains commented 3 years ago

@liujiaheng Hi Jiaheng! I don't offer the code for the autoregressive scheme with all the clever masking. Most of my use cases for axial attention have been enabling attention over large images without incurring the quadratic cost.
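(To make the cost argument concrete: attending along one axis at a time replaces the O((H·W)²) score matrix of full attention with O(H·W·(H+W)) work. A minimal NumPy sketch of unmasked axial attention over an `(H, W, D)` feature map, assuming a single head and no learned projections:)

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attend(x, axis):
    # single-head self-attention along one spatial axis of an (H, W, D)
    # tensor; features double as queries/keys/values here (a real layer
    # would add learned projections)
    x = np.moveaxis(x, axis, 0)                  # attended axis to front
    L, rest, d = x.shape[0], x.shape[1:-1], x.shape[-1]
    seqs = x.reshape(L, -1, d).transpose(1, 0, 2)        # (B, L, d): one
    scores = seqs @ seqs.transpose(0, 2, 1) / np.sqrt(d) # sequence per row/col
    out = softmax(scores) @ seqs                         # (B, L, d)
    out = out.transpose(1, 0, 2).reshape((L,) + rest + (d,))
    return np.moveaxis(out, 0, axis)

def axial_attention(x):
    # attend along height, then width: each pass scores only L positions
    # per sequence instead of all H*W pixels at once
    return attend(attend(x, 0), 1)

x = np.random.randn(8, 8, 4)
y = axial_attention(x)
assert y.shape == x.shape
```

The actual layers in this repo add multi-head projections and positional embeddings, but the asymptotic saving comes entirely from this per-axis factorization.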

liujiaheng commented 3 years ago

Thanks for your kind reply. I will try my best to reimplement the autoregressive scheme.