lucidrains / perceiver-pytorch

Implementation of Perceiver, General Perception with Iterative Attention, in Pytorch
MIT License
1.1k stars, 134 forks

PerceiverAR? #60

Closed by siddk 2 years ago

siddk commented 2 years ago

Hey @lucidrains - love this repo, and I'm still trying to wrap my head around the various differences between the Perceiver architectures. How hard would it be to extend PerceiverIO to PerceiverAR? What fundamentally needs to change?

lucidrains commented 2 years ago

@siddk oh hey Sid, good to hear from you. a number of things actually - i think the consensus at Eleuther was that perceiver AR is not very promising. i don't have any plans on building it, if that is what you mean

siddk commented 2 years ago

Hey Phil - no, sorry, I wasn't intending for you to build it! I just wanted to get your thoughts on the usability of your current code for the AR model (if that makes sense).

If it isn't too much trouble, could you tag me/point me to the discussion on the viability of the model?

lucidrains commented 2 years ago

@siddk yea, i think this repository is a good starting point - however, be aware that in the paper, they had to do some special masking regularization in the cross attention blocks (on the attn logits with -inf rather than post-softmax dropout) to get it to work well
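To make the distinction concrete, here is a minimal sketch in plain Python of the two options for a single query's attention row. It is an illustration only, not code from this repository or from the paper: the function names `masked_logit_dropout` and `post_softmax_dropout`, the drop probability, and the "always keep at least one key" rule are all assumptions. In PyTorch the first variant would typically be written with `masked_fill` on the logits before the softmax.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats (-inf entries get weight 0)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_logit_dropout(logits, drop_prob, rng):
    """Drop key positions by setting their logits to -inf BEFORE the softmax.

    The surviving weights are renormalized by the softmax and still sum to 1.
    Keeping at least one key is an assumption made here to avoid an all -inf row.
    """
    keep = [rng.random() >= drop_prob for _ in logits]
    if not any(keep):
        keep[rng.randrange(len(logits))] = True
    masked = [x if k else float("-inf") for x, k in zip(logits, keep)]
    return softmax(masked), keep


def post_softmax_dropout(logits, drop_prob, rng):
    """Standard dropout applied AFTER the softmax (requires drop_prob < 1).

    Dropped weights become 0 and the rest are scaled by 1 / (1 - drop_prob),
    so the row is generally no longer a probability distribution.
    """
    weights = softmax(logits)
    scale = 1.0 / (1.0 - drop_prob)
    return [w * scale if rng.random() >= drop_prob else 0.0 for w in weights]


rng = random.Random(0)
logits = [0.5, 1.0, -0.3, 2.0]  # hypothetical attention logits for one query

weights, keep = masked_logit_dropout(logits, 0.5, rng)
print(keep, sum(weights))  # the kept weights sum to 1 up to float rounding

print(sum(post_softmax_dropout(logits, 0.5, rng)))  # generally not 1
```

The practical difference is that masking the logits removes whole positions from the attention distribution and lets the softmax renormalize over what remains, whereas post-softmax dropout perturbs the weights after normalization.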

the discussion happened in Eleuther's #research channel. one of the critics summarized it best as "this is just encoder / decoder architecture with encoder of 0 layers"
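One way to read that remark, sketched below under stated assumptions: in the Perceiver AR layout as I understand it from the paper, the latents are aligned with the last positions of the input, and a single causally masked cross-attention lets them read the whole sequence. The earlier (prefix) tokens are only embedded and never pass through layers of their own, which is what makes them look like a 0-layer encoder. The helper name `causal_cross_attention_mask` and the boolean-matrix representation are hypothetical, chosen for illustration.

```python
def causal_cross_attention_mask(seq_len, num_latents):
    """Build a [num_latents][seq_len] boolean mask; True means "may attend".

    Assumption: latent i is aligned with input position
    seq_len - num_latents + i, and may attend to every input position
    up to and including its own.
    """
    if not 0 < num_latents <= seq_len:
        raise ValueError("num_latents must be in 1..seq_len")
    prefix_len = seq_len - num_latents
    return [
        [key_pos <= prefix_len + i for key_pos in range(seq_len)]
        for i in range(num_latents)
    ]


# Hypothetical sizes: 6 input tokens, of which the last 3 act as latents.
for row in causal_cross_attention_mask(seq_len=6, num_latents=3):
    print("".join("x" if allowed else "." for allowed in row))
```

Every latent sees the full prefix, and the causal constraint only affects the final `num_latents` columns, where the mask has the usual lower-triangular shape.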

siddk commented 2 years ago

Thanks so much! Found the discussion... makes a good point, maybe I'm overthinking things 😅. Thanks so much for the kind response @lucidrains!

lucidrains commented 2 years ago

@siddk sounds good, good luck with your studies / research :)