Closed by siddk 2 years ago
@siddk oh hey Sid, good to hear from you. a number of things actually - i think the consensus at Eleuther was that perceiver AR is not very promising. i don't have any plans on building it, if that is what you mean
Hey Phil - no sorry, wasn't intending for you to build it! Just wanted to get your thoughts on usability of your current code for the AR model (if it makes sense).
If it isn't too much trouble, could you tag me/point me to the discussion on the viability of the model?
@siddk yea, i think this repository is a good starting point - however, be aware that in the paper, they had to do some special masking regularization in the cross attention blocks (on the attn logits with -inf rather than post-softmax dropout) to get it to work well
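For anyone landing here later, a minimal sketch of the distinction mentioned above: dropping positions by setting their attention logits to -inf before the softmax, versus zeroing weights after the softmax. It is plain Python on a single row of logits so it runs without any dependencies. The function names, the drop probability, and the choice of which positions get dropped are all assumptions for illustration; the thread does not give the paper's exact recipe, and this is not code from this repository.

```python
import math
import random


def masked_softmax(logits, keep):
    """Softmax over one row of attention logits.

    Positions where keep is False are set to -inf BEFORE the softmax,
    so they get exactly zero weight and the rest renormalize to sum to 1.
    """
    masked = [x if k else float("-inf") for x, k in zip(logits, keep)]
    m = max(masked)  # finite as long as at least one position is kept
    exps = [math.exp(x - m) for x in masked]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


def post_softmax_dropout(logits, keep, drop_prob):
    """For contrast: ordinary dropout applied AFTER the softmax.

    Kept weights are rescaled by 1 / (1 - drop_prob), as standard dropout
    does, but the row is not renormalized, so it generally no longer sums to 1.
    """
    probs = masked_softmax(logits, [True] * len(logits))
    scale = 1.0 / (1.0 - drop_prob)
    return [p * scale if k else 0.0 for p, k in zip(probs, keep)]


def sample_keep_mask(n, drop_prob, rng):
    """Hypothetical helper: drop each position independently with drop_prob,
    always keeping at least one so the softmax stays well defined."""
    keep = [rng.random() >= drop_prob for _ in range(n)]
    if not any(keep):
        keep[rng.randrange(n)] = True
    return keep


# One query attending over four context positions (made-up numbers).
logits = [2.0, 0.5, -1.0, 1.0]
keep = [True, False, True, True]  # fixed mask so the comparison is repeatable

pre = masked_softmax(logits, keep)
post = post_softmax_dropout(logits, keep, drop_prob=0.25)

print("logit masking sum:       ", sum(pre))
print("post-softmax dropout sum:", sum(post))

# During training the mask would be resampled each step, for example:
random_keep = sample_keep_mask(len(logits), drop_prob=0.25, rng=random.Random(0))
```

With logit masking the surviving positions always form a proper distribution, while post-softmax dropout leaves a row whose total varies with the mask. In a batched PyTorch version the same idea would typically be a `masked_fill` on the similarity tensor before `softmax`.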
the discussion happened in Eleuther's #research channel. one of the critics summarized it best as "this is just encoder / decoder architecture with encoder of 0 layers"
Thanks so much! Found the discussion... makes a good point, maybe I'm overthinking things 😅. Thanks so much for the kind response @lucidrains!
@siddk sounds good, good luck with your studies / research :)
Hey @lucidrains - love this repo, and still trying to wrap my head around the various differences between Perceiver architectures. How hard would it be to extend PerceiverIO to PerceiverAR? What fundamentally needs to change?