lucidrains / perceiver-ar-pytorch

Implementation of Perceiver AR, DeepMind's long-context attention network based on the Perceiver architecture, in PyTorch
MIT License

Way to relax constraint on prefix length #2

Open inspirit opened 2 years ago

inspirit commented 2 years ago

I think we can get away with having learnable null latents in case we don't have a prefix initially
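
Something along these lines, maybe (just a minimal sketch; `NullPrefix`, `num_null`, and the other names are hypothetical, not this repo's API):

```python
import torch
import torch.nn as nn

class NullPrefix(nn.Module):
    # learned "null" latents that stand in for the prefix when none is given
    def __init__(self, dim, num_null = 1):
        super().__init__()
        self.null_latents = nn.Parameter(torch.randn(num_null, dim))

    def forward(self, prefix):
        # prefix: (batch, prefix_len, dim), where prefix_len may be 0
        if prefix.shape[1] == 0:
            batch = prefix.shape[0]
            return self.null_latents.unsqueeze(0).expand(batch, -1, -1)
        return prefix
```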

lucidrains commented 2 years ago

@inspirit but you need at least one token that outputs a logit

lucidrains commented 2 years ago

yeah, or maybe the other way is to randomly curtail the prefix during training, in which case it will generalize to being conditioned on anything from a 0-length prefix up to the maximum
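
e.g. something like this at the batch level (a sketch; `random_prefix_split` is a hypothetical helper, and it keeps the suffix non-empty since at least one token has to output a logit):

```python
import torch

def random_prefix_split(seq, max_prefix_len):
    # seq: (batch, seq_len) token ids
    # sample a prefix length in [0, max_prefix_len], always leaving
    # at least one token in the suffix to produce a logit
    max_prefix_len = min(max_prefix_len, seq.shape[1] - 1)
    prefix_len = int(torch.randint(0, max_prefix_len + 1, (1,)))
    return seq[:, :prefix_len], seq[:, prefix_len:]
```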

lucidrains commented 2 years ago

feel like the paper should have addressed this, especially if book-level autoregressive generation is the goal here

inspirit commented 2 years ago

i think having the prefix and query dynamically sized is best both for robustness and for inference usage

lucidrains commented 2 years ago

@inspirit yeah, maybe i'll just have to push this responsibility to the dataloading
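
e.g. a dataset wrapper that samples the split point per example (a sketch under that assumption; none of these names are this repo's actual API):

```python
import torch
from torch.utils.data import Dataset

class VariablePrefixDataset(Dataset):
    # pairs each sequence with a randomly drawn prefix length, so the model
    # is trained on every conditioning length from 0 up to max_prefix_len
    def __init__(self, sequences, max_prefix_len):
        self.sequences = sequences
        self.max_prefix_len = max_prefix_len

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        seq = self.sequences[idx]
        prefix_len = int(torch.randint(0, self.max_prefix_len + 1, (1,)))
        return seq, prefix_len
```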

ArEnSc commented 1 year ago

@lucidrains I searched the paper and I don't see a prefix length mentioned. I am confused about this prefix length issue: wouldn't you want the prefix length to be the full size of the context window? (I guess you meant the length of the latents.)