lucidrains opened 2 months ago
It could be this one https://arxiv.org/pdf/2405.20324 (Nicolas Dufour et al., CVPR 2024), which extends RIN to text conditioning.
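From a skim, the text conditioning there presumably comes down to letting the RIN latents attend to text-encoder tokens before the usual read/compute/write steps. A rough PyTorch sketch of that idea (the module name, shapes, and cross-attention placement are my assumptions, not the paper's code):

```python
import torch
from torch import nn

class TextConditionedLatentBlock(nn.Module):
    # hypothetical sketch: latents cross-attend to text-encoder tokens
    # before the usual RIN read/compute/write steps
    def __init__(self, dim, heads = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, latents, text_tokens):
        # latents: (b, n_latents, dim), text_tokens: (b, n_text, dim)
        out, _ = self.attn(self.norm(latents), text_tokens, text_tokens)
        return latents + out  # residual update of the latents
```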
@StevenLiuWen very cool! And not from the original author(s)!
Also, another work, PointInfinity (https://arxiv.org/pdf/2404.03566), applied it to 3D point cloud generation. RIN- and Perceiver-IO-style architectures have a nice property for handling high-resolution data (see the sketch below). Looking forward to more of their potential applications.
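The nice property being that compute lives in a fixed-size latent set, so cost grows only linearly with the number of data tokens (points, pixels). A minimal sketch of the read/compute/write pattern, with made-up sizes:

```python
import torch
from torch import nn

class LatentBottleneck(nn.Module):
    # sketch of the perceiver-io / RIN pattern: a fixed set of latents
    # cross-attends to arbitrarily many data tokens, compute happens
    # only on the latents, then results are written back to the data
    def __init__(self, dim, num_latents = 256, heads = 8, depth = 4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(num_latents, dim))
        self.read = nn.MultiheadAttention(dim, heads, batch_first = True)
        self.compute = nn.ModuleList([
            nn.MultiheadAttention(dim, heads, batch_first = True)
            for _ in range(depth)
        ])
        self.write = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, data):
        # data: (b, n, dim) where n can be very large (e.g. points in a cloud)
        b = data.shape[0]
        lat = self.latents.expand(b, -1, -1)
        lat = lat + self.read(lat, data, data)[0]   # read: O(n * num_latents)
        for attn in self.compute:
            lat = lat + attn(lat, lat, lat)[0]      # compute: independent of n
        data = data + self.write(data, lat, lat)[0] # write back to data tokens
        return data
```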
indeed, thank you!
This paper is a direct extension from one of the authors (Ting Chen):
FIT: Far-reaching Interleaved Transformers Ting Chen, Lala Li https://arxiv.org/abs/2305.12689
Only skimmed it, but it looks like they just add local self-attention layers to the data branch of RIN. It's a bit hard to interpret their diffusion results because they only report MSE, but it seems reasonable that local self-attention over the pixels would help.
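If that read is right, the delta over vanilla RIN is roughly interleaving windowed self-attention among the data tokens, something like this (my paraphrase of the idea, not their code):

```python
import torch
from torch import nn
from einops import rearrange

class LocalSelfAttention(nn.Module):
    # sketch: split the data tokens into fixed-size groups and
    # self-attend within each group (no cross-group attention)
    def __init__(self, dim, window = 64, heads = 8):
        super().__init__()
        self.window = window
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first = True)

    def forward(self, x):
        # x: (b, n, dim), n assumed divisible by window for simplicity
        w = self.window
        x_local = rearrange(x, 'b (g w) d -> (b g) w d', w = w)
        h = self.norm(x_local)
        out, _ = self.attn(h, h, h)
        out = rearrange(out, '(b g) w d -> b (g w) d', b = x.shape[0])
        return x + out
```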
@justinlovelace that's an interesting paper too! 🙏
probably using NATTEN (neighborhood attention) on the data branch would work even better
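something like this (untested sketch; double-check the installed natten version and the exact constructor signature):

```python
import torch
from natten import NeighborhoodAttention2D

# hypothetical drop-in for the data branch: each pixel token attends to a
# k x k neighborhood around itself instead of a fixed window partition
attn = NeighborhoodAttention2D(dim = 256, num_heads = 8, kernel_size = 7)

x = torch.randn(1, 32, 32, 256)  # (batch, height, width, dim), channels-last
out = attn(x)                    # same shape, locally attended
```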
Has anyone tried this purely on text? I am currently working to adapt it for text, so it would vaguely be a "diffusion language model", but I wanted to know if there are any similar works or negative results from folks who have already tried it (cc @justinlovelace, would be interested in your thoughts/opinions).
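Concretely, the rough plan is continuous diffusion over token embeddings (Diffusion-LM style), with a set-to-set denoiser like RIN plugged in. A hypothetical sketch, with all names and the loss weighting being my own choices:

```python
import torch
from torch import nn

class TextDiffusionWrapper(nn.Module):
    # hypothetical sketch: diffuse in token-embedding space, with any
    # set-to-set denoiser (e.g. a RIN) mapping (b, n, dim) -> (b, n, dim)
    def __init__(self, denoiser, num_tokens, dim):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)
        self.denoiser = denoiser
        self.to_logits = nn.Linear(dim, num_tokens)

    def loss(self, token_ids, alphas):
        # token_ids: (b, n) ints, alphas: (b,) noise levels in [0, 1]
        x0 = self.embed(token_ids)                        # clean embeddings
        noise = torch.randn_like(x0)
        a = alphas.view(-1, 1, 1)
        xt = a.sqrt() * x0 + (1 - a).sqrt() * noise       # noised embeddings
        pred = self.denoiser(xt)                          # time conditioning omitted for brevity
        mse = (pred - x0).pow(2).mean()                   # predict clean embeddings
        ce = nn.functional.cross_entropy(                 # anchor embeddings to tokens
            self.to_logits(pred).transpose(1, 2), token_ids)
        return mse + ce
```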
Recently I ran into a researcher who told me there is a follow-up paper to this work.
Does anyone know of it?