lucidrains / x-transformers

A concise but complete full-attention transformer with a set of promising experimental features from various papers
MIT License

Small paper ideas to be added #262

Open RyanKim17920 opened 4 months ago

RyanKim17920 commented 4 months ago

Here are some papers I've read that would be nice to have; I'll try to implement them if I can:

https://arxiv.org/pdf/2010.04245

https://arxiv.org/abs/2210.05144 (Probably should add FFN MoE as well)
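For context on what "FFN MoE" refers to here: the feedforward block is replaced by several expert FFNs, with a learned router sending each token to its top-k experts. Below is a minimal NumPy sketch of that generic idea (not the exact formulation of the linked paper); all names and shapes are illustrative.

```python
import numpy as np

def moe_ffn(x, w1, w2, gate_w, top_k=2):
    """Illustrative top-k mixture-of-experts feedforward (not from the paper).

    x: (tokens, dim)
    w1: (experts, dim, hidden), w2: (experts, hidden, dim) - expert FFN weights
    gate_w: (dim, experts) - router weights
    """
    logits = x @ gate_w                                 # (tokens, experts)
    probs = np.exp(logits - logits.max(-1, keepdims=True))
    probs /= probs.sum(-1, keepdims=True)               # softmax over experts
    topk = np.argsort(probs, axis=-1)[:, -top_k:]       # top-k experts per token
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        weights = probs[t, topk[t]]
        weights = weights / weights.sum()               # renormalize over chosen experts
        for e, w in zip(topk[t], weights):
            h = np.maximum(x[t] @ w1[e], 0.0)           # expert FFN with ReLU
            out[t] += w * (h @ w2[e])
    return out
```

The loop is written for clarity; a real implementation would batch tokens per expert.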

https://arxiv.org/pdf/2404.02258 (Probably will be hard to make work with other features)
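The third paper is Mixture-of-Depths: a router picks which tokens a block actually processes, and the rest ride the residual stream untouched. A rough sketch of that routing idea, with hypothetical names and a fixed per-layer capacity (shapes and gating are assumptions, not the paper's exact scheme):

```python
import numpy as np

def mixture_of_depths(x, block, router_w, capacity=0.5):
    """Sketch: only the top `capacity` fraction of tokens pass through
    `block`; the remainder skip it via the residual path.

    x: (tokens, dim); block: callable on (k, dim); router_w: (dim,)
    """
    n = x.shape[0]
    k = max(1, int(n * capacity))
    scores = x @ router_w                  # (tokens,) router scores
    chosen = np.argsort(scores)[-k:]       # top-k tokens get computed
    out = x.copy()
    # scaling by the router score keeps the token selection differentiable
    out[chosen] = x[chosen] + scores[chosen, None] * block(x[chosen])
    return out
```

The comment about it being hard to compose with other features makes sense here: every wrapped block would need this gather/scatter around it.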

lucidrains commented 4 months ago

@RyanKim17920 so the first paper is already in the repository and even cited

i do like the second paper, and can try it out before adding it

the third paper, i like as well, but may be outside the scope of this repo

lucidrains commented 4 months ago

@RyanKim17920 someone also shared with me https://arxiv.org/abs/2312.07987 which could be an improvement from MoA

lucidrains commented 4 months ago

@RyanKim17920 the switchhead paper is pretty good
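For readers following along: SwitchHead applies the MoE idea inside attention, letting each head route tokens among several value (and output) projection experts with non-competitive sigmoid gating. A hedged NumPy sketch of just the value-projection side, with illustrative names and shapes:

```python
import numpy as np

def switchhead_values(x, v_experts, router_w, top_k=2):
    """Sketch of SwitchHead-style routing for one head's value projection
    (the same pattern applies to the output projection).

    x: (tokens, dim)
    v_experts: (experts, dim, head_dim) - value-projection experts
    router_w: (dim, experts) - per-head router
    """
    logits = x @ router_w                         # (tokens, experts)
    gates = 1.0 / (1.0 + np.exp(-logits))         # sigmoid (non-competitive) gating
    topk = np.argsort(gates, axis=-1)[:, -top_k:]
    v = np.zeros((x.shape[0], v_experts.shape[-1]))
    for t in range(x.shape[0]):
        for e in topk[t]:
            v[t] += gates[t, e] * (x[t] @ v_experts[e])
    return v
```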

will run the experiments tomorrow morning, and if all goes well, it will probably be in the repository by week's end

Baran-phys commented 1 month ago

@lucidrains What do you think of https://www.arxiv.org/abs/2408.14915, in particular the DRA activation function for Continuous Transformers?

Baran-phys commented 1 month ago

@lucidrains If you confirm, I can also open a PR for DRA.

lucidrains commented 1 month ago

@Baran-phys hey Baran, thanks for sharing your paper.

it is interesting, but i will probably not accept it, as it is not relevant for this repository. periodic activation functions are something i've been meaning to look into once the right problem presents itself
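For anyone unfamiliar with the term: a periodic activation function is simply a nonlinearity built from a periodic map, e.g. a SIREN-style scaled sine. This is a generic illustration only, not the DRA function from the linked paper:

```python
import numpy as np

def sine_activation(x, w0=30.0):
    """Generic periodic activation (SIREN-style): sin(w0 * x).
    `w0` controls the frequency of the periodic response."""
    return np.sin(w0 * x)
```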