lucidrains / x-transformers

A simple but complete full-attention transformer with a set of promising experimental features from various papers

Fix rotary embeddings when mems != None #224

Closed · pfeatherstone closed this 6 months ago

pfeatherstone commented 6 months ago

It's now ONNX-exportable too
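For context, the core of a fix like this is usually positional bookkeeping: once cached mems are prepended to the keys and values, the current queries no longer start at position zero, so their rotary frequencies need an offset equal to the memory length. A minimal sketch of that idea, with illustrative names only (this is not the actual PR diff, and the library's own rotary handling may differ in detail):

```python
# Sketch: rotary positions must account for prepended memories, otherwise
# queries and memory keys end up rotated against misaligned position indices.
import torch

def rotary_freqs(seq_len, dim, offset=0, theta=10000.0):
    # standard rotary frequencies, with an explicit position offset
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(offset, offset + seq_len).float()
    freqs = torch.einsum('i,j->ij', positions, inv_freq)
    return torch.cat((freqs, freqs), dim=-1)

def rotate_half(x):
    x1, x2 = x.chunk(2, dim=-1)
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary(t, freqs):
    return t * freqs.cos() + rotate_half(t) * freqs.sin()

# queries cover only the current tokens; keys cover mems + current tokens
mem_len, cur_len, dim = 4, 8, 64
q = torch.randn(1, cur_len, dim)
k = torch.randn(1, mem_len + cur_len, dim)

q = apply_rotary(q, rotary_freqs(cur_len, dim, offset=mem_len))      # offset by memory length
k = apply_rotary(k, rotary_freqs(mem_len + cur_len, dim, offset=0))  # mems start at position 0
```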

pfeatherstone commented 6 months ago

I haven't checked all use cases, but my use case is fixed. Hopefully I haven't broken other people's code.

pfeatherstone commented 6 months ago

@lucidrains Ready for review

lucidrains commented 6 months ago

normally i would be more liberal with accepting PRs, but this one required a bit more strategizing

sorry for ignoring it! A for effort

pfeatherstone commented 6 months ago

Cheers. There is still the issue of where the mems are recorded when return_mems==True; it's a problem when using pre-layer normalisation.
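The pre-norm wrinkle is that each block sees a normalised copy of the residual stream while the stream itself stays un-normalised, so "the hidden state to cache as mems" is ambiguous: it has to be recorded at the same point in the data flow where old mems are later injected, or attention ends up mixing mismatched tensors. A tiny illustration of the two candidate tensors (assumed shapes and names, not the library's internals):

```python
# Illustrative only: in a pre-norm block, attention consumes norm(x), while the
# residual stream x stays un-normalised. Caching one tensor and later prepending
# it to the other hands attention a mix of normalised and un-normalised states.
import torch
import torch.nn as nn

dim = 64
x = torch.randn(1, 8, dim)        # raw residual stream entering the block
norm = nn.LayerNorm(dim)
normed = norm(x)                  # what pre-norm attention actually sees

print(torch.allclose(x, normed))  # False: the two candidate "hidden states" differ
```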

lucidrains commented 6 months ago

@pfeatherstone oh yea! want to open a PR for that one?

that one is more likely an instant accept

pfeatherstone commented 6 months ago

Will do this weekend.

lucidrains commented 6 months ago

actually, that is tricky too

i'll just take care of it

lucidrains commented 6 months ago

@pfeatherstone ok its taken care of https://github.com/lucidrains/x-transformers/commit/49b196e8a9da707c9bf16a59f9d09ed6200dc0e7

pfeatherstone commented 6 months ago

Cheers. The way I did it was to record mems inside the Attention layer: I save the new mems directly before prepending the old ones.
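A minimal sketch of that pattern, using a hypothetical attention module rather than the library's actual Attention class: the new mems are saved from the layer's own input just before the old mems are concatenated onto the key/value input, so what gets cached is exactly what future calls will prepend to.

```python
# Sketch of the described pattern (hypothetical class, not the library code):
# record new mems *before* prepending old ones, so cached states match what
# the attention layer will actually consume on the next call.
import torch
import torch.nn as nn

class AttentionWithMems(nn.Module):
    def __init__(self, dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x, mems=None):
        new_mems = x.detach()                        # save new mems before the prepend
        kv = x if mems is None else torch.cat((mems, x), dim=1)
        out, _ = self.attn(x, kv, kv, need_weights=False)
        return out, new_mems

# usage: carry mems across segments of a long sequence
layer = AttentionWithMems(64)
seg1, seg2 = torch.randn(1, 8, 64), torch.randn(1, 8, 64)
out1, mems = layer(seg1)
out2, mems = layer(seg2, mems=mems)                  # seg2 attends over seg1's cached states
```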

lucidrains commented 6 months ago

@pfeatherstone yea i thought you did it that way, which is why i just went ahead and did it, knowing i'd probably waste your time again. thanks for uncovering the issue anyhow!