pfeatherstone closed this 6 months ago
I haven't checked all use cases, but my use case is fixed. Hopefully I haven't broken other people's code.
@lucidrains Ready for review
normally i would be more liberal with accepting PRs, but this one required a bit more strategizing
sorry for ignoring it! A for effort
Cheers. There is still the issue of where the mems are recorded when return_mems == True; it's a problem when using pre-layer normalisation.
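(For context, a rough sketch of why the recording point matters under pre-LN. The names `pre_ln_step`, `norm`, and `attn` here are illustrative, not the actual x-transformers internals.)

```python
import torch.nn as nn

def pre_ln_step(x, norm: nn.LayerNorm, attn):
    """One pre-LN residual step (illustrative, not the x-transformers source)."""
    normed = norm(x)        # what attention actually consumes
    out = x + attn(normed)  # the residual stream stays un-normalized
    # Recording mems from `out` (the residual stream) hands the next segment
    # states that were never normalized; recording `normed` hands it the states
    # attention actually saw. Under post-LN the two coincide, so the choice
    # only bites when pre-layer normalisation is enabled.
    return out, normed
```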
@pfeatherstone oh yea! want to open a PR for that one?
that one is more likely an instant accept
Will do this weekend.
actually, that is tricky too
i'll just take care of it
@pfeatherstone ok its taken care of https://github.com/lucidrains/x-transformers/commit/49b196e8a9da707c9bf16a59f9d09ed6200dc0e7
Cheers. The way I did it was to record the mems inside the Attention layer: basically, directly before prepending the old mems, I save the new mems.
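(Roughly this, as a hedged sketch; the single-head layout, `mem`/`new_mem` names, and projections are illustrative, not the actual x-transformers code.)

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Single-head attention with XL-style mems, illustrative only."""
    def __init__(self, dim):
        super().__init__()
        self.scale = dim ** -0.5
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_kv = nn.Linear(dim, dim * 2, bias=False)
        self.to_out = nn.Linear(dim, dim, bias=False)

    def forward(self, x, mem=None):
        # Record the new mems *before* prepending the old ones, so the
        # snapshot covers only the current segment's hidden states.
        new_mem = x.detach()

        kv_input = x if mem is None else torch.cat((mem, x), dim=1)
        q = self.to_q(x)
        k, v = self.to_kv(kv_input).chunk(2, dim=-1)

        sim = torch.einsum('b i d, b j d -> b i j', q, k) * self.scale
        attn = sim.softmax(dim=-1)
        out = torch.einsum('b i j, b j d -> b i d', attn, v)
        return self.to_out(out), new_mem
```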
@pfeatherstone yea i thought you did it that way, which is why i just went ahead and did it, knowing i'd probably waste your time again. thanks for uncovering the issue anyhow!
It's now ONNX-exportable too.
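(For anyone wanting to check the export, something along these lines should work; the model hyperparameters, shapes, and opset here are just an example, not a verified configuration.)

```python
import torch
from x_transformers import TransformerWrapper, Decoder

model = TransformerWrapper(
    num_tokens = 256,
    max_seq_len = 1024,
    attn_layers = Decoder(dim = 512, depth = 6, heads = 8)
).eval()

# dummy token ids, batch of 1, sequence length 128
dummy = torch.randint(0, 256, (1, 128))

torch.onnx.export(
    model, (dummy,), 'model.onnx',
    input_names = ['tokens'],
    output_names = ['logits'],
    dynamic_axes = {'tokens': {0: 'batch', 1: 'seq'}},
    opset_version = 17,
)
```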