Closed by pfeatherstone 10 months ago
@pfeatherstone yea i see
the mems are typically used for transformer-xl style recurrence with a causal mask, which doesn't require the key padding mask you are passing in
however, i fixed it anyway, just for completeness
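
for anyone reading later, here is a rough sketch of the kind of shape mismatch involved. this is not the library's actual code: the helper name `extend_key_padding_mask` is hypothetical, and it assumes a boolean mask where True means "attend to this key". when memories of length `mem_len` are prepended to the keys, the mask has to grow by the same amount on the left so it still lines up with the key positions:

```python
def extend_key_padding_mask(mask, mem_len):
    """Left-pad each row of a boolean key padding mask for prepended memories.

    Hypothetical helper, for illustration only.
    Assumes True means "attend", so every memory slot is marked True.
    """
    return [[True] * mem_len + list(row) for row in mask]


# batch of 2 sequences, 4 key positions each, False marks padding
mask = [
    [True, True, True, False],
    [True, True, False, False],
]

# with 2 memory slots prepended, each row now covers 2 + 4 = 6 key positions
extended = extend_key_padding_mask(mask, mem_len=2)
print(len(extended[0]))
```

with `mem_len=0` (the equivalent of passing no memories) the mask comes back unchanged, which fits with the repro working once the memories are dropped.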
Repro:
if you set
mems=None
instead, it works