Open · amorehead opened this issue 2 years ago
Hi, @lucidrains. Thank you for sharing this excellent implementation with us all! Do you have any thoughts as to what changes would need to be made to make cross-attention possible with your FLASH model?

@amorehead Hey Alex! The GAU module could be made to support cross-attention, but not the FLASH transformer. The FLASH transformer design is very specific to autoregressive training.
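To make the suggestion above concrete, here is a minimal sketch of what a cross-attention GAU could look like, following the relu²-attention formulation from the Transformer Quality in Linear Time paper. `CrossGAU` and its signature are hypothetical and not part of FLASH-pytorch's API; the sketch also simplifies away details from the paper such as the shared query/key base projection, relative position bias, and the exact attention scaling.

```python
import torch
import torch.nn.functional as F
from torch import nn

class CrossGAU(nn.Module):
    """Hypothetical cross-attention variant of a Gated Attention Unit (sketch)."""

    def __init__(self, dim, query_key_dim=128, expansion_factor=2):
        super().__init__()
        hidden = int(dim * expansion_factor)
        self.norm = nn.LayerNorm(dim)
        self.context_norm = nn.LayerNorm(dim)
        # gate (u) and query are projected from the target sequence x
        self.to_gate = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU())
        self.to_q = nn.Sequential(nn.Linear(dim, query_key_dim), nn.SiLU())
        # key and value are projected from the context sequence --
        # this is the only structural change needed for cross-attention
        self.to_k = nn.Sequential(nn.Linear(dim, query_key_dim), nn.SiLU())
        self.to_v = nn.Sequential(nn.Linear(dim, hidden), nn.SiLU())
        self.to_out = nn.Linear(hidden, dim)

    def forward(self, x, context, context_mask=None):
        # x: (b, n, dim) target tokens; context: (b, m, dim) source tokens
        normed_x = self.norm(x)
        gate, q = self.to_gate(normed_x), self.to_q(normed_x)
        normed_ctx = self.context_norm(context)
        k, v = self.to_k(normed_ctx), self.to_v(normed_ctx)

        # relu^2 attention (no softmax), normalized by context length --
        # one reasonable stand-in for the paper's 1/n scaling (assumption)
        sim = torch.einsum('b n d, b m d -> b n m', q, k) / k.shape[1]
        attn = F.relu(sim) ** 2
        if context_mask is not None:
            attn = attn.masked_fill(~context_mask[:, None, :], 0.)

        out = torch.einsum('b n m, b m e -> b n e', attn, v)
        return self.to_out(gate * out) + x  # gated output plus residual

# usage: attend from a target sequence into, e.g., an encoder's output
gau = CrossGAU(dim=512)
x = torch.randn(2, 64, 512)     # target sequence
ctx = torch.randn(2, 128, 512)  # context sequence
out = gau(x, ctx)               # (2, 64, 512)
```

The key design point is that the gate and query come from the target sequence while the key and value come from the context, so the gating mechanism is unchanged from self-attention GAU and the same trick should carry over to encoder-decoder models.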
Hi @lucidrains! Do you mean I can just apply GAU in a cross-attention model such as T5? I found GAU works very well on BERT models.