At the moment, for `torch==2.6.0.dev20241114+cu124`, I get the error `ValueError: Expect query and key/value to have the same number of heads but got Hq=32 and Hkv=8. Try setting enable_gqa=True for GQA` when I try to use it in a setting with different numbers of KV and Q heads.
Do you plan to add support for Grouped-Query Attention in future versions?
*edit: skill issue, it is actually available as a param: `enable_gqa`
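For anyone hitting the same error, a minimal sketch of the fix (head counts taken from the error message above; the batch, sequence, and head-dim sizes here are just illustrative):

```python
import torch
import torch.nn.functional as F

# Illustrative shapes: 32 query heads vs. 8 KV heads, as in the error above.
batch, seq_len, head_dim = 2, 128, 64
q = torch.randn(batch, 32, seq_len, head_dim)
k = torch.randn(batch, 8, seq_len, head_dim)
v = torch.randn(batch, 8, seq_len, head_dim)

# Without enable_gqa=True this raises the ValueError quoted above;
# with it, each group of 32/8 = 4 query heads shares one KV head.
out = F.scaled_dot_product_attention(q, k, v, enable_gqa=True)
print(out.shape)  # torch.Size([2, 32, 128, 64])
```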