togethercomputer / stripedhyena

Repository for StripedHyena, a state-of-the-art beyond Transformer architecture
Apache License 2.0

Apple Silicon support #5

Open amrohendawi opened 10 months ago

amrohendawi commented 10 months ago

To develop applications locally with this great model, it needs to be supported on Apple Silicon (M1/M2/M3) chips.

The main missing requirement is `flash_attn`, which requires CUDA and therefore cannot be installed on Apple Silicon.

Zymrael commented 10 months ago

This can be done by swapping out the `inner_mha_cls` in the `AttentionBlock` forward call, using any other generic PyTorch implementation of attention.
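As a rough illustration of that suggestion, here is a minimal sketch of a generic attention function built on PyTorch's `F.scaled_dot_product_attention`, which runs on the MPS backend. The packed qkv layout, shape conventions, and function name below are assumptions for the example, not taken from the StripedHyena source:

```python
# Hypothetical sketch: a generic PyTorch attention that could stand in for
# the flash_attn-based inner_mha_cls call. Tensor layout (packed qkv of
# shape (batch, seq, 3 * heads * head_dim)) is an assumption, not the
# actual AttentionBlock interface.
import torch
import torch.nn.functional as F


def generic_attention(qkv: torch.Tensor, num_heads: int) -> torch.Tensor:
    """Causal self-attention over a packed qkv tensor."""
    b, s, three_d = qkv.shape
    head_dim = three_d // (3 * num_heads)
    # Unpack q, k, v and split out the head dimension.
    qkv = qkv.view(b, s, 3, num_heads, head_dim)
    q, k, v = qkv.unbind(dim=2)                       # each: (b, s, h, d)
    q, k, v = (t.transpose(1, 2) for t in (q, k, v))  # (b, h, s, d)
    # Available since PyTorch 2.0; dispatches to an efficient kernel
    # on CUDA and falls back to a math implementation on CPU/MPS.
    out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
    return out.transpose(1, 2).reshape(b, s, num_heads * head_dim)


x = torch.randn(2, 16, 3 * 4 * 8)  # batch=2, seq=16, heads=4, head_dim=8
y = generic_attention(x, num_heads=4)
print(y.shape)  # torch.Size([2, 16, 32])
```

The actual wiring into `AttentionBlock` would depend on how `inner_mha_cls` is constructed and called in the repo; this only shows the attention math itself.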

OdedKBio commented 8 months ago

> This can be done by swapping out the inner_mha_cls in AttentionBlock forward call, using any other generic PyTorch implementation of attention.

@Zymrael Can you elaborate? It seems like I need to make the switch in more than one place.