amrohendawi opened this issue 10 months ago (Open)
This can be done by swapping out the `inner_mha_cls` in the AttentionBlock forward call, using any other generic PyTorch implementation of attention.
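For reference, a minimal sketch of what such a generic stand-in for `inner_mha_cls` could look like. The constructor arguments, projection names (`Wqkv`, `out_proj`), and the (batch, seqlen, hidden) input layout are assumptions for illustration, not the repository's actual API; the only requirement is `torch.nn.functional.scaled_dot_product_attention` (PyTorch >= 2.0), which runs without `flash_attn`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GenericSelfAttention(nn.Module):
    """Pure-PyTorch stand-in for a flash_attn-backed MHA module (hypothetical names)."""

    def __init__(self, hidden_size: int, num_heads: int):
        super().__init__()
        assert hidden_size % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = hidden_size // num_heads
        self.Wqkv = nn.Linear(hidden_size, 3 * hidden_size)   # fused q/k/v projection
        self.out_proj = nn.Linear(hidden_size, hidden_size)

    def forward(self, x: torch.Tensor, **kwargs) -> torch.Tensor:
        # x: (batch, seqlen, hidden); extra kwargs from the caller are ignored here.
        b, s, _ = x.shape
        qkv = self.Wqkv(x).view(b, s, 3, self.num_heads, self.head_dim)
        q, k, v = (t.transpose(1, 2) for t in qkv.unbind(dim=2))  # (b, heads, s, head_dim)
        # Generic attention kernel; no flash_attn wheel required.
        y = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        y = y.transpose(1, 2).reshape(b, s, -1)  # back to (b, s, hidden)
        return self.out_proj(y)
```

If the model constructs `self.inner_mha_cls` in `AttentionBlock.__init__`, building this class there instead of the flash_attn-backed one should leave the forward call unchanged, provided the shapes and causal masking match what the checkpoint expects.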
@Zymrael Can you elaborate more? It seems like I need to make the switch in more than one place.
To develop applications locally with this great model, it needs to be supported on Apple Silicon (M1/M2/M3) chips. The main missing requirement is `flash_attn`, which depends on CUDA and therefore cannot be installed on these machines.
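As a quick sanity check that a flash_attn-free attention path is viable on Apple Silicon, the snippet below (a hedged sketch with toy tensors, unrelated to the repository's own code) runs PyTorch's built-in scaled dot-product attention on the MPS backend:

```python
import torch
import torch.nn.functional as F

# Use the Metal (MPS) backend when available, otherwise fall back to CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Toy tensors purely for the smoke test: (batch, heads, seqlen, head_dim).
q = torch.randn(1, 8, 128, 64, device=device)
k = torch.randn(1, 8, 128, 64, device=device)
v = torch.randn(1, 8, 128, 64, device=device)

# Runs without flash_attn installed; dispatches to a CPU/MPS kernel.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape, out.device)  # torch.Size([1, 8, 128, 64]) on mps or cpu
```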