Now that silnlp's transformers dependency has been updated to 4.46, SDPA can be used for NLLB models.
A config option could be added to specify which attention implementation to use: "eager", "sdpa", or "flash_attention_2".
This is a follow-up to this previous issue.
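A minimal sketch of how such an option might be wired through, assuming a hypothetical `attn_implementation` config key (the key name is an assumption, not an existing silnlp option); `from_pretrained` has accepted `attn_implementation` since transformers 4.36:

```python
from transformers import AutoModelForSeq2SeqLM

# Hypothetical config snippet; the "attn_implementation" key name is an
# assumption about how silnlp might expose this option.
config = {"attn_implementation": "sdpa"}  # or "eager" / "flash_attention_2"

# With transformers 4.46, "sdpa" is supported for NLLB (M2M100-based) models.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-200-distilled-600M",
    attn_implementation=config["attn_implementation"],
)
```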