sillsdev / silnlp

A set of pipelines for performing experiments on various NLP tasks with a focus on resource-poor/minority languages.

Add support for SDPA to NLLB in Huggingface Transformers #478

Closed ddaspit closed 1 month ago

ddaspit commented 3 months ago

NLLB currently supports FlashAttention in HF Transformers. Unfortunately, FlashAttention degrades output quality because it does not properly support padding masks. SDPA provides an alternative route for applying attention optimizations: under the hood, it can dispatch to FlashAttention and Memory Efficient Attention kernels, and Memory Efficient Attention should support masking. Here is the issue for adding SDPA support to models in Transformers. For a list of currently supported models, check out the Transformers documentation. A good example to follow would be BART, which has a full encoder-decoder architecture. It might also be useful to check out this PR that adds SDPA support to T5, another encoder-decoder model.
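For reference, here is a minimal sketch of the PyTorch SDPA primitive that the `"sdpa"` attention implementation routes through, showing that a boolean padding mask can be passed; the shapes and mask values below are made up purely for illustration.

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 2, 4 heads, sequence length 6, head dim 8.
q = torch.randn(2, 4, 6, 8)
k = torch.randn(2, 4, 6, 8)
v = torch.randn(2, 4, 6, 8)

# Boolean padding mask: True = attend, False = masked out.
# Here the last two positions of the second sequence are treated as padding.
attn_mask = torch.ones(2, 1, 6, 6, dtype=torch.bool)
attn_mask[1, :, :, 4:] = False

# SDPA dispatches to FlashAttention, Memory Efficient Attention, or a math
# fallback depending on the inputs; passing attn_mask rules out kernels that
# cannot honor it, so padded positions stay masked.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
print(out.shape)  # torch.Size([2, 4, 6, 8])
```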

isaac091 commented 1 month ago

PR submitted to transformers library last week, waiting on review.

ddaspit commented 1 month ago

Here is the PR: https://github.com/huggingface/transformers/pull/33309

isaac091 commented 1 month ago

Merged!
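With the change merged, something like the following sketch should work once a Transformers release includes it; the checkpoint name is only an example, and `attn_implementation="sdpa"` is the standard Transformers mechanism for selecting the SDPA attention path.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

checkpoint = "facebook/nllb-200-distilled-600M"  # example NLLB checkpoint

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Request the SDPA attention implementation instead of eager attention or
# FlashAttention 2; SDPA can still use an optimized kernel while respecting
# the padding mask.
model = AutoModelForSeq2SeqLM.from_pretrained(
    checkpoint,
    attn_implementation="sdpa",
    torch_dtype="auto",
)
```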

ddaspit commented 1 month ago

That is awesome. Good job.