pytorch / torchchat

Run PyTorch LLMs locally on servers, desktop and mobile
BSD 3-Clause "New" or "Revised" License

Slimming down torchchat: Replace replace_attention_with_custom_sdpa_attention() with ET's implementation #1058

Open Jack-Khuu opened 2 months ago

Jack-Khuu commented 2 months ago

🚀 The feature, motivation and pitch

First surfaced in https://github.com/pytorch/torchchat/pull/1057: the replace_attention_with_custom_sdpa_attention function, used when exporting models in torchchat, can be replaced with the equivalent API provided in ExecuTorch: https://github.com/pytorch/executorch/blob/main/examples/models/llama2/source_transformation/sdpa.py

Task: Swap the torchchat implementation for ExecuTorch's, then delete the now-defunct code from torchchat.
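
For anyone picking this up, here is a rough sketch of the general pattern involved: a source transformation that walks the module tree and swaps attention blocks, followed by a check that the model output is unchanged. This is not the torchchat or ExecuTorch code. Attention, CustomSDPAAttention, and replace_attention are hypothetical stand-ins made up for illustration; the real replacement would come from the ExecuTorch sdpa.py linked above.

```python
import torch
from torch import nn


class Attention(nn.Module):
    """Hypothetical stand-in for an attention block in the model."""

    def __init__(self, dim: int):
        super().__init__()
        self.wo = nn.Linear(dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.wo(x)


class CustomSDPAAttention(nn.Module):
    """Hypothetical replacement that reuses the original block's weights."""

    def __init__(self, attention: Attention):
        super().__init__()
        self.wo = attention.wo

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.wo(x)


def replace_attention(module: nn.Module) -> nn.Module:
    """Recursively swap every Attention child for CustomSDPAAttention, in place."""
    # Snapshot the children first so reassigning them cannot disturb iteration.
    for name, child in list(module.named_children()):
        if isinstance(child, Attention):
            setattr(module, name, CustomSDPAAttention(child))
        else:
            replace_attention(child)
    return module


# Record the output before the swap, then apply the transformation.
model = nn.Sequential(Attention(8), nn.Sequential(Attention(8)))
x = torch.randn(2, 8)
expected = model(x)
replace_attention(model)
```

A test for the real change could follow the same shape: run the model once with the current torchchat transformation, once with the ExecuTorch one, and compare the outputs.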

Alternatives

No response

Additional context

No response

RFC (Optional)

No response

mikekgfb commented 1 week ago

I think #1057 resolved this. Can we close?

Jack-Khuu commented 1 week ago

Not quite, #1057 was the PR that flagged it

Should be an easy PR, it just needs testing