Slimming down torchchat: Replace replace_attention_with_custom_sdpa_attention() with ET's implementation

Jack-Khuu commented 2 months ago

First surfaced in https://github.com/pytorch/torchchat/pull/1057: the replace_attention_with_custom_sdpa_attention function, used when exporting models in torchchat, can be replaced with the equivalent API provided in ExecuTorch: https://github.com/pytorch/executorch/blob/main/examples/models/llama2/source_transformation/sdpa.py

Task: Swap the torchchat implementation for ExecuTorch's, then delete the now-defunct code from torchchat.
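
For anyone picking this up, here is a rough sketch of the recursive module-swap pattern that a source transformation like this generally follows. It uses plain Python stand-ins instead of torch.nn.Module so it runs without dependencies. Every class and function name below (Module, SDPA, SDPACustom, replace_sdpa) is hypothetical and only illustrates the shape of the transformation; it is not torchchat's or ExecuTorch's actual code.

```python
class Module:
    """Hypothetical stand-in for torch.nn.Module: only tracks named children."""

    def __init__(self, **children):
        self._children = dict(children)

    def named_children(self):
        # Return a copy so callers can replace children while iterating.
        return list(self._children.items())

    def set_child(self, name, child):
        self._children[name] = child

    def get_child(self, name):
        return self._children[name]


class SDPA(Module):
    """Hypothetical eager attention block that should be replaced."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim


class SDPACustom(Module):
    """Hypothetical wrapper that would dispatch to a custom SDPA op."""

    def __init__(self, dim):
        super().__init__()
        self.dim = dim


def replace_sdpa(module):
    """Recursively swap every SDPA child for SDPACustom, in place."""
    for name, child in module.named_children():
        if isinstance(child, SDPA):
            # Carry over the configuration of the block being replaced.
            module.set_child(name, SDPACustom(child.dim))
        else:
            replace_sdpa(child)
    return module


# Toy two-layer model; only the attention blocks should change.
model = Module(
    layer0=Module(attn=SDPA(64), mlp=Module()),
    layer1=Module(attn=SDPA(64), mlp=Module()),
)
replace_sdpa(model)
```

If ExecuTorch's transformation has this same shape, the change in torchchat would presumably come down to importing it at the export call site and deleting the local function, but that needs to be checked against the real signatures in sdpa.py.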

mikekgfb commented 1 week ago

I think #1057 resolved this. Can we close?

Jack-Khuu commented 1 week ago

Not quite; #1057 was the PR that flagged it.

Should be an easy PR; it just needs testing.
