rayleizhu opened this issue 1 year ago
Hi, I'm writing my own operator using the fused attention kernel as a template. However, I found that fused attention requires an Ampere architecture:
https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L200
I do not understand why this restriction exists.
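For reference, this is how I'm checking whether a given card satisfies that assertion (a minimal sketch; my reading of the linked line is that the kernel requires compute capability >= 8.0, i.e. Ampere or newer):

```python
import torch

# Compute capability (major, minor): Ampere is 8.x, Turing is 7.5, Volta is 7.0.
# The linked assertion appears to reject anything below 8.0.
major, minor = torch.cuda.get_device_capability()
is_ampere_or_newer = major >= 8
print(f"compute capability {major}.{minor}; fused attention usable: {is_ampere_or_newer}")
```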
Besides, it seems that only head_dim=64 is supported, right? How can I adapt it to the head_dim=32 case?
https://github.com/openai/triton/blob/d376020f90002757eea3ea9475d4f7cfc2ec5ead/python/triton/ops/flash_attention.py#L207
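In the meantime, the workaround I'm considering is to zero-pad the head dimension from 32 to 64 and slice the result back. This should be mathematically equivalent, since zero columns in k leave q @ k^T unchanged and zero columns in v only produce zero output columns. A minimal sketch (here `attention` is passed in to stand for the fused op from the linked file, and its `(q, k, v, sm_scale)` signature is my assumption):

```python
import torch.nn.functional as F

def attention_head32(q, k, v, sm_scale, attention):
    # q, k, v: (batch, heads, seq_len, 32); pad the last dim with zeros up to 64.
    pad = 64 - q.shape[-1]
    q64 = F.pad(q, (0, pad))
    k64 = F.pad(k, (0, pad))  # zero columns do not change q @ k^T
    v64 = F.pad(v, (0, pad))  # zero columns only add zero output columns
    out = attention(q64, k64, v64, sm_scale)
    return out[..., : q.shape[-1]]  # drop the padded output columns
```

Note that sm_scale should stay at 1/sqrt(32), since the padded dimensions contribute nothing to the dot products.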
There is some more information in https://github.com/openai/triton/issues/616.