philipturner / metal-flash-attention

FlashAttention (Metal Port)
MIT License

[Question] Why use index 50000 instead of 101? #9

Closed FdyCN closed 11 months ago

FdyCN commented 11 months ago

I found that the function constant indices in Attention.metal go 100 -> 50000 -> 102 -> 103: https://github.com/philipturner/metal-flash-attention/blob/32592c98eff18001d4eec2c7a204e288fa92fa44/Sources/Attention.metal#L39

// Why not use this instead?
constant bool masked [[function_constant(101)]];

philipturner commented 11 months ago

Because there's a compiler bug that causes runtime crashes. The only workaround is to reassign index 50000, so it doesn't conflict with the index 101 in the other GPU kernel.

FdyCN commented 11 months ago

@philipturner Thanks for your reply. Does that mean some other GPU kernel from the system or the Xcode SDK is using index 101?

philipturner commented 11 months ago

The other GPU kernel in MFA is using 101. Reusing an index across kernels isn't supposed to cause issues, but the specific version of the Xcode SDK used to build this had a problem with it. It was a compiler bug, not intended behavior.
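
For readers who have not used function constants, here is a minimal sketch of the workaround discussed in this thread. It is not code from the repository: only the name `masked` and the index values 100, 50000, 102, and 103 come from the discussion above, while `flag_a`, `flag_b`, `flag_c`, and the `example` kernel are hypothetical placeholders.

```metal
#include <metal_stdlib>
using namespace metal;

// Hypothetical placeholder constant; only the index 100 is taken from the thread.
constant bool flag_a [[function_constant(100)]];

// The natural next index would be 101, but another kernel in the same
// project already uses 101. Moving this constant to 50000 avoids sharing
// the index, which is the workaround described above.
constant bool masked [[function_constant(50000)]];

// Hypothetical placeholder constants; only the indices 102 and 103 are
// taken from the thread.
constant bool flag_b [[function_constant(102)]];
constant bool flag_c [[function_constant(103)]];

// Hypothetical kernel that only shows how a function constant is consumed.
kernel void example(device float *out [[buffer(0)]],
                    uint tid [[thread_position_in_grid]])
{
  // Function constants are fixed when the function is specialized on the
  // host, so the compiler can resolve this branch at that point.
  out[tid] = masked ? 0.0f : 1.0f;
}
```

On the host side, the value would be supplied at the same index (50000 here) through `MTLFunctionConstantValues` before the function is specialized. The indices only need to match between the shader declaration and the host call, so a non-sequential number costs nothing.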