Hi, which commit are you using?
We've recently fixed a bug that raised this error. The most recent commit should be good.
Thanks, I'll update and see! There is probably still a regression from the earlier bug https://github.com/sustcsonglin/flash-linear-attention/issues/58, where H100 dies on num_warps=8, but I'll let you know once I try it!
Works great! Sorry for the false alarm. No problems on H100 so far - I'll open an issue if any of the kernels fail there. Thanks!!!
Describe the bug
Hi,
Sorry to bother you. I recently updated FLA on an 8xH100 machine, and it now gives new errors during autotuning with fla.ops.simple_gla.chunk that were not present previously. It must be the result of a fairly recent change, though unfortunately I don't know exactly which commit worked previously. fla.ops.gla.fused_chunk continues to work fine for me, but fla.ops.gla.chunk gives the same kind of autotune index error on a different machine (8x4090).
It appears to be an autotune configuration issue, possibly caused by autotune keys being supplied without corresponding kernel arguments...
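To illustrate what I mean (a hypothetical, minimal Triton example, not the actual FLA kernel): each entry in an autotuner's `key` has to name an argument of the decorated kernel, and a key that refers to an argument the caller never supplies is the kind of mismatch that can make the tuner fail while indexing its arguments.

```python
# Hypothetical, minimal Triton autotune example (not the FLA kernel).
import torch
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config({'BLOCK': 128}, num_warps=4),
        triton.Config({'BLOCK': 256}, num_warps=8),
    ],
    # Every name in `key` must be an argument of the kernel below; a key
    # that points at an argument the caller never supplies is the kind of
    # mismatch that can break the autotuner while it builds its cache key.
    key=['n_elements'],
)
@triton.jit
def _scale_kernel(x_ptr, y_ptr, n_elements, alpha, BLOCK: tl.constexpr):
    pid = tl.program_id(0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n_elements
    x = tl.load(x_ptr + offs, mask=mask)
    tl.store(y_ptr + offs, x * alpha, mask=mask)


def scale(x: torch.Tensor, alpha: float) -> torch.Tensor:
    y = torch.empty_like(x)
    n = x.numel()
    grid = lambda meta: (triton.cdiv(n, meta['BLOCK']),)
    _scale_kernel[grid](x, y, n, alpha)
    return y
```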
I also believe that once this is fixed there will be a regression from the fix for bug https://github.com/sustcsonglin/flash-linear-attention/issues/58, since I now see num_warps=8 present in the new code.
This is the error I'm seeing:
Steps to reproduce the bug
I am using the following calling code:
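The exact snippet isn't reproduced here, but it is along the lines of the sketch below (the shapes, dtypes, (B, H, T, D) head-first layout, and return convention are assumptions on my part, not a verbatim repro):

```python
# Minimal sketch of the kind of call that triggers the autotuner; shapes,
# dtypes and the return convention are assumptions, not a verbatim repro.
import torch
from fla.ops.simple_gla import chunk_simple_gla

B, H, T, D = 8, 8, 2048, 64
q = torch.randn(B, H, T, D, device='cuda', dtype=torch.bfloat16)
k = torch.randn(B, H, T, D, device='cuda', dtype=torch.bfloat16)
v = torch.randn(B, H, T, D, device='cuda', dtype=torch.bfloat16)
# per-head log-space decay for simple GLA (one scalar per head per step)
g = torch.randn(B, H, T, device='cuda', dtype=torch.float32)

# Depending on the version this may return (o, final_state) rather than o.
out = chunk_simple_gla(q, k, v, g)
```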
Expected behavior
no error :)
Environment info