If a user compiles two kernels of the same shape but with different configurations (e.g. a different group_size) one after another, the second run may wrongly reuse the results of the first run.
The root cause of this problem is that the kernel name, which serves as the key for looking up compiled kernels, does NOT include all the necessary configuration parameters. Currently, the kernel name contains only num_thread, dtype, m, k, n, and bits, so two kernels that differ only in another parameter such as group_size end up with the same name. We should add more parameters to the name so that such kernels can be told apart.
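A minimal sketch of the proposed fix, assuming compiled kernels are cached in a dict keyed by kernel name. The names KernelConfig, make_kernel_name, compile_kernel and compiled_kernels are hypothetical and not taken from the actual codebase. The idea is to derive the name from every field of the config, so a field like group_size can no longer be silently left out of the key.

```python
from dataclasses import dataclass, fields


# Hypothetical config object; the real project may store these differently.
@dataclass(frozen=True)
class KernelConfig:
    num_thread: int
    dtype: str
    m: int
    k: int
    n: int
    bits: int
    group_size: int  # missing from the current kernel name


def make_kernel_name(cfg: KernelConfig) -> str:
    # Build the name from *all* fields, so any new field is included automatically.
    return "_".join(f"{f.name}-{getattr(cfg, f.name)}" for f in fields(cfg))


# Hypothetical cache of compiled kernels, keyed by kernel name.
compiled_kernels: dict[str, str] = {}


def compile_kernel(cfg: KernelConfig) -> str:
    name = make_kernel_name(cfg)
    if name in compiled_kernels:
        # Reuse is only safe because the name now covers the full config.
        return compiled_kernels[name]
    result = f"<compiled {name}>"  # placeholder for the real compilation step
    compiled_kernels[name] = result
    return result


# Same shape, different group_size: these now get distinct cache entries.
a = KernelConfig(128, "float16", 1, 4096, 4096, 4, 64)
b = KernelConfig(128, "float16", 1, 4096, 4096, 4, 128)
compile_kernel(a)
compile_kernel(b)
```

Deriving the name from the dataclass fields, rather than listing parameters by hand, means a parameter added to the config later cannot be forgotten in the key again.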
Another related argument is "--reuse_tuned". It should take effect here, but currently it only does so partially.