microsoft / T-MAC

Low-bit LLM inference on CPU with lookup table
MIT License

Historical tune.log may bypass kernel generation configurations #66

Open QingtaoLi1 opened 1 month ago

QingtaoLi1 commented 1 month ago

If a user compiles two kernels with the same shape but different configurations (e.g. a different group_size) one after another, the second run may wrongly reuse the results of the first run.

The root cause of this problem is that the kernel name, which serves as the key for looking up compiled kernels, does NOT include all the necessary configurations. Currently, the kernel name contains only: num_thread, dtype, m, k, n, bits. We should add more fields (such as group_size) so that kernels with different configurations get distinct names.
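To make the collision concrete, here is a minimal sketch in plain Python. The helper names (`kernel_name_current`, `kernel_name_proposed`, `get_or_compile`) and the name format are hypothetical and are not T-MAC's actual code. A dict stands in for the store of compiled kernels. Only the list of fields in the current name comes from this issue.

```python
# Hypothetical sketch: names and the name format are illustrative, not T-MAC's code.

def kernel_name_current(num_thread, dtype, m, k, n, bits):
    # Only the fields the issue lists as part of the current kernel name.
    return f"t{num_thread}_{dtype}_m{m}_k{k}_n{n}_b{bits}"


def kernel_name_proposed(num_thread, dtype, m, k, n, bits, group_size):
    # Same fields plus group_size, so configs differing only there get distinct keys.
    return kernel_name_current(num_thread, dtype, m, k, n, bits) + f"_g{group_size}"


def get_or_compile(cache, name, config):
    # Return whatever is cached under `name`; "compile" (store config) only on a miss.
    if name not in cache:
        cache[name] = dict(config)
    return cache[name]


shape = dict(num_thread=4, dtype="int8", m=4096, k=4096, n=1, bits=2)

# Current key: the second request collides with the first and reuses group_size=128.
cache_current = {}
get_or_compile(cache_current, kernel_name_current(**shape), {"group_size": 128})
reused = get_or_compile(cache_current, kernel_name_current(**shape), {"group_size": 64})
print(reused["group_size"])  # 128, not the requested 64

# Proposed key: the two configurations map to separate entries.
cache_proposed = {}
get_or_compile(cache_proposed, kernel_name_proposed(**shape, group_size=128), {"group_size": 128})
fresh = get_or_compile(cache_proposed, kernel_name_proposed(**shape, group_size=64), {"group_size": 64})
print(fresh["group_size"])  # 64
```

The same idea extends to any other setting that changes the generated code. Each such setting would need to appear in the key.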

A related argument is "--reuse_tuned". It should take effect here, but currently it does not fully.
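One possible reading of the intended behavior is that history is consulted only when the flag is set and the full configuration key matches. Otherwise, the kernel is retuned. The sketch below is an assumption about the design, not a description of how `--reuse_tuned` currently works. `resolve_tuning`, `fake_tune`, and the dict-based log are hypothetical stand-ins for tune.log and the tuning step.

```python
# Hypothetical sketch: `resolve_tuning` and the dict-based log are illustrative only.

def resolve_tuning(tune_log, key, reuse_tuned, tune_fn):
    # Consult history only when reuse is requested AND the full key matches;
    # otherwise retune and overwrite the record for this key.
    if reuse_tuned and key in tune_log:
        return tune_log[key]
    result = tune_fn()
    tune_log[key] = result
    return result


calls = []

def fake_tune():
    # Stand-in for an expensive tuning run; counts how often it is invoked.
    calls.append(1)
    return {"run": len(calls)}


key = "t4_int8_m4096_k4096_n1_b2_g128"  # hypothetical full-configuration key
log = {}
first = resolve_tuning(log, key, True, fake_tune)    # miss: tunes
again = resolve_tuning(log, key, True, fake_tune)    # hit: reuses the logged result
forced = resolve_tuning(log, key, False, fake_tune)  # flag off: always retunes
print(first, again, forced, len(calls))
```

Under this design, a stale record can only be reused when the user explicitly asks for reuse, and even then only for an exactly matching configuration.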