tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥
https://www.modular.com/blog/community-spotlight-how-i-built-llama2-by-aydyn-tairov
MIT License
2.09k stars 139 forks

Autotune for matmul nelts #72

Closed andresnowak closed 4 months ago

andresnowak commented 10 months ago

This is a proof of concept; the implementation still has several problems.

  1. For now, you can't use autotune search with a function that has compile-time parameters; if you try, compilation fails. This is probably because the search expects to be given a compile-time function, and if that function has compile-time parameters, it wants them bound to specific compile-time values. So we would need three autotunes, one for each batch_matmul variant (3 tensors, 2 tensors, and 1 tensor).
  2. You can't use always_inline on an adaptive function; doing so gives a compile-time error.
  3. The autotune also gives a compile-time error if the parallelize call uses a global var for the number of workers.
  4. Lastly, and I don't know why this happens: when the program runs the search (running my batch_matmul_evaluator), it gives a segmentation fault after the program finishes. I may be doing something wrong in this part.
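
For readers unfamiliar with the idea, here is a minimal Python sketch of what an autotune search over nelts does conceptually: build one candidate per vector width, time each with an evaluator, and keep the fastest. This is not Mojo's autotune API and not the code in this PR; `matvec_chunked`, `make_candidates`, `evaluate`, and `search` are hypothetical names chosen for illustration, and the chunked loop only imitates a SIMD width.

```python
import time
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]
Candidate = Callable[[Matrix, Sequence[float]], List[float]]


def matvec_chunked(w: Matrix, x: Sequence[float], width: int) -> List[float]:
    """Matrix-vector product that walks each row in chunks of `width`.

    Assumption: the chunk size stands in for the SIMD width (nelts).
    """
    n = len(x)
    out = []
    for row in w:
        acc = 0.0
        for start in range(0, n, width):
            end = min(start + width, n)
            acc += sum(row[i] * x[i] for i in range(start, end))
        out.append(acc)
    return out


def make_candidates(widths: Sequence[int]) -> Dict[int, Candidate]:
    """Bind each width up front, one specialised function per candidate."""
    return {wd: (lambda w, x, wd=wd: matvec_chunked(w, x, wd)) for wd in widths}


def evaluate(fn: Candidate, w: Matrix, x: Sequence[float], repeats: int = 3) -> float:
    """Hypothetical evaluator: best wall-clock time over a few runs."""
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(w, x)
        best = min(best, time.perf_counter() - t0)
    return best


def search(candidates: Dict[int, Candidate], w: Matrix, x: Sequence[float]) -> int:
    """Return the width whose candidate ran fastest."""
    timings = {wd: evaluate(fn, w, x) for wd, fn in candidates.items()}
    return min(timings, key=timings.get)
```

Calling `search(make_candidates([1, 2, 4, 8]), w, x)` returns whichever width was fastest on the current machine, so the result varies between runs and hosts. Because each candidate already has its width bound, the search only ever sees functions with no free parameters, which mirrors the restriction described in point 1.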

So I'm opening this pull request to get your opinion on the problem: would you like this implemented as it is now, would you rather wait, or do you have an idea for solving these issues?

Another idea would be to make the transformer function the adaptive function instead. But this would require the autotune search to be able to read stories15m.bin, and I don't know whether different nelts values could give different results for each function (like matmul and rope_llama); I would think not. I also don't know if it would work at all, since the search function seems to fail on various things when trying to do the expansions.

andresnowak commented 10 months ago

Now on Mojo 0.5, even after fixing everything needed for 0.5, it doesn't compile.

tairov commented 10 months ago

Cool! Those are reasons to contribute more to open source 😀

tairov commented 4 months ago

Closing this PR for now, since autotune has been removed from the latest Mojo releases. It will probably be back in the future.