jeromeku opened this issue 8 months ago
Hi Jerome,
Nice to hear from you. This seems like a very interesting project. It is a bit beyond my area of expertise, as I have only dabbled in the internals of MLIR generation and lowering. My only worry is that this area seems to be evolving quite quickly, so there may not yet be stable enough foundations to document for lay users.
@Jokeren might have more to add.
I doubt the Triton developers will find the time to craft documentation or develop tutorials.
However, there are a bunch of Chinese users out there who've been diving deep into Triton's code, breaking down every compiler pass in their blogs (in Chinese). Their enthusiasm has honestly taken me by surprise... I've glanced over a few of these blogs and they're top-notch.
@Jokeren: Which blogs are you referring to? I'd certainly be interested in taking a gander!
@srush: no worries -- let me see how far I can take this on my own and will be happy to share any progress.
Wait, do you read Chinese? I meant that they're written in Chinese. Or do you plan to translate?
Plan to translate -- wonders of multilingual LLMs (or Google Translate, for that matter). I am also Chinese.
That's great. Feel free to search with keyword "triton" on zhihu.com :)
That's what I'm referring to.
@Jokeren
Thanks -- yes, I've come across many of those already. Many deep dives into all things CUDA / Cutlass there as well, haha.
Are you planning on teaching a course on MLIR / deep learning compilers at your university?
Maybe not. They didn't assign me to a compiler course unfortunately.
@srush
Always appreciate your wonderful OSS educational contributions!
I'm relatively familiar with CUDA and triton but less so with machine learning compilers, and am interested in getting into the weeds of triton's compilation pipeline. I've come across a few resources for learning MLIR as well as related projects such as TVM (which has a comprehensive set of tutorials / learning materials spearheaded by Tianqi Chen of CMU), but have yet to bridge the gap from basic MLIR to something on the scale of triton. The overarching motivation -- other than the fact that ML compilers are super-interesting :) -- is that in a world of increased demand for ML training / inference but limited GPU (NVIDIA) supply, the ability to write code that is backend-agnostic is ever more important.
A few questions:
- Any recommendations on how to learn MLIR incrementally, ideally building from basics to something like a toy triton, and more ambitiously, understanding enough of the triton backend to be able to contribute new optimization passes?

I'd be willing to do as much of the heavy lifting as needed, for example:
- Working through the triton tutorials, starting with vec-add (see the sketch below).
- A C++ debugging setup that steps through the MLIR pipeline and provides greater visibility -- and hackability -- than simply observing the output of MLIR_ENABLE_DUMP.

cc @Jokeren
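For concreteness, here's a minimal sketch of the first item, assuming a recent Triton release. The kernel body mirrors the official vec-add tutorial; the file name, sizes, and the .asm keys are my own assumptions about what recent versions expose:

```python
# vec_add.py -- minimal walkthrough driver (my own sketch; the kernel body
# mirrors Triton's introductory vector-add tutorial).
# Run as:  MLIR_ENABLE_DUMP=1 python vec_add.py
# to print the IR before each MLIR pass to stderr.
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged tail of the vector
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


n = 98432
x = torch.rand(n, device="cuda")
y = torch.rand(n, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(n, 1024),)

# In the Triton versions I've tried, launching returns a handle to the
# compiled kernel, whose .asm dict holds the intermediate representations
# (e.g. "ttir", "ttgir", "llir", "ptx") -- a lighter-weight view than a
# full per-pass dump.
compiled = add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
torch.testing.assert_close(out, x + y)
print(compiled.asm["ttir"])
```

Running with MLIR_ENABLE_DUMP=1 prints the module before every pass, which quickly becomes a firehose on even this kernel; that gap is exactly what the C++ debugger item above is meant to address.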