thuml / depyf

depyf is a tool to help you understand and adapt to the PyTorch compiler, torch.compile.
https://depyf.readthedocs.io
MIT License

[help wanted] Why does `torch.compile` dump each Triton kernel? #46

Closed: imShZh closed this issue 4 weeks ago

imShZh commented 1 month ago

Everything in depyf works fine.

After running the example from the README with depyf, the target directory contains multiple files (a reproduction sketch follows the listing):

```
├── __compiled_fn_1 AFTER POST GRAD 0.py
├── __compiled_fn_1 Captured Graph 0.py
├── __compiled_fn_1 Forward graph 0.py
├── __compiled_fn_1 kernel 0.py
├── __compiled_fn_1 kernel 1.py
├── __compiled_fn_1 kernel 2.py
├── __compiled_fn_5 AFTER POST GRAD 0.py
├── __compiled_fn_5 Captured Graph 0.py
├── __compiled_fn_5 Forward graph 0.py
├── __compiled_fn_5 kernel 0.py
├── __compiled_fn_5 kernel 1.py
├── full_code_for_toy_example_0.py
├── __transformed_code_0_for_torch_dynamo_resume_in_toy_example_at_9.py
└── __transformed_code_0_for_toy_example.py
```
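
For reference, what was compiled here is the toy example from the depyf README; below is a minimal sketch of that setup (the dump directory path passed to `depyf.prepare_debug` is arbitrary, and the exact example body may differ slightly across README versions):

```python
# Minimal sketch of the README toy example, assuming an arbitrary dump directory.
import torch
import depyf

@torch.compile
def toy_example(a, b):
    x = a / (torch.abs(a) + 1)
    if b.sum() < 0:  # data-dependent branch -> graph break -> the "...resume_in_toy_example..." file
        b = b * -1
    return x * b

def main():
    for _ in range(100):
        toy_example(torch.randn(10), torch.randn(10))

if __name__ == "__main__":
    # Everything torch.compile produces (captured graphs, transformed bytecode
    # source, Inductor output, Triton kernels) is written into this directory.
    with depyf.prepare_debug("./depyf_debug_dir"):
        main()
```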

Why does `torch.compile` dump `__compiled_fn_1 kernel 1.py` and `__compiled_fn_1 kernel 2.py` in addition to `__compiled_fn_1 kernel 0.py`, when the latter already contains the source strings of the first two Triton kernels?

youkaichao commented 4 weeks ago

Thanks for your interest!

These are intermediate steps of torch.compile: it possibly generates the two Triton kernels first, and then merges them into a single file 🤔
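
To make that concrete: the per-kernel files each hold one standalone Triton kernel, while `__compiled_fn_1 kernel 0.py` is the final Inductor output module that embeds the same kernels again as source strings. A purely schematic sketch of that relationship (the kernel name, body, and embedded-string excerpt are illustrative, not the exact files torch.compile writes, which vary by PyTorch version):

```python
# Schematic only: names and bodies are illustrative, not the exact dump contents.

# Roughly what a standalone per-kernel file ("__compiled_fn_1 kernel 1.py") holds:
# a single @triton.jit kernel generated by Inductor.
import triton
import triton.language as tl

@triton.jit
def triton_poi_fused_abs_add_div_0(in_ptr0, out_ptr0, xnumel, XBLOCK: tl.constexpr):
    # fused elementwise math, e.g. x = a / (abs(a) + 1) from toy_example
    xoffset = tl.program_id(0) * XBLOCK
    xindex = xoffset + tl.arange(0, XBLOCK)
    xmask = xindex < xnumel
    a = tl.load(in_ptr0 + xindex, mask=xmask)
    tl.store(out_ptr0 + xindex, a / (tl.abs(a) + 1.0), mask=xmask)

# Roughly what the merged file ("__compiled_fn_1 kernel 0.py") holds: the whole
# Inductor-generated module, where each kernel appears again as a source string
# and a call() function launches them (hypothetical excerpt, commented out):
#
#   triton_poi_fused_abs_add_div_0 = async_compile.triton(
#       "triton_poi_fused_abs_add_div_0", '''... same kernel source as above ...''')
#
#   def call(args):
#       ...  # allocates outputs and launches the kernels above
```

So the per-kernel dumps correspond to the intermediate step where each kernel is generated and compiled on its own, and `kernel 0.py` corresponds to the later step where everything is assembled into one output file.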