pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

async_compile.triton is unable to cache generated triton kernels due to meta containing kernel names #112788

Open oulgen opened 11 months ago

oulgen commented 11 months ago

🐛 Describe the bug

When inductor generates a kernel, it emits the kernel source inside an async_compile.triton(...) call. The code inside this block is cached across different graphs. However, a recent change introduced a 'kernel_name' field inside the meta dict in order to identify kernels. This field breaks the caching, because each source blob is now unique. Example:

@pointwise(
    size_hints=[8],
    filename=__file__,
    meta={'signature': {0: '*fp32', 1: '*fp32', 2: 'i32'}, 'device': 0, 'device_type': 'cuda', 'constants': {}, 'mutated_arg_names': ['out_ptr0'], 'autotune_hints': set(), 'kernel_name': 'triton_poi_fused_3', 'configs': [instance_descriptor(divisible_by_16=(0, 1), equal_to_1=(), ids_of_folded_args=(), divisible_by_8=())]},
    min_elem_per_thread=0
)
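To illustrate the failure mode, here is a simplified sketch (not inductor's actual cache implementation) assuming the cache key is a hash of the full generated source: two kernels that are identical except for the per-graph kernel name can never share a cache entry.

```python
import hashlib

def cache_key(source: str) -> str:
    # Simplified stand-in for a source-hash-based cache key.
    return hashlib.sha256(source.encode()).hexdigest()

template = "meta={{'signature': {{0: '*fp32'}}, 'kernel_name': '{name}'}}"

# Identical kernel blobs except for the per-graph kernel name:
blob_a = template.format(name="triton_poi_fused_3")
blob_b = template.format(name="triton_poi_fused_7")

# The embedded name makes the blobs, and hence their cache keys, differ,
# so the second compilation cannot reuse the first one's cache entry.
assert cache_key(blob_a) != cache_key(blob_b)
```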

This does not affect caching for kernels in the same graph, as inductor has another layer of caching there.

Two possible solutions @davidberard98 and I discussed are: 1) emitting two wrappers in codegen, one for profiling and one for regular execution, where the profiling wrapper records the name; or 2) emitting a wrapper around the cached async compile.

Versions

origin/main

cc @ezyang @anijain2305 @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @msaroufim @wconstab @bdhirsh @zou3519 @yf225 @aakhundov

oulgen commented 11 months ago

cc: @jansel @Chillee since we chatted about this

zhxchen17 commented 3 months ago

Hi @davidberard98, are there any updates on this issue? Has it been resolved, or do we still want to take action on it? Thanks.

davidberard98 commented 3 months ago

cc @oulgen what's the priority on this? I don't have concrete plans to work on it in the short term, but now that caching is enabled, is it something that is higher priority?

oulgen commented 3 months ago

@davidberard98 @zhxchen17 This is still good to fix, but I don't think it is actively blocking anything.