Closed ayoub-louati closed 1 year ago
Are you trying to use this for handwritten Triton kernels without inductor? If so, why not just use triton.autotune? There is also an AOT compilation option in Triton.
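For reference, triton.autotune wraps a handwritten kernel directly and benchmarks the given configs the first time each new key value is seen; here is a minimal sketch (the kernel and config values are illustrative, not taken from the original report):

```python
import torch
import triton
import triton.language as tl

# Illustrative handwritten kernel tuned with triton.autotune.
@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE": 128}, num_warps=4),
        triton.Config({"BLOCK_SIZE": 1024}, num_warps=8),
    ],
    key=["n_elements"],  # re-benchmark when this argument changes
)
@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n_elements = out.numel()
    # BLOCK_SIZE is supplied by the autotuner, so it is not passed here.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements)
    return out
```

The winning config is cached in memory per key, so only the first launch for each new problem size pays the tuning cost.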
@jansel Yes, it is a handwritten Triton kernel without inductor. I'm trying to use this one because, as stated in the PR, it reduces CPU overheads when cudagraphs is disabled, and the cache it introduces is really interesting because it allows compiled kernels to be reused from one run to another. Is that possible, or is this tied to inductor?
This API is internal to inductor and not intended for handwritten kernels. You may be able to adapt it to your needs, but you will need to annotate the Triton signature/invariants/metadata manually, and there are no backward compatibility guarantees.
Inductor generates the needed metadata here: https://github.com/pytorch/pytorch/blob/d41b5d7c145f3e09c7223c2b707933266241ec9b/torch/_inductor/codegen/triton.py#L1063, which relies on some compiler analysis.
Hello, how can we use the new decorator related to caching_autotune, introduced in https://github.com/pytorch/torchdynamo/pull/1338, with a newly defined kernel? Here is an example of the kernel's signature:
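(The kernel from the original report is not reproduced here; as a stand-in, assume a simple elementwise kernel whose names, arguments, and block size are placeholders:)

```python
import triton
import triton.language as tl

# Hypothetical stand-in for the kernel in question.
@triton.jit
def scale_kernel(in_ptr, out_ptr, scale, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(in_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x * scale, mask=mask)
```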
I introduced this decorator:
based on this test example: pytorch/test_torchinductor.py at fae821c2f166fccab6a3c34e293c7268f61e82ba · pytorch/pytorch · GitHub
But I thought there might be a better way to use caching_autotune.
Thanks in advance,