triton-lang / triton

Development repository for the Triton language and compiler
https://triton-lang.org/
MIT License
12.69k stars, 1.53k forks

Triton Thread Pool #2088

Open eellison opened 1 year ago

eellison commented 1 year ago

Within PyTorch's TorchInductor we JIT-compile many Triton functions, often 100+. We currently have a mechanism that initializes a pool of forked processes in order to parallelize Triton compilation.

However, this comes with downsides. Each fork in Python gives the child a copy of its parent process's memory, including the state of things like locks. A lock held by another thread at fork time stays locked in the child, where that thread no longer exists, so depending on the parent process it can be unsafe to fork. We have had difficulty enabling parallel compilation in long-running production environments.
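A minimal sketch of the lock hazard, assuming a POSIX system where `os.fork` is available. The helper name `lock_is_stuck_in_child` is hypothetical and only for illustration; it is not part of PyTorch or Triton.

```python
import os
import threading


def lock_is_stuck_in_child() -> bool:
    """Fork while another thread holds a lock; report whether the child can take it."""
    lock = threading.Lock()
    held = threading.Event()
    release = threading.Event()

    def holder() -> None:
        # Hold the lock until the parent tells us to let go.
        with lock:
            held.set()
            release.wait()

    thread = threading.Thread(target=holder)
    thread.start()
    held.wait()  # make sure the lock is held before forking

    pid = os.fork()
    if pid == 0:
        # Child: only the forking thread exists here, but the lock was
        # copied in its locked state and nobody is left to release it.
        acquired = lock.acquire(timeout=0.5)
        os._exit(0 if acquired else 1)

    # Parent: collect the child's verdict, then let the holder thread finish.
    _, status = os.waitpid(pid, 0)
    release.set()
    thread.join()
    return os.WEXITSTATUS(status) == 1


if __name__ == "__main__":
    print("lock stuck in child:", lock_is_stuck_in_child())
```

In a real parent process the lock would typically belong to a library rather than to your own code, which is why it is hard to know in advance whether forking is safe.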

It would be great to switch to a lighter-weight mechanism. I'm proposing to add an API to Triton to initialize a C++ thread pool. When we call into C++ for a compilation step, we would use the thread pool and release the GIL.
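A rough Python-side sketch of the idea, not the proposed Triton API: `compile_kernel` and `compile_all` are hypothetical names, and `zlib.compress` stands in for a native compilation step because it is a C function that CPython runs with the GIL released. With real compilation steps, the same pattern would only scale if the C++ side releases the GIL as described above.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def compile_kernel(source: str) -> bytes:
    # Hypothetical stand-in for a Triton compilation step. The real work
    # would happen in C++; zlib.compress is used only as an example of
    # native code that does not hold the GIL while it runs.
    payload = source.encode() * 1000
    return zlib.compress(payload)


def compile_all(sources: list[str], max_workers: int = 4) -> list[bytes]:
    # Threads share the parent's memory, so nothing is forked or copied.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the same order as the inputs.
        return list(pool.map(compile_kernel, sources))


sources = [f"kernel_{i}" for i in range(8)]
binaries = compile_all(sources)
```

Because the workers are threads in the same process, this avoids the fork-safety problem entirely, at the cost of requiring the compilation code itself to be thread-safe.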

If this is acceptable, I or the PyTorch team can submit the PR.

joker-eph commented 1 year ago

The most efficient approach may be to have a shared MLIRContext (so the types and attributes would be shared) and use the existing thread pool facilities there. Happy to help if you'd like.

eellison commented 1 year ago

Yeah, help would be great! I think I'm waiting for confirmation from the Triton folks that they'd accept this PR first.

chrish42 commented 3 months ago

This was still causing problems for us, so we ended up adding a workaround to make sure that Triton compilation doesn't run while parallel dataloaders are active, etc. It'd be nice to get this bug fixed. Have you heard anything back from the Triton folks? Who did you reach out to?