pytorch-labs / gpt-fast

Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.

How to cache the compilation result? #43

Open huntzhan opened 9 months ago

huntzhan commented 9 months ago

torch.compile always recompiles a function from scratch in a new Python session, which takes a lot of time. I'm wondering if there's a way to cache the compilation result on the file system (the way gcc/clang do) to speed up the development & debugging loop. @Chillee

https://github.com/pytorch-labs/gpt-fast/blob/db7b273ab86b75358bd3b014f1f022a19aba4797/generate.py#L16-L18
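For context, here's a minimal sketch of the cost being described: the first call to a compiled function pays the full Dynamo + Inductor compilation price, and a fresh Python process pays it again from scratch. (The toy function and timings are illustrative, not from gpt-fast.)

```python
import time
import torch

def toy_fn(x):
    # Enough work that compilation is worthwhile.
    return torch.nn.functional.gelu(x @ x.T).sum()

compiled = torch.compile(toy_fn)
x = torch.randn(1024, 1024)

t0 = time.perf_counter()
compiled(x)  # first call: Dynamo tracing + Inductor codegen
print(f"cold call: {time.perf_counter() - t0:.1f}s")

t0 = time.perf_counter()
compiled(x)  # subsequent calls reuse the in-process artifact
print(f"warm call: {time.perf_counter() - t0:.4f}s")

# Re-running this script in a new process repeats the cold-call cost,
# which is exactly the pain point raised in this issue.
```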

Chillee commented 9 months ago

This is a known issue, unfortunately. In theory, it's possible to use AOTInductor (https://www.youtube.com/watch?v=w7d4oWzwZ0c) to compile everything ahead of time; however, it's somewhat finicky to use.
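For reference, a rough sketch of the AOTInductor flow mentioned above, assuming the experimental PyTorch 2.2-era entry points torch._export.aot_compile and torch._export.aot_load (these are private APIs whose names and signatures have moved between releases, so treat them as assumptions):

```python
import torch

class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x @ x.T)

model = Toy().eval()
example_inputs = (torch.randn(8, 8),)

# Ahead-of-time compile to a shared library on disk
# (torch._export.aot_compile is experimental; API may differ per release).
so_path = torch._export.aot_compile(model, example_inputs)

# A fresh process can load the .so and run it without recompiling.
loaded = torch._export.aot_load(so_path, device="cpu")
print(loaded(*example_inputs).shape)
```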

We also have some plans to offer an easier way to cache compilation results.

To be clear, a number of components should already be cached across recompiles: Triton autotuning decisions, Inductor compilation, etc. A warm recompile typically takes me on the order of 30-40 seconds, though we should certainly try to drive that down even further.
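One practical lever for that warm path: Inductor writes its compiled artifacts to a per-user temp directory by default, and the TORCHINDUCTOR_CACHE_DIR environment variable redirects that cache to a persistent location so it survives /tmp cleanup. A sketch with an illustrative path (newer releases also expose an FX graph cache toggle, torch._inductor.config.fx_graph_cache, but availability depends on the version):

```python
import os

# Point Inductor's on-disk cache at a persistent directory.
# Set this before importing torch so it takes effect for this process.
# (The path here is illustrative.)
os.environ["TORCHINDUCTOR_CACHE_DIR"] = os.path.expanduser("~/.cache/torchinductor")

import torch

@torch.compile
def f(x):
    return torch.sin(x) + torch.cos(x)

f(torch.randn(16))  # warm runs reuse the cached Inductor artifacts
```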

huntzhan commented 9 months ago

Thanks for the reply.