Break down parallelize_llama for inference cases

pytorch / torchtitan

A native PyTorch Library for large model training

Closed kwen2501 closed 3 months ago

kwen2501 commented 3 months ago

Stack from ghstack (oldest at bottom):

Breaking up parallelize_llama into:

The split lets the individual pieces be reused for inference, which needs neither activation checkpointing nor data parallelism (DP).

It also improves code modularity and readability.
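
To illustrate the idea, here is a minimal sketch of how a monolithic parallelization function can be split into independent steps that training and inference compose differently. Everything in it is an assumption made for illustration: `ToyModel`, `apply_tp`, `apply_ac`, `apply_dp`, and `parallelize` are hypothetical names, not necessarily the functions this PR introduces, and the step bodies only record that they ran instead of calling any real PyTorch API.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyModel:
    """Stand-in for a real model; records which transforms were applied."""
    applied: List[str] = field(default_factory=list)


# A step takes a model and returns the (possibly wrapped) model.
Step = Callable[[ToyModel], ToyModel]


def apply_tp(model: ToyModel) -> ToyModel:
    # Hypothetical placeholder for a model-sharding step (assumed here to be
    # tensor parallelism); useful for both training and inference.
    model.applied.append("tp")
    return model


def apply_ac(model: ToyModel) -> ToyModel:
    # Hypothetical placeholder for activation checkpointing; training only.
    model.applied.append("ac")
    return model


def apply_dp(model: ToyModel) -> ToyModel:
    # Hypothetical placeholder for data parallelism; training only.
    model.applied.append("dp")
    return model


def parallelize(model: ToyModel, steps: List[Step]) -> ToyModel:
    """Apply the given steps in order and return the resulting model."""
    for step in steps:
        model = step(model)
    return model


# Training composes every step; inference reuses only what it needs.
TRAINING_STEPS: List[Step] = [apply_tp, apply_ac, apply_dp]
INFERENCE_STEPS: List[Step] = [apply_tp]

train_model = parallelize(ToyModel(), TRAINING_STEPS)
infer_model = parallelize(ToyModel(), INFERENCE_STEPS)
print(train_model.applied)
print(infer_model.applied)
```

Because each step is a standalone function, an inference path can skip activation checkpointing and DP without duplicating the rest of the setup.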