xrsrke / pipegoose

Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still work in progress)*
MIT License

Lazy initialization of massive models #25

Open · xrsrke opened this issue 10 months ago

xrsrke commented 10 months ago

APIs

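from pipegoose.nn import DataParallel, PipelineParallel, TensorParallel  # assumed import path
from pipegoose.utils import lazy_init  # the API proposed in this issue

# load the model from `transformers` into `model`
# `parallel_context` is an already-initialized pipegoose ParallelContext

with lazy_init(parallel_context):
    # intent: each rank materializes only the weight shards it owns
    model = TensorParallel(model, parallel_context).parallelize()
    model = PipelineParallel(model, parallel_context).parallelize()
    model = DataParallel(model, parallel_context).parallelize()

logits = model(inputs)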
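The proposal specifies the API but not the mechanism. To illustrate what lazy initialization usually means, the following pure-Python toy records each parameter's shape and initializer up front and allocates only the shard a rank owns once the parallel layout is known. `toy_lazy_init`, `LazyParam`, and `materialize` are hypothetical names used only in this sketch. They are not pipegoose APIs. A real PyTorch implementation would more likely construct modules on the `meta` device and allocate storage afterwards.

```python
from contextlib import contextmanager
from typing import Callable, List, Optional

# Hypothetical module-level flag toggled by the toy context manager below.
_LAZY = False


@contextmanager
def toy_lazy_init():
    """While active, new LazyParam objects record only their shape and init fn."""
    global _LAZY
    prev, _LAZY = _LAZY, True
    try:
        yield
    finally:
        _LAZY = prev


class LazyParam:
    """Hypothetical 2D parameter whose storage can be allocated later, per shard."""

    def __init__(self, rows: int, cols: int, init_fn: Callable[[int, int], float]):
        self.rows, self.cols, self.init_fn = rows, cols, init_fn
        self.data: Optional[List[List[float]]] = None
        if not _LAZY:
            # Eager path: allocate the full parameter immediately.
            self.materialize()

    def materialize(self, rank: int = 0, world_size: int = 1) -> None:
        """Allocate only this rank's contiguous block of rows (a row-wise shard)."""
        if self.rows % world_size != 0:
            raise ValueError("rows must divide evenly across ranks")
        shard = self.rows // world_size
        start = rank * shard
        # init_fn receives *global* row indices, so every shard matches
        # the values an eager, unsharded initialization would produce.
        self.data = [
            [self.init_fn(r, c) for c in range(self.cols)]
            for r in range(start, start + shard)
        ]


if __name__ == "__main__":
    with toy_lazy_init():
        w = LazyParam(8, 4, lambda r, c: float(r * 4 + c))
    print(w.data)  # None: nothing allocated inside the context

    w.materialize(rank=1, world_size=2)  # rank 1 of 2 owns rows 4..7
    print(len(w.data), w.data[0][0])  # 4 16.0
```

Feeding the initializer global row indices, rather than shard-local ones, is the design choice that matters here. It gives each rank's shard exactly the values eager initialization would have produced, so the model's weights do not depend on the parallel layout.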

Reading

createsmit7 commented 10 months ago

Hello, please assign this to me.