mosaicml / llm-foundry

LLM training code for Databricks foundation models

https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm

Apache License 2.0

Currently multi-gpu generate does not work with hf.generate for hf checkpoints. This PR fixes that. #1332

Closed ShashankMosaicML closed 3 months ago

ShashankMosaicML commented 3 months ago

Multi-GPU generation using hf.generate with device_map='auto' does pipeline parallelism and places different modules on different GPUs. As a result, some operations receive input tensors that live on different GPUs, which raises an error. This PR moves those tensors to the device of the operation's other inputs. This should not slow down training, because during training the tensors are already on the same device, so all of these tensor movements should be no-ops.
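
A minimal sketch of the pattern described above, not the PR's actual diff: the helper name apply_bias and the tensors are hypothetical, and the example runs on CPU only. It assumes the fix amounts to calling .to() with the other operand's device before combining two tensors, and it shows why that call costs nothing when both tensors already share a device.

```python
import torch


def apply_bias(hidden: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: add `bias` to `hidden` wherever `hidden` lives."""
    # With device_map='auto', `bias` could sit on a different GPU than
    # `hidden`. Moving it first avoids a device-mismatch error.
    bias = bias.to(hidden.device)
    return hidden + bias


hidden = torch.ones(2, 3)
bias = torch.full((3,), 0.5)
out = apply_bias(hidden, bias)

# When both tensors are already on the same device (the training case),
# .to() returns the very same tensor object, so no copy is made.
same_object = bias.to(hidden.device) is bias
```

Here same_object is True because Tensor.to returns the tensor itself when it already has the requested device and dtype, which is the reason the extra calls should not affect training speed.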