modular-ml / wrapyfi-examples_llama

Inference code for Facebook LLaMA models with Wrapyfi support
GNU General Public License v3.0

Model Parallel Question #5

Closed: sharlec closed this issue 1 year ago

sharlec commented 1 year ago

Did you change the Model Parallel (MP) value for the 7B model? I think they used tensor parallelism, and the model may need to be modified so that MP matches the number of GPUs.

fabawi commented 1 year ago

MP determines the number of GPUs across which the model is distributed. Since Wrapyfi now adjusts the distribution according to the number of spawns (torchrun instances), you always set MP to 1, regardless of the model size variant chosen. To work with the 13B model variant or larger, you must first reshard the checkpoint (linked and described in the README).
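For context on why resharding is needed: LLaMA checkpoints ship one `consolidated.XX.pth` shard per model-parallel rank, so the 13B and larger variants come with multiple shards, while an MP of 1 expects exactly one. Below is a minimal sketch of that consistency check, not the repo's actual code; the checkpoint path is hypothetical.

```python
# Minimal sketch (assumed layout): LLaMA checkpoint directories contain one
# consolidated.XX.pth shard per model-parallel rank. With Wrapyfi, MP is
# fixed at 1, so the directory must hold exactly one shard after resharding.
from pathlib import Path

MP = 1                          # always 1 when running with Wrapyfi
ckpt_dir = Path("weights/13B")  # hypothetical path to the downloaded weights

shards = sorted(ckpt_dir.glob("consolidated.*.pth"))
if len(shards) != MP:
    raise RuntimeError(
        f"found {len(shards)} checkpoint shard(s) but MP={MP}; "
        "reshard the checkpoint (see the README) before running."
    )
```

A 7B checkpoint already consists of a single shard, which is why it runs without resharding; the 13B checkpoint ships as two shards and would fail a check like this until merged.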