huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Loading bigger models is very slow using `AutoModelForCausalLM.from_pretrained` #34798

Open vibhas-singh opened 1 day ago

vibhas-singh commented 1 day ago

System Info

Who can help?

@ArthurZucker @SunMarc

Information

Tasks

Reproduction

I am spawning a g5.12xlarge GPU instance on AWS SageMaker and loading a locally saved model using this script:

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # expose all four A10G GPUs on the g5.12xlarge

import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id_or_path = "<local_path>"

# Shard the model across the visible GPUs in bf16.
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

This happens with almost all the models I am trying; rhymes-ai/Aria can be used to reproduce it.
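
To make the slowness measurable, here is a minimal diagnostic sketch (not part of the original report; model_id_or_path is the same placeholder as above). It checks whether the checkpoint is stored as pickled *.bin shards or as *.safetensors, since pickled shards typically load much more slowly, and it times the load itself:

import glob
import os
import time

import torch
from transformers import AutoModelForCausalLM

model_id_or_path = "<local_path>"

# Pickled *.bin shards usually load much more slowly than *.safetensors.
print("safetensors shards:", glob.glob(os.path.join(model_id_or_path, "*.safetensors")))
print("pytorch bin shards:", glob.glob(os.path.join(model_id_or_path, "*.bin")))

start = time.perf_counter()
model = AutoModelForCausalLM.from_pretrained(
    model_id_or_path,
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
print(f"from_pretrained took {time.perf_counter() - start:.0f}s")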

Expected behavior

The last line takes forever to load the model (40-50+ minutes). I have observed the same behaviour with multiple other models as well.
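
One way to see where those minutes go is to profile the call. This is only a sketch (the output file name load_profile.out is arbitrary), but it should show whether the time is spent deserializing weights or dispatching them to devices:

import cProfile
import pstats

import torch
from transformers import AutoModelForCausalLM

# Profile the slow call; cProfile.run executes the string in __main__,
# so the imports above must be at module level.
cProfile.run(
    'AutoModelForCausalLM.from_pretrained("<local_path>", device_map="auto", '
    'torch_dtype=torch.bfloat16, trust_remote_code=True)',
    "load_profile.out",
)
pstats.Stats("load_profile.out").sort_stats("cumtime").print_stats(20)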

Things I have tried/observed:

Rocketknight1 commented 17 hours ago

This feels like an accelerate issue, so pinging @sunmarc and @muellerzr once again, but yell if I should ping someone else!
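
In case it helps with triage, here is a sketch (my own, not an official reproduction path) that separates the two phases accelerate is involved in, so a slow load can be attributed to either meta-device instantiation or weight dispatch; load_checkpoint_and_dispatch is accelerate's lower-level loading API:

import torch
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

model_id_or_path = "<local_path>"

# Phase 1: build the model on the meta device (no weights allocated, should be fast).
config = AutoConfig.from_pretrained(model_id_or_path, trust_remote_code=True)
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)

# Phase 2: read the shards and dispatch them across the GPUs. If this
# step alone takes tens of minutes, the bottleneck is on the accelerate side.
model = load_checkpoint_and_dispatch(
    model,
    checkpoint=model_id_or_path,
    device_map="auto",
    dtype=torch.bfloat16,
)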