huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

Model loading on meta device #27183

Closed: RonanKMcGovern closed this issue 11 months ago

RonanKMcGovern commented 11 months ago

System Info

An A6000 GPU on RunPod.


Who can help?

@ArthurZucker @younesbelkada

Reproduction

!pip install -U -q git+https://github.com/huggingface/transformers.git

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q datasets
!pip install -q -U scipy
!pip install -U flash-attn -q
!pip install -q -U trl
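
As a quick sanity check (handy when filing issues like this), the resolved versions of the relevant packages can be printed; a minimal sketch:

# Print the installed versions of the packages used below (illustrative only).
from importlib.metadata import version, PackageNotFoundError

for pkg in ["transformers", "accelerate", "bitsandbytes", "peft", "flash-attn", "trl"]:
    try:
        print(f"{pkg}: {version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")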

from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig, AutoConfig
import torch

model_id  = "tiiuae/falcon-7b"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

config = AutoConfig.from_pretrained(model_id)
config.max_position_embeddings = 4096 # (input + output) tokens can now be up to 4096

cache_dir = "./model_cache"  # placeholder so the snippet runs; any local directory works

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    config=config,
    quantization_config=bnb_config,
    # rope_scaling={"type": "linear", "factor": 2.0},
    device_map='auto',
    # trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    use_flash_attention_2=True, # works with Llama models and reduces memory reqs
    cache_dir=cache_dir)
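
For reference, the placement that accelerate actually chose can be inspected after loading; hf_device_map is only populated when device_map is passed, so this is a quick way to see whether anything was sent to cpu or disk:

# Show the module-to-device assignment accelerate built for device_map='auto'.
# Entries mapped to "cpu" or "disk" correspond to offloaded weights.
print(model.hf_device_map)

# Rough check of how much memory the loaded weights occupy on GPU 0.
print(f"{torch.cuda.memory_allocated() / 1024**3:.2f} GiB allocated on GPU 0")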

Expected behavior

I would expect this model to easily fit on an A6000 with 48GB of VRAM.

Instead, I get this error/notification:

WARNING:root:Some parameters are on the meta device device because they were offloaded to the .
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu/disk.
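
For scale, a rough back-of-the-envelope estimate (assuming ~7B parameters stored as 4-bit NF4 weights, plus a small allowance for quantization constants and layers kept in higher precision) puts the weights at only a few GiB:

# Approximate memory footprint of falcon-7b quantized to 4 bits (rough numbers).
n_params = 7e9
bytes_per_param = 0.5   # 4 bits per weight
overhead = 0.1          # rough allowance for quant constants, norms, embeddings
est_gib = n_params * bytes_per_param * (1 + overhead) / 1024**3
print(f"~{est_gib:.1f} GiB of quantized weights expected")  # roughly 3.5-4 GiB

so there should be no reason for anything to be offloaded to CPU or disk on a 48 GB card.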
younesbelkada commented 11 months ago

Hi @RonanKMcGovern, thanks for your issue. I ran:

import torch
from transformers import AutoModelForCausalLM

model_id  = "tiiuae/falcon-7b"

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
)

for n, p in model.named_parameters():
    if p.device.type == "meta":
        print(f"{n} is on meta!")

and I can confirm that no parameters were on the meta device, while still getting the same warning message you shared. Perhaps it is a bug in accelerate. Can you file an issue there and reuse this small handy snippet?
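
The same check can be extended to buffers, which accelerate dispatches as well, in case those are what the warning is actually referring to; a small variation on the snippet above:

# Buffers (e.g. rotary embedding caches) can also end up on the meta device.
for n, b in model.named_buffers():
    if b.device.type == "meta":
        print(f"buffer {n} is on meta!")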

RonanKMcGovern commented 11 months ago

done, thanks: https://github.com/huggingface/accelerate/issues/2103