huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, with automatic mixed precision (including fp8) and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

ValueError: weight is on the meta device, we need a `value` to put in on 0. #2906

Closed nilsjohanbjorck closed 1 month ago

nilsjohanbjorck commented 3 months ago

System Info

Ubuntu, Python 3.10.12

>>> transformers.__version__
'4.42.3'
>>> accelerate.__version__
'0.31.0'

Reproduction

I am following the example in https://huggingface.co/docs/accelerate/v0.31.0/en/package_reference/big_modeling#accelerate.load_checkpoint_and_dispatch. The code is:

from huggingface_hub import snapshot_download
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoModel

model_name = 'CohereForAI/c4ai-command-r-v01'
weights_location = snapshot_download(model_name, cache_dir="./huggingface_mirror")
save_dir = "./huggingface_mirror/models--CohereForAI--c4ai-command-r-v01/snapshots/16881ccde1c68bbc7041280e6a66637bc46bfe88/"

with init_empty_weights():
    model = AutoModel.from_pretrained(model_name)

# model.tie_weights()
model = load_checkpoint_and_dispatch(
    model, save_dir, device_map="auto"
)

I get this error message:

Some weights of the model checkpoint at ./huggingface_mirror/models--CohereForAI--c4ai-command-r-v01/snapshots/16881ccde1c68bbc7041280e6a66637bc46bfe88/ were not used when initializing CohereModel: {'model.layers.3.input_layernorm.weight', ......., 'model.layers.11.self_attn.o_proj.weight', 'model.layers.32.mlp.up_proj.weight'}. This may or may not be an issue - make sure that the checkpoint does not have unnecessary parameters, or that the model definition correctly corresponds to the checkpoint.
Traceback (most recent call last):
  File "/data/large_model.py", line 12, in <module>
    model = load_checkpoint_and_dispatch(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 619, in load_checkpoint_and_dispatch
    return dispatch_model(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 419, in dispatch_model
    attach_align_device_hook_on_blocks(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 648, in attach_align_device_hook_on_blocks
    attach_align_device_hook_on_blocks(
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 608, in attach_align_device_hook_on_blocks
    add_hook_to_module(module, hook)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 157, in add_hook_to_module
    module = hook.init_hook(module)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 275, in init_hook
    set_module_tensor_to_device(module, name, self.execution_device, tied_params_map=self.tied_params_map)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/modeling.py", line 354, in set_module_tensor_to_device
    raise ValueError(f"{tensor_name} is on the meta device, we need a `value` to put in on {device}.")
ValueError: weight is on the meta device, we need a `value` to put in on 0.

Expected behavior

It should not crash.
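For context on what the traceback means, here is a minimal PyTorch-only sketch of how meta tensors behave (independent of this repro; it assumes torch >= 2.0 for the device context manager). `init_empty_weights()` puts every parameter on the meta device, and dispatch fails when a parameter was never materialized with real values:

```python
import torch

# init_empty_weights() creates every parameter on the "meta" device:
# the tensors carry shape and dtype metadata but no actual storage.
with torch.device("meta"):
    layer = torch.nn.Linear(4, 4)

assert layer.weight.is_meta  # no real data has been allocated

# A meta tensor cannot simply be moved to a real device, because there
# are no values to copy; this is the condition accelerate surfaces as
# "weight is on the meta device, we need a `value` to put in on 0."
try:
    layer.to("cpu")
except (NotImplementedError, RuntimeError):
    pass  # torch refuses: there is no data to copy out of a meta tensor

# The supported escape hatch is to_empty(), which allocates uninitialized
# storage; a checkpoint loader must then fill in the real weights.
layer = layer.to_empty(device="cpu")
assert not layer.weight.is_meta
```

So the error fires whenever `load_checkpoint_and_dispatch` finds no checkpoint value for a parameter that is still on meta.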

SunMarc commented 3 months ago

Hey @nilsjohanbjorck, you can just do the following one-liner to load your model:

model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

It will do the same thing. I think the error in your snippet is that you need to use AutoModelForCausalLM instead of AutoModel.
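Judging from the "weights ... were not used" warning above, the likely mechanism is a key-prefix mismatch. Here is a toy PyTorch-only sketch (the tiny `Linear` stands in for the real model; this is an illustration, not the actual Cohere code): the checkpoint was saved from the causal-LM wrapper, so its keys carry a `model.` prefix, while `AutoModel` instantiates only the bare backbone whose keys have no such prefix:

```python
import torch

# Toy analogue of the mismatch, with stand-in modules:
backbone = torch.nn.Linear(2, 2)   # stands in for the bare CohereModel
wrapper = torch.nn.Module()        # stands in for CohereForCausalLM
wrapper.model = backbone           # registering it adds the "model." prefix

ckpt_keys = set(wrapper.state_dict())   # {"model.weight", "model.bias"}
bare_keys = set(backbone.state_dict())  # {"weight", "bias"}

# No checkpoint key matches a bare-backbone parameter name, so every
# backbone parameter stays on the meta device and dispatch then raises
# the ValueError from the traceback above.
assert ckpt_keys.isdisjoint(bare_keys)
```

Loading with `AutoModelForCausalLM` (or the one-liner above) instantiates the wrapper whose parameter names match the checkpoint keys, so every weight gets materialized.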

github-actions[bot] commented 2 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.