- `transformers` version: 4.35.0.dev0
- Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: 0.25.0.dev0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes, a single A6000
- Using distributed or parallel set-up in script?: No; with only one GPU this shouldn't be relevant, yet part of the model still ends up on the CPU.
Information
[ ] The official example scripts
[X] My own modified scripts
Tasks
[ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
[X] My own task or dataset (give details below)
Reproduction
```python
import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/falcon-7b"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
)

# Check whether any parameters actually landed on the meta device
for n, p in model.named_parameters():
    if p.device.type == "meta":
        print(f"{n} is on meta!")
```
Running this prints the following warnings:

```
WARNING:root:Some parameters are on the meta device device because they were offloaded to the .
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu/disk.
```
even though there are no params on the meta device.
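For context on what the check in the reproduction script is looking for, here is a minimal sketch of the meta device (plain PyTorch, no model download needed): tensors on `meta` carry only shape/dtype metadata and no storage, which is how accelerate represents parameters that were offloaded.

```python
import torch

# A meta tensor has shape and dtype but no data backing it.
meta_t = torch.empty(4, 4, device="meta")
print(meta_t.device.type)   # "meta"
print(meta_t.shape)         # torch.Size([4, 4])

# A regular tensor reports a real device such as "cpu" or "cuda".
cpu_t = torch.zeros(2)
print(cpu_t.device.type)    # "cpu"
```

So if the loop over `named_parameters()` prints nothing, no parameter is actually on the meta device, and the warning appears to be spurious.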
Expected behavior
The model should load entirely onto the single A6000 GPU, without any CPU (or disk) offloading and without the offload warning being emitted.
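As a possible workaround (a sketch, not a confirmed fix: `device_map={"": 0}` is a standard `from_pretrained` argument, but whether it suppresses the spurious warning in this case is untested), the model can be pinned explicitly to GPU 0 and the resulting placement inspected:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/falcon-7b"

# Pin every module to GPU 0 instead of letting accelerate infer a device map;
# this should rule out any cpu/disk offloading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)

# Inspect where accelerate actually placed the modules
# (hf_device_map is set when a device_map is used).
print(getattr(model, "hf_device_map", None))
```

If `hf_device_map` shows everything on device 0, the warning about cpu/disk offload contradicts the actual placement.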