- `transformers` version: 4.35.0.dev0
- Platform: Linux-5.4.0-153-generic-x86_64-with-glibc2.35
- Python version: 3.10.6
- Huggingface_hub version: 0.17.3
- Safetensors version: 0.4.0
- Accelerate version: 0.25.0.dev0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.1+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: Yes, a single A6000
- Using distributed or parallel set-up in script?: No; with only one GPU this shouldn't be relevant, yet part of the model still ends up on the CPU.
Information
[ ] The official example scripts
[X] My own modified scripts
Tasks
[ ] One of the scripts in the examples/ folder of Accelerate or an officially supported no_trainer script in the examples folder of the transformers repo (such as run_no_trainer_glue.py)
[X] My own task or dataset (give details below)
Reproduction
```python
import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/falcon-7b"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
)

# Check whether any parameters actually landed on the meta device
for n, p in model.named_parameters():
    if p.device.type == "meta":
        print(f"{n} is on meta!")
```
Running this prints the following warnings:

```
WARNING:root:Some parameters are on the meta device device because they were offloaded to the .
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu/disk.
```
even though there are no params on the meta device.
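For context on what the check in the reproduction script is looking for, here is a minimal sketch of the meta device (plain PyTorch, no model download needed): tensors on `meta` carry only shape/dtype metadata and no storage, which is how accelerate represents parameters that were offloaded.

```python
import torch

# A meta tensor has shape and dtype but no data backing it.
meta_t = torch.empty(4, 4, device="meta")
print(meta_t.device.type)   # "meta"
print(meta_t.shape)         # torch.Size([4, 4])

# A regular tensor reports a real device such as "cpu" or "cuda".
cpu_t = torch.zeros(2)
print(cpu_t.device.type)    # "cpu"
```

So if the loop over `named_parameters()` prints nothing, no parameter is actually on the meta device, and the warning appears to be spurious.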
Expected behavior
The model should load entirely onto the single A6000 GPU, without any CPU (or disk) offloading and without the offload warning being emitted.
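As a possible workaround (a sketch, not a confirmed fix: `device_map={"": 0}` is a standard `from_pretrained` argument, but whether it suppresses the spurious warning in this case is untested), the model can be pinned explicitly to GPU 0 and the resulting placement inspected:

```python
import torch
from transformers import AutoModelForCausalLM

model_id = "tiiuae/falcon-7b"

# Pin every module to GPU 0 instead of letting accelerate infer a device map;
# this should rule out any cpu/disk offloading.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map={"": 0},
)

# Inspect where accelerate actually placed the modules
# (hf_device_map is set when a device_map is used).
print(getattr(model, "hf_device_map", None))
```

If `hf_device_map` shows everything on device 0, the warning about cpu/disk offload contradicts the actual placement.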