xinghaow99 opened this issue 2 days ago
SunMarc: Hi @xinghaow99, loading a bnb model on CPU should not be possible for now. If you run the following, you will get the error; I will fix the missing check:
```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from accelerate import Accelerator
import torch

model = AutoModelForCausalLM.from_pretrained(
    'models/Llama-2-7b-hf-2bit-64rank-5iter',  # base model obtained by LoftQ, should be equivalent to 'LoftQ/Llama-2-7b-hf-2bit-64rank'
    torch_dtype=torch.bfloat16,
    device_map={"": "cpu"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type='nf4',
    ),
)
```
I recommend loading your model directly on GPU by setting `device_map={"": "cuda"}`.
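That is the same call as above with only the device map changed; a minimal sketch, assuming a single CUDA device is available:

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# Same checkpoint and quantization config as above, but the quantized
# weights are placed directly on the GPU instead of the CPU.
model = AutoModelForCausalLM.from_pretrained(
    'models/Llama-2-7b-hf-2bit-64rank-5iter',
    torch_dtype=torch.bfloat16,
    device_map={"": "cuda"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type='nf4',
    ),
)
```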
xinghaow99: @SunMarc Hi! Thank you for getting back to me. I'm trying to load submodules onto the GPU dynamically (moving them to the GPU only when computing) to save GPU memory, since I'm training the model layer by layer. I guess this is not supported for now...
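For context, the pattern I mean (keep the whole model on CPU, move one layer at a time onto the GPU for its update, then move it back) would look roughly like this in plain PyTorch. This is only a hypothetical sketch of the intent: per the reply above it does not work for bnb-quantized weights, and `compute_layer_loss` is an assumed helper, not a real API:

```python
import torch

def train_layer_by_layer(model, batches, device="cuda"):
    # Hypothetical sketch: everything stays on CPU except the layer
    # currently being trained.
    for layer in model.model.layers:  # decoder layers of a Llama-style model
        layer.to(device)  # move only this layer onto the GPU
        optimizer = torch.optim.AdamW(layer.parameters(), lr=1e-4)
        for batch in batches:
            loss = compute_layer_loss(layer, batch, device)  # assumed helper
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        layer.to("cpu")  # free GPU memory before the next layer
```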
System Info
Information
Tasks
- `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)

Reproduction
Hi! I want to load my model on CPU initially and use DDP for some submodules with `accelerator.prepare()` later. Here is a simple reproduction:
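(The exact script I ran is not shown here; the sketch below reconstructs the setup with the same checkpoint and quantization config as in the reply above. The per-layer `prepare()` step and the dummy input are assumptions, and the decoder-layer call signature differs across transformers versions.)

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from accelerate import Accelerator
import torch

# Load the 4-bit quantized model entirely on CPU.
model = AutoModelForCausalLM.from_pretrained(
    'models/Llama-2-7b-hf-2bit-64rank-5iter',
    torch_dtype=torch.bfloat16,
    device_map={"": "cpu"},
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
        bnb_4bit_use_double_quant=False,
        bnb_4bit_quant_type='nf4',
    ),
)

accelerator = Accelerator()
# Prepare a single submodule (the first decoder layer) for DDP.
layer = accelerator.prepare(model.model.layers[0])

# Forward a dummy hidden-state tensor on the GPU through the prepared layer.
hidden = torch.randn(1, 8, model.config.hidden_size,
                     dtype=torch.bfloat16, device=accelerator.device)
position_ids = torch.arange(8, device=accelerator.device).unsqueeze(0)
out = layer(hidden, position_ids=position_ids)  # device mismatch happens here
```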
Got `RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!`
I see that the tensors are sent back to CPU by some device-alignment hooks added by accelerate. Is this expected? Is there any workaround? Thanks for any help!
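If it helps with debugging: accelerate stores the hook it attaches to each module in a `_hf_hook` attribute, so the alignment hooks can at least be inspected, or removed at your own risk. A sketch, assuming `model` was loaded with a `device_map` as above:

```python
from accelerate.hooks import remove_hook_from_module

# Inspect the hook accelerate attached to a submodule.
layer = model.model.layers[0]
print(getattr(layer, "_hf_hook", None))  # e.g. an AlignDevicesHook

# Removing the hook (recursively) stops the forced device realignment,
# but this is not an officially supported workaround.
remove_hook_from_module(layer, recurse=True)
```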
Expected behavior
The model and tensors should both be on `cuda`.