huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

ValueError: .to is not supported for 4-bit or 8-bit models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct dtype #29303

Closed robinsonmhj closed 5 months ago

robinsonmhj commented 7 months ago

System Info

transformers==4.31.0 accelerate==0.21.0 deepspeed==0.13.2 bitsandbytes==0.42.0

Who can help?

No response

Information

Tasks

Reproduction

import deepspeed
import torch
from datasets import load_dataset
from deepspeed.accelerator import get_accelerator
from torchmetrics.classification import BinaryAccuracy, BinaryF1Score
from transformers import AutoModelForSequenceClassification, AutoTokenizer, set_seed, BitsAndBytesConfig

model_name = 'llm-models/Llama-2-7b-chat-hf'
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    return_dict=True, 
    quantization_config=bnb_config
)

deepspeed_config = {
    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
        },
    },
    "scheduler": {"type": "WarmupLR", "params": {"warmup_num_steps": 100}},
    "fp16": {"enabled": False},
    "bf16": {"enabled": True},  # Turn this on if using AMPERE GPUs.
    "zero_optimization": {
        "stage": 3,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": True,
        },
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": 1,
    "gradient_clipping": True,
    "steps_per_print": 10,
    "train_micro_batch_size_per_gpu": 16,
    "wall_clock_breakdown": False,
    "overlap_comm": True,
    "contiguous_gradients": True,
}
model, optimizer, _, lr_scheduler = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=deepspeed_config,
)

Expected behavior

Expected no error or exception; instead, the following error is raised:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[3], line 39
      1 deepspeed_config = {
      2     "optimizer": {
      3           "type": "AdamW",
   (...)
     37     "contiguous_gradients": True,
     38 }
---> 39 model, optimizer, _, lr_scheduler = deepspeed.initialize(
     40 model=model,
     41 model_parameters=model.parameters(),
     42 config=deepspeed_config,
     43 )

File /opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/__init__.py:177, in initialize(args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_params)
    165         engine = DeepSpeedHybridEngine(args=args,
    166                                        model=model,
    167                                        optimizer=optimizer,
   (...)
    174                                        config=config,
    175                                        config_class=config_class)
    176     else:
--> 177         engine = DeepSpeedEngine(args=args,
    178                                  model=model,
    179                                  optimizer=optimizer,
    180                                  model_parameters=model_parameters,
    181                                  training_data=training_data,
    182                                  lr_scheduler=lr_scheduler,
    183                                  mpu=mpu,
    184                                  dist_init_required=dist_init_required,
    185                                  collate_fn=collate_fn,
    186                                  config=config,
    187                                  config_class=config_class)
    188 else:
    189     assert mpu is None, "mpu must be None with pipeline parallelism"

File /opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py:262, in DeepSpeedEngine.__init__(self, args, model, optimizer, model_parameters, training_data, lr_scheduler, mpu, dist_init_required, collate_fn, config, config_class, dont_change_device)
    259 self.pipeline_parallelism = isinstance(model, PipelineModule)
    261 # Configure distributed model
--> 262 self._configure_distributed_model(model)
    264 # needed for zero_to_fp32 weights reconstruction to remap nameless data to state_dict
    265 self.param_names = {param: name for name, param in model.named_parameters()}

File /opt/conda/envs/domino-ray/lib/python3.10/site-packages/deepspeed/runtime/engine.py:1113, in DeepSpeedEngine._configure_distributed_model(self, model)
   1111 # zero.Init() handles device placement of model
   1112 if not (self.dont_change_device or is_zero_init_model):
-> 1113     self.module.to(self.device)
   1115 # MoE related initialization
   1116 for _, module in self.module.named_modules():

File /opt/conda/envs/domino-ray/lib/python3.10/site-packages/transformers/modeling_utils.py:1895, in PreTrainedModel.to(self, *args, **kwargs)
   1892 def to(self, *args, **kwargs):
   1893     # Checks if the model has been loaded in 8-bit
   1894     if getattr(self, "is_quantized", False):
-> 1895         raise ValueError(
   1896             "`.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the"
   1897             " model has already been set to the correct devices and casted to the correct `dtype`."
   1898         )
   1899     else:
   1900         return super().to(*args, **kwargs)

ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
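
The trace pinpoints the root cause: DeepSpeedEngine._configure_distributed_model calls self.module.to(self.device) (engine.py:1113 above), and transformers' PreTrainedModel.to rejects that call for any bnb-quantized model. The guard can be triggered without deepspeed at all; a minimal sketch (the small checkpoint name is an arbitrary stand-in, any bnb-quantized model behaves the same):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Any bnb-quantized checkpoint reproduces the guard; a small one keeps it cheap.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# The same call DeepSpeedEngine makes internally; raises
# ValueError: `.to` is not supported for `4-bit` or `8-bit` models.
model.to("cuda:0")
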
ArthurZucker commented 6 months ago

cc @SunMarc

SunMarc commented 6 months ago

Hi @robinsonmhj, bnb alone is not compatible with deepspeed. In fact, you can't train a quantized model in general. However, it works with peft + bnb + deepspeed (stage 1 and 2). For more detail, please check https://github.com/huggingface/peft/pull/1529.
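
The working combination trains small adapter weights on top of the frozen quantized base. A rough sketch of the model setup (not taken from the linked PR; the LoRA hyperparameters and target_modules are illustrative):

import torch
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "llm-models/Llama-2-7b-chat-hf",  # same checkpoint as the repro above
    quantization_config=bnb_config,
)
base = prepare_model_for_kbit_training(base)  # freeze 4-bit weights, prep for training

# Only the LoRA adapters receive gradients; the quantized base stays frozen,
# which is why peft + bnb trains where full finetuning of a quantized model cannot.
peft_model = get_peft_model(
    base,
    LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
               target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)
peft_model.print_trainable_parameters()
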

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

cdoern commented 4 months ago

> Hi @robinsonmhj, bnb alone is not compatible with deepspeed. In fact, you can't train a quantized model in general. However, it works with peft + bnb + deepspeed (stage 1 and 2). For more detail, please check https://github.com/huggingface/peft/pull/1529.

@SunMarc sorry to bring up a dead issue here. By this comment (and the attached PR) is it not possible to load a pretrained model using 4bq from bnb, then kick off training using deepspeed?

I keep getting ValueError: `.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.

I am trying to

  1. load a pretrained model using bnb, since it cannot fit on my GPU otherwise
  2. model.to(dev), where dev is a single GPU (see the sketch below)
  3. set up training args with deepspeed
  4. deepspeed.initialize

The deepspeed initialization fails with this model. If there is a way to make this work, that would be super helpful.
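
Step 2 is exactly the call the guard forbids: with bnb, device placement happens at load time rather than through .to(). A sketch of step 1 with the placement folded in (model_name and bnb_config as defined in the repro at the top; device_map={"": 0} is an assumed single-GPU placement):

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    model_name,                      # as in the original repro
    quantization_config=bnb_config,  # as in the original repro
    device_map={"": 0},              # put the whole model on GPU 0 at load time
)
# No model.to(dev) afterwards: the quantized model is already on-device.

Note this only removes the explicit .to from step 2; deepspeed.initialize still calls .to internally, so the answer below still applies to steps 3 and 4.
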
SunMarc commented 4 months ago

Hi @cdoern, it is not possible to perform full fine-tuning when you are loading a model with bnb. However, if you perform the fine-tuning using peft, this should be compatible. Check out this doc on how peft + bnb + deepspeed works!
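
A sketch of how the pieces fit together, letting Trainer drive deepspeed instead of calling deepspeed.initialize on the raw model (peft_model is a PEFT-wrapped quantized model as sketched earlier, train_dataset is a stand-in for your tokenized dataset, and the config values are illustrative):

from transformers import Trainer, TrainingArguments

# ZeRO stage 2; stage 1 also works with peft + bnb, the repro's stage 3 does not.
zero2_config = {
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=16,
    bf16=True,
    deepspeed=zero2_config,  # Trainer builds and owns the deepspeed engine
)
trainer = Trainer(model=peft_model, args=args, train_dataset=train_dataset)
trainer.train()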