microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
https://www.deepspeed.ai/
Apache License 2.0

Dont_change_device for parameters in initialization #2949

Open larry-fuy opened 1 year ago

larry-fuy commented 1 year ago

When running model training with ZeRO offload, I initialize the model weights on CPU memory as well, to save GPU memory, by setting up deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False). The weights are indeed initialized on CPU memory, but after deepspeed.initialize() the model is still moved to GPU memory. So I am wondering:

1. In ZeRO offload (stage 3, offload to CPU/NVMe), is it possible for the model weights to stay mainly in CPU memory/NVMe and be loaded into GPU memory only layer by layer?
2. I found that engine.py (which is called by deepspeed.initialize()) has an argument dont_change_device ([link](https://github.com/microsoft/DeepSpeed/blob/4ae3a3da0dfd19d7ab7a76e7c742ac12f44fc1c0/deepspeed/runtime/engine.py#L1138-L1139)) that controls whether or not the model weights are moved to GPU memory, but I found no place where dont_change_device is set. How is dont_change_device used, and can it be used to keep the model weights in CPU memory?
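
For context, a minimal sketch of the setup described above (the toy model and the config path are placeholders, not from the original report):

```python
import torch
import torch.nn as nn
import deepspeed

# enabled=False turns zero.Init off entirely, so parameters are created
# as usual (on CPU) and deepspeed.initialize() later moves the module
# to the local GPU.
with deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False):
    model = nn.Linear(1024, 1024)  # placeholder model

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",  # hypothetical config path
)
```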

tjruwase commented 1 year ago

@larry-fuy, to enable ZeRO stage 3 offloading to CPU/NVMe, enabled must be True in deepspeed.zero.Init(). Please see this tutorial for using this feature (a.k.a. zero-infinity). Here are answers to your specific questions:

  1. Streaming layer weights into the GPU from CPU/NVMe on demand, as you have described, is one of the features of zero-infinity. You can configure "offload_param" in the ds_config to control this behavior (see the sketch after this list).
  2. You should not need to manipulate dont_change_device. We can revisit this if the above suggestion doesn't work for you.
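
A minimal sketch of that configuration, assuming a toy model; the exact values are illustrative, and the zero-infinity tutorial covers the full set of options:

```python
import torch
import torch.nn as nn
import deepspeed

# Illustrative ZeRO stage 3 config with parameter offload to CPU; use
# "device": "nvme" plus an "nvme_path" entry to offload to NVMe instead.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
}

# enabled=True (the default) is required for stage 3 offloading;
# parameters are partitioned as they are created under this context.
with deepspeed.zero.Init(remote_device="cpu",
                         dtype=torch.half,
                         config_dict_or_path=ds_config,
                         enabled=True):
    model = nn.Linear(1024, 1024)  # placeholder model

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

With offload_param configured this way, the fp16 parameters live in CPU memory and are gathered into the GPU only for the layers currently being computed.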

Thanks!

cerisara commented 1 year ago

I'd like to set the "dont_change_device" option because DeepSpeed calls module.to(), which raises an error in transformers with my 4-bit model:

ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the
model as it is, since the model has already been set to the correct devices and 
casted to the correct `dtype`.

tripathiarpan20 commented 2 months ago

> I'd like to set the "dont_change_device" option because DeepSpeed calls module.to(), which raises an error in transformers with my 4-bit model:
>
> ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the
> model as it is, since the model has already been set to the correct devices and
> casted to the correct `dtype`.

Same issue for a BitsAndBytes Llama 3.1 70B Instruct 4-bit model.