larry-fuy opened this issue 1 year ago:

When I was running model training with ZeRO-Offload, I also initialized the model weights in CPU memory to save GPU memory, by setting up `deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False)`. The model weights really are initialized in CPU memory, but after `deepspeed.initialize()` the model still moves to GPU memory. So I am wondering: 1) with ZeRO-Offload (stage 3, offload to cpu/nvme), is it possible for the model weights to stay mainly in CPU memory/NVMe and be loaded to GPU memory only layer by layer? 2) I found that in `engine.py` (which is actually called by `deepspeed.initialize()`) there is an argument `dont_change_device` ([link](https://github.com/microsoft/DeepSpeed/blob/4ae3a3da0dfd19d7ab7a76e7c742ac12f44fc1c0/deepspeed/runtime/engine.py#L1138-L1139)) that controls whether or not the model weights are moved to GPU memory, but I found no place that sets `dont_change_device`. So my question is: how do I use `dont_change_device`, and is it meant to keep the model weights in CPU memory?
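A condensed sketch of the setup described above (the model class and config path are placeholders, not from the thread):

```python
import torch
import deepspeed

# As described above: weights are created on CPU, but enabled=False makes
# zero.Init a no-op context, so ZeRO-3 never partitions the parameters.
with deepspeed.zero.Init(remote_device="cpu", dtype=torch.half, enabled=False):
    model = MyModel()  # placeholder for the actual model class

# With the context disabled, the engine moves the unpartitioned weights to GPU here.
engine, optimizer, _, _ = deepspeed.initialize(model=model, config="ds_config.json")
```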
@larry-fuy, to enable ZeRO stage 3 offloading to cpu/nvme, `enabled` must be `True` in `deepspeed.zero.Init()`. Please see this tutorial for using this feature (a.k.a. ZeRO-Infinity). Here are answers to your specific questions:

1. Yes; use `"offload_param"` in the ds_config to control this behavior.
2. `dont_change_device` is an internal engine argument and is not meant to be set directly. We can revisit this if the above suggestions don't work for you. Thanks!
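A minimal sketch of the suggested setup; the batch size, pinning, and precision settings here are illustrative assumptions, not values from the thread:

```python
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,  # placeholder value
    "zero_optimization": {
        "stage": 3,
        # Keep parameters resident on CPU (or "nvme" with an nvme_path) and
        # fetch them to GPU on demand, layer by layer, during forward/backward.
        "offload_param": {"device": "cpu", "pin_memory": True},
    },
    "fp16": {"enabled": True},
}

# enabled=True is required for zero.Init to partition/offload parameters
# at construction time, per the answer above.
with deepspeed.zero.Init(remote_device="cpu",
                         config_dict_or_path=ds_config,
                         dtype=torch.half,
                         enabled=True):
    model = MyModel()  # placeholder for the actual model class

engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```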
I'd like to set this option `dont_change_device`, because DeepSpeed calls `module.to()`, which raises an error in transformers with my 4-bit model:

```
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the
model as it is, since the model has already been set to the correct devices and
casted to the correct `dtype`.
```
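For context, the same guard can be hit without DeepSpeed; a minimal repro sketch (the checkpoint name is only an example):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-70B-Instruct",  # example checkpoint
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

# transformers blocks .to() on bitsandbytes-quantized models; DeepSpeed's
# engine calls module.to(device) internally, which trips the same check.
model.to("cuda")  # raises the ValueError quoted above
```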
Same issue with the BitsAndBytes Llama 3.1 70B Instruct 4-bit model.