huggingface / accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
https://huggingface.co/docs/accelerate
Apache License 2.0

Big Models, move model to CPU after dispatching to multiple devices #2840

Closed · balaabhijit closed this 3 weeks ago

balaabhijit commented 3 weeks ago

System Info

- `Accelerate` version: 0.27.2
- Platform: Linux-5.15.0-76-generic-x86_64-with-glibc2.35
- Python version: 3.10.13
- Numpy version: 1.26.3
- PyTorch version (GPU?): 2.2.0+cu118 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.74 GB
- GPU type: NVIDIA A100 80GB PCIe
- `Accelerate` default config:
    Not found

Reproduction

PR #1790 seems to add a warning when trying to move a model that has been dispatched across multiple devices. Is there a way to move the model back to CPU once it has been dispatched? I even tried saving the model, deleting it, and reloading it, but deleting the model in Python with `del model` (even with GC and a CUDA cache clear) doesn't seem to free GPU memory. Is there any way to achieve this?
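A minimal sketch of the attempt, for reference (the checkpoint name is illustrative, not the one from the original pipeline):

```python
import gc

import torch
from transformers import AutoModelForCausalLM

# Loading with a device map makes accelerate dispatch the model across GPUs.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", device_map="auto"
)

# model.to("cpu")  # only emits the warning from #1790 for dispatched models

# Deleting the model and clearing caches still leaves GPU memory allocated
# in the scenario described above.
del model
gc.collect()
torch.cuda.empty_cache()
print(torch.cuda.memory_allocated())  # expected 0, but stays non-zero here
```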

Expected behavior

Move model to CPU after dispatching to multiple GPUs

SunMarc commented 3 weeks ago

Hi @balaabhijit, thanks for reporting! What would be the use case for this? Are you modifying the model? To answer your question, you can remove the hooks by calling `remove_hook_from_module(model, recurse=True)`; after that you should be able to move the model to the device of your choice. As for the warning, I will create a PR so that it is removed when `remove_hook_from_module` is called. LMK if this works!
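For reference, a minimal sketch of the suggested sequence, assuming `model` was dispatched across GPUs (e.g. loaded with `device_map="auto"`):

```python
from accelerate.hooks import remove_hook_from_module

# Strip accelerate's AlignDevicesHook (and any other hooks) from every
# submodule so that .to() behaves like on a regular nn.Module again.
remove_hook_from_module(model, recurse=True)

# The model can now be moved back to a single device.
model = model.to("cpu")
```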

balaabhijit commented 3 weeks ago

@SunMarc Thanks for the prompt reply. This works!

The use case is quite specific: we are quantizing the model as part of a pipeline. The previous step requires the model to be distributed, and the quantization step (AWQ) runs layer by layer to save GPU memory, so I had to move the model back and forth between devices, roughly as in the sketch below.
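Here `run_distributed_step` and `quantize_awq_layer_by_layer` are hypothetical stand-ins for the pipeline steps, and `device_map` is assumed to have been computed earlier:

```python
from accelerate import dispatch_model
from accelerate.hooks import remove_hook_from_module

# Step 1: shard the model across the available GPUs for the distributed step.
model = dispatch_model(model, device_map=device_map)
run_distributed_step(model)  # hypothetical pipeline step

# Step 2: strip the dispatch hooks and gather the model back on CPU so AWQ
# can quantize it layer by layer, moving one layer at a time onto the GPU.
remove_hook_from_module(model, recurse=True)
model = model.to("cpu")
quantize_awq_layer_by_layer(model)  # hypothetical AWQ step
```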

SunMarc commented 3 weeks ago

Sounds good! I'm closing this issue since it's solved!