huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
24.15k stars 4.98k forks

add device map and accelerate to DiffusionPipeline abstraction to reduce memory footprint when loading model #725

Closed piEsposito closed 1 year ago

piEsposito commented 1 year ago

Is your feature request related to a problem? Please describe.
As a follow-up to #281, we could add the device map and the option to load weights with accelerate to the DiffusionPipeline abstraction for a smaller memory footprint when loading models.

Describe the solution you'd like


from diffusers import DiffusionPipeline

pipeline = DiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-3", device_map="auto")

Describe alternatives you've considered

Additional context
This is a follow-up to #281.

I can work on that if you folks would let me.
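For context, the loading path this request leans on can be sketched roughly like this. This is a simplification built from accelerate's public helpers, not the actual diffusers implementation; `model_cls` and `checkpoint_dir` are illustrative parameters:

```python
def sketch_device_map_loading(model_cls, checkpoint_dir):
    """Rough sketch of the accelerate-backed loading path this request asks
    for (illustrative only, not the actual diffusers implementation)."""
    # Local imports keep the sketch importable without accelerate installed.
    from accelerate import (
        infer_auto_device_map,
        init_empty_weights,
        load_checkpoint_and_dispatch,
    )

    # 1. Build the model on the "meta" device: no real memory is allocated.
    with init_empty_weights():
        model = model_cls()

    # 2. Decide which device each submodule should live on, given free memory.
    device_map = infer_auto_device_map(model)

    # 3. Materialize the weights from disk directly onto the mapped devices.
    return load_checkpoint_and_dispatch(model, checkpoint_dir, device_map=device_map)
```

The point of the meta-device step is that the model's full parameter set never has to fit in CPU RAM as a second copy before being dispatched.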

piEsposito commented 1 year ago

@patrickvonplaten I've created the follow-up for #281 and can work on it if you let me.

piEsposito commented 1 year ago

@patrickvonplaten, the PR is open and ready for review. Thanks!

CrazyBoyM commented 1 year ago

Hi friend, I saw PR https://github.com/huggingface/diffusers/pull/361, but when I try this:

self.pipe = StableDiffusionPipeline.from_pretrained(
            self.model_id_or_path,
            revision="fp32", 
            device_map="auto",
            torch_dtype=torch.float32,
            scheduler = DDIMScheduler(
                beta_start=0.00085,
                beta_end=0.012,
                beta_schedule="scaled_linear",
                clip_sample=False,
                set_alpha_to_one=False,
            ),
            # use_auth_token=True,
        )
        self.pipe = self.pipe.to(self.device)

I get an error:

   set_alpha_to_one=False,
  File "/root/.conda/envs/ai/lib/python3.7/site-packages/diffusers/pipeline_utils.py", line 517, in from_pretrained
    loaded_sub_model = load_method(os.path.join(cached_folder, name), **loading_kwargs)
  File "/root/.conda/envs/ai/lib/python3.7/site-packages/transformers/modeling_utils.py", line 2269, in from_pretrained
    max_memory=max_memory,
  File "/root/.conda/envs/ai/lib/python3.7/site-packages/accelerate/utils/modeling.py", line 480, in infer_auto_device_map
    max_layer_size, max_layer_names = get_max_layer_size(modules_to_treat, module_sizes, no_split_module_classes)
  File "/root/.conda/envs/ai/lib/python3.7/site-packages/accelerate/utils/modeling.py", line 261, in get_max_layer_size
    modules_children = list(module.named_children())
AttributeError: 'Parameter' object has no attribute 'named_children'

can you help me?

piEsposito commented 1 year ago

@CrazyBoyM I saw that the safety checker is the offender here. I'm not employed by nor affiliated with HF, but since I'm the one who submitted the idea and the PR, I will try to fix it later today and have, at the very least, a branch where you can use it before it is merged.

How does that sound?
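In the meantime, one possible workaround, assuming the traceback really comes from the safety checker's bare Parameters, is to skip loading that component altogether. This is an untested sketch; `safety_checker=None` is a standard `from_pretrained` argument for dropping it, but whether the rest of the pipeline then maps cleanly with `device_map="auto"` is not verified here:

```python
def load_pipeline_without_safety_checker(model_id):
    """Possible workaround (untested sketch): drop the safety checker, the
    sub-model whose bare Parameters tripped infer_auto_device_map above."""
    # Local imports so this sketch can be defined without the libraries installed.
    import torch
    from diffusers import StableDiffusionPipeline

    return StableDiffusionPipeline.from_pretrained(
        model_id,
        device_map="auto",
        torch_dtype=torch.float16,
        safety_checker=None,  # skip loading the offending component
    )
```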

CrazyBoyM commented 1 year ago

> @CrazyBoyM I saw that the safety checker is the offender here. I'm not employed by nor affiliated with HF, but since I'm the one who submitted the idea and the PR, I will try to fix it later today and have, at the very least, a branch where you can use it before it is merged.
>
> How does that sound?

Sounds great!

piEsposito commented 1 year ago

Closed as per #772 and https://github.com/huggingface/accelerate/pull/747.

piEsposito commented 1 year ago

@CrazyBoyM, what happens is that, for this feature to work, we need a version of accelerate with https://github.com/huggingface/accelerate/pull/747 merged. That PR was merged 4 days ago, but the last release was 7 days ago.

Until the next accelerate release, if you really want to use this feature, I suggest installing accelerate from the master branch of the repository: pip install git+https://github.com/huggingface/accelerate.git. You can revert to the PyPI version after the next release.

CrazyBoyM commented 1 year ago

> @CrazyBoyM, what happens is that, for this feature to work, we need a version of accelerate with huggingface/accelerate#747 merged. That PR was merged 4 days ago, but the last release was 7 days ago.
>
> Until the next accelerate release, if you really want to use this feature, I suggest installing accelerate from the master branch of the repository: pip install git+https://github.com/huggingface/accelerate.git. You can revert to the PyPI version after the next release.

It works, thanks a lot! But it seems no additional VRAM is saved; maybe that's because I already use some other tricks, like these:

      self.pipe = StableDiffusionPipeline.from_pretrained(
            self.model_id_or_path,
            device_map="auto",
            # load_in_8bit=True,
            low_cpu_mem_usage=True,
            torch_dtype=torch.float16,
            scheduler = DDIMScheduler(
                beta_start=0.00085,
                beta_end=0.012,
                beta_schedule="scaled_linear",
                clip_sample=False,
                set_alpha_to_one=True,
            ),
            # use_auth_token=True,
        )

        print(self.pipe.unet.conv_out.state_dict()["weight"].stride())  # (2880, 9, 3, 1)
        self.pipe.unet.to(memory_format=torch.channels_last)  # in-place operation
        print(
            self.pipe.unet.conv_out.state_dict()["weight"].stride()
        )  # (2880, 1, 960, 320) having a stride of 1 for the 2nd dimension proves that it works
        self.pipe = self.pipe.to(self.device)
        self.pipe.enable_attention_slicing()

On a T4 it uses about 4.2 GB of VRAM. I will test it on my 1660 Ti (6 GB).
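For attributing the savings to a specific trick, a small sketch for measuring peak VRAM around one generation. The `report_peak_vram` helper and the GiB conversion are illustrative, not part of diffusers:

```python
def bytes_to_gib(n):
    """Convert a raw byte count (what torch.cuda reports) to GiB."""
    return n / 1024**3

def report_peak_vram(pipe, prompt):
    """Run one generation and return the peak VRAM it allocated, in GiB."""
    import torch  # local import so bytes_to_gib stays dependency-free
    torch.cuda.reset_peak_memory_stats()
    pipe(prompt)
    return bytes_to_gib(torch.cuda.max_memory_allocated())
```

Calling it once after enabling each trick (fp16, channels_last, attention slicing) shows which one actually moves the number.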

hnnam0906 commented 2 months ago

Hi, is there any update on this issue? I'd like to use device_map="auto" for the text-to-video model "damo-vilab/text-to-video-ms-1.7b", but it doesn't seem to work.

DiffusionPipeline.from_pretrained("damo-vilab/text-to-video-ms-1.7b", torch_dtype=torch.float16, variant="fp16") https://huggingface.co/docs/diffusers/api/pipelines/text_to_video
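Not an answer from the thread, but while device_map support for this pipeline is unclear, the diffusers docs for this model recommend CPU offload, which targets the same memory concern. A sketch, assuming a CUDA device is available:

```python
def load_text_to_video_with_offload():
    """Sketch of the memory-saving path the diffusers docs recommend for
    this model: model CPU offload instead of device_map="auto"."""
    # Local imports so this sketch can be defined without the libraries installed.
    import torch
    from diffusers import DiffusionPipeline

    pipe = DiffusionPipeline.from_pretrained(
        "damo-vilab/text-to-video-ms-1.7b",
        torch_dtype=torch.float16,
        variant="fp16",
    )
    # Keep each sub-model on CPU; move it to the GPU only while it runs.
    pipe.enable_model_cpu_offload()
    return pipe
```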