huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0
26.1k stars 5.38k forks

[Design] better design to switch pipelines for already loaded pipeline #6531

Closed sayakpaul closed 6 months ago

sayakpaul commented 10 months ago

having separate pipelines in diffusers for features is somewhat cumbersome, especially since pipeline inheritance is less than ideal (e.g. why doesn't StableDiffusionImg2ImgPipeline inherit from StableDiffusionPipeline, so that I could check the current model type easily?), AutoPipeline does not have full coverage, and .from_pipe has even less.

IMO, we need a cleaner way to switch pipelines for an already loaded pipeline - right now I'm instantiating the new one manually from the loaded pipeline's components, but that causes issues with model offloading and things like that.

Especially using community pipelines - I cannot load from scratch just to run one generate. I want to switch to it when I want to use specific feature and then switch back.

Originally posted by @vladmandic in https://github.com/huggingface/diffusers/issues/6318#issuecomment-1885425063

spezialspezial commented 10 months ago

Would be nice indeed. I would assume in the end most people are looking for or building something like a fully featured MegaPipeline with LoRAs, Prompt-Emphasis, Textual-Inversion, ControlNet, Blackjack and IP-Adapter.

vladmandic commented 10 months ago

yup. many pipelines could be methods instead - for example, how come enable_freeu is a method that can be called on a pipeline and not a pipeline of its own? i'd say primary candidates that don't need to be pipelines are the likes of StableDiffusionSAGPipeline - it's a normal pipeline that exposes one more tunable item.

patrickvonplaten commented 9 months ago

What is the problem with just manually switching pipelines as follows:

from diffusers import StableDiffusionPipeline, StableDiffusionImg2ImgPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe_img2img = StableDiffusionImg2ImgPipeline(**pipe.components)

?

I'm especially curious about why this would cause problems with cpu offload

vladmandic commented 9 months ago

a) different pipelines may have the same core components, but different per-pipeline components - and param validation in the pipeline constructor throws an error - so each pipeline has to be manually constructed.

b) just an example: load pipeline, enable model offload, set ipadapter. now switch pipeline to img2img and back to txt2img - chances are you end up with a tensor location mismatch in an unrelated part of the code - most commonly in text_encoder (why text_encoder? because i'm building embeds from prompt, not passing prompt as-is to pipeline)

 lib/python3.11/site-packages/torch/nn/functional.py:2264 in embedding
│ ❱ 2264 │   return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)

sayakpaul commented 9 months ago

Cc: @yiyixuxu

vladmandic commented 9 months ago

fyi, this is my attempt at a working switch_pipe method: https://github.com/vladmandic/automatic/blob/f04fa75eaee262011817bab4a6e4ffb052c7e22c/modules/sd_models.py#L976

core is this part:

        signature = inspect.signature(cls.__init__, follow_wrapped=True, eval_str=True)
        possible = signature.parameters.keys()
        if isinstance(pipeline, cls):
            return pipeline
        pipe_dict = {}
        components_used = []
        components_skipped = []
        switch_mode = 'none'
        if hasattr(pipeline, '_internal_dict'):
            for item in pipeline._internal_dict.keys(): # pylint: disable=protected-access
                if item in possible:
                    pipe_dict[item] = getattr(pipeline, item, None)
                    components_used.append(item)
                else:
                    components_skipped.append(item)
            new_pipe = cls(**pipe_dict)
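
The same idea can be tried without diffusers installed. The following is only a minimal sketch: `TextToImagePipe`, `ImageToImagePipe` and this `switch_pipe` are hypothetical stand-ins written for illustration, not the linked code and not diffusers classes. It assumes each pipeline exposes a `components` dict and that the target constructor rejects arguments it does not declare, which mirrors point (a) above.

```python
import inspect


class TextToImagePipe:
    """Hypothetical stand-in for a text-to-image pipeline (not a diffusers class)."""

    def __init__(self, unet, vae, text_encoder, safety_checker=None):
        self.components = {
            "unet": unet,
            "vae": vae,
            "text_encoder": text_encoder,
            "safety_checker": safety_checker,
        }


class ImageToImagePipe:
    """Hypothetical stand-in whose constructor accepts fewer components."""

    def __init__(self, unet, vae, text_encoder):
        self.components = {"unet": unet, "vae": vae, "text_encoder": text_encoder}


def switch_pipe(cls, pipeline):
    """Build `cls` from the components of `pipeline` that its constructor accepts."""
    if isinstance(pipeline, cls):
        return pipeline, [], []
    accepted = set(inspect.signature(cls.__init__).parameters) - {"self"}
    used = {k: v for k, v in pipeline.components.items() if k in accepted}
    skipped = [k for k in pipeline.components if k not in accepted]
    return cls(**used), sorted(used), skipped


txt2img = TextToImagePipe(unet="unet", vae="vae", text_encoder="clip", safety_checker="checker")

# Passing every component straight through fails because of the extra argument.
try:
    ImageToImagePipe(**txt2img.components)
    direct_ok = True
except TypeError:
    direct_ok = False

img2img, used, skipped = switch_pipe(ImageToImagePipe, txt2img)
print(direct_ok, used, skipped)
```

Returning the used and skipped names makes it easy to log which components were dropped during a switch. The sketch does not address the offloading problem from point (b).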

menahem121 commented 9 months ago

yes, I agree with that. for example, on my dedicated server I want to always keep the pipeline loaded, but if I'm working with different models that is currently not possible without loading several pipelines in parallel, which consumes a lot of resources.

yiyixuxu commented 9 months ago

@vladmandic

I found a bug in from_pipe - once that's fixed (in this PR https://github.com/huggingface/diffusers/pull/6820) I'm able to reproduce the use case you described using from_pipe with the script below. Did I miss anything? Would you be able to provide a script that you would like to work but that currently fails? I want to understand your use case fully so we can start improving from there :)

from diffusers import AutoPipelineForText2Image, AutoPipelineForImage2Image
import torch
from diffusers.utils import load_image

pipeline = AutoPipelineForText2Image.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16).to("cuda")
pipeline.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipeline.set_ip_adapter_scale(0.6)

pipeline.enable_model_cpu_offload()

image = load_image("https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/load_neg_embed.png")
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=generator,
).images

images[0].save("out_1.png")

pipeline2 = AutoPipelineForImage2Image.from_pipe(pipeline)

pipeline = AutoPipelineForText2Image.from_pipe(pipeline2)
generator = torch.Generator(device="cpu").manual_seed(33)
images = pipeline(
    prompt='best quality, high quality, wearing sunglasses',
    ip_adapter_image=image,
    negative_prompt="monochrome, lowres, bad anatomy, worst quality, low quality", 
    num_inference_steps=50,
    num_images_per_prompt=1,
    generator=generator,
).images

images[0].save("out_2.png")

[Images: out1 and out2 (yiyi_test_1_out_1, yiyi_test_1_out_2)]

vladmandic commented 9 months ago

use case is that i load sd model and want to use different functionality conditionally. examples:

and same applies to 10+ other functions that require 10+ different pipelines - and some of them are built-in pipelines and some are community pipelines.

I never want to load a model using a specific pipeline, in 90%+ cases, i load model using StableDiffusionPipeline.from_single_file and want to reuse it only adding components as needed.

yiyixuxu commented 9 months ago

Thanks, @vladmandic

I think a lot of these use cases can be handled with the from_pipe method of AutoPipeline, no? You can switch any SD pipeline to its ControlNet pipeline, with or without IP-Adapter loaded, and vice versa. However, we do not support switching between different pipelines; for example, we cannot switch between StableDiffusionSAGPipeline and StableDiffusionPipeline like you mentioned here.

We can probably support this by adding a from_pipe method to DiffusionPipeline. However, I would like to know more examples of different pipelines that you switch between - I imagine most of them are not compatible, no? e.g. you cannot create a Kandinsky pipeline from an SD pipeline, etc.

vladmandic commented 9 months ago

use case is to reuse existing model components whenever possible. yes, that can only ever work if the target pipeline is explicitly compatible with those components (so a scenario such as SD15->Kandinsky should never work). but there are many pipelines that are based on SD15 or SDXL. you've mentioned that currently we cannot switch from StableDiffusionPipeline to StableDiffusionSAGPipeline, and that is just one example. how about SD to PIAPipeline? AnimateDiffPipeline, etc.

yiyixuxu commented 9 months ago

@vladmandic ahh, thanks! I think this makes a lot of sense to me:) I like the idea of extending the from_pipe functionality to any compatible pipeline (i.e., different pipelines that share the same checkpoint and model components). I agree that we should make it super easy to switch and advocate creating pipelines this way.
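
One rough way to picture "compatible" here is to compare the constructor arguments a target pipeline requires with the components that are already loaded. The snippet below is only a sketch under that assumption: `missing_components` and the four `*LikePipe` classes are hypothetical names made up for illustration and are not part of diffusers. Matching argument names also says nothing about whether the underlying weights and architectures actually agree.

```python
import inspect


def missing_components(target_cls, available):
    """Return required constructor arguments of `target_cls` not found in `available`."""
    params = inspect.signature(target_cls.__init__).parameters
    named_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, param in params.items()
        if name != "self"
        and param.kind in named_kinds
        and param.default is inspect.Parameter.empty
        and name not in available
    ]


# Hypothetical stand-ins; only the constructor signatures matter here.
class SDLikePipe:
    def __init__(self, unet, vae, text_encoder):
        pass


class SAGLikePipe:
    def __init__(self, unet, vae, text_encoder):
        pass


class AnimateLikePipe:
    def __init__(self, unet, vae, text_encoder, motion_adapter):
        pass


class PriorLikePipe:
    def __init__(self, prior, movq):
        pass


loaded = {"unet": "unet", "vae": "vae", "text_encoder": "clip"}

sag_missing = missing_components(SAGLikePipe, loaded)
animate_missing = missing_components(AnimateLikePipe, loaded)
prior_missing = missing_components(PriorLikePipe, loaded)
print(sag_missing, animate_missing, prior_missing)
```

An empty list suggests the switch can reuse everything, a short list names what must be passed in addition, and a list covering every argument indicates an unrelated model family.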

also, I think with this, we should also be able to create separate pipelines for free_init (see https://github.com/huggingface/diffusers/pull/6644). We could use free_init with a simple API like this; I really don't think it would hurt usage at all:

pipe = AutoFreeInitPipeline.from_pipe(pipe_animatediff)

yiyixuxu commented 9 months ago

@vladmandic

would you be able to provide an example (with code)? 🥺 that will really help me understand the problem better. I tried and I can't get the same error for the scenario you described:

"load pipeline, enable model offload, set ipadapter. now switch pipeline to img2img and back to txt2img - chances are you end up with tensor location mismatch in unrelated part of the code - and most commonly in text_encoder (why text_encoder? because i'm building embeds from prompt, not passing prompt as-is to pipeline)"

alexblattner commented 9 months ago

I designed a structure that would remove the need for that, make everything more flexible and completely reproducible. It takes the basic t2i pipeline and divides it into a series of functions that run in a predetermined sequence. To add the slight changes needed to turn t2i into i2i, you just add a new function and choose where it goes in the sequence. Again, this removes the need to recreate an entirely new pipeline for a small change, as the change can simply be plugged in.

https://github.com/alexblattner/RubberDiffusers
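
To make the description above concrete, here is a toy sketch of a pipeline expressed as an ordered list of named steps. It is not the RubberDiffusers API: `StepPipeline`, the step functions, and the string placeholders standing in for tensors are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Step = Callable[[State], State]


class StepPipeline:
    """Hypothetical pipeline made of named steps that run in order."""

    def __init__(self, steps: List[Tuple[str, Step]]):
        self.steps = list(steps)

    def insert_after(self, anchor: str, name: str, step: Step) -> None:
        """Plug a new step in right after the step called `anchor`."""
        index = [n for n, _ in self.steps].index(anchor)
        self.steps.insert(index + 1, (name, step))

    def remove(self, name: str) -> None:
        """Drop a previously added step, restoring the earlier behavior."""
        self.steps = [(n, s) for n, s in self.steps if n != name]

    def __call__(self, **inputs) -> State:
        state: State = dict(inputs)
        for _, step in self.steps:
            state = step(state)
        return state


def encode_prompt(state: State) -> State:
    state["embeds"] = f"emb({state['prompt']})"
    return state


def prepare_latents(state: State) -> State:
    state["latents"] = "noise"
    return state


def latents_from_image(state: State) -> State:
    # The img2img-style change: start from a noised input image instead.
    state["latents"] = f"noised({state['image_in']})"
    return state


def denoise(state: State) -> State:
    state["image"] = f"denoise({state['latents']}, {state['embeds']})"
    return state


pipe = StepPipeline([
    ("encode_prompt", encode_prompt),
    ("prepare_latents", prepare_latents),
    ("denoise", denoise),
])

t2i_result = pipe(prompt="cat")["image"]

pipe.insert_after("prepare_latents", "latents_from_image", latents_from_image)
i2i_result = pipe(prompt="cat", image_in="photo")["image"]

pipe.remove("latents_from_image")
back_to_t2i = pipe(prompt="cat")["image"]
print(t2i_result, i2i_result, back_to_t2i)
```

Because the extra step is added to and removed from the same object, no second pipeline is constructed and the loaded components never change hands.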

patrickvonplaten commented 9 months ago

@yiyixuxu I think using the auto classes for this is the right approach

github-actions[bot] commented 8 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

sayakpaul commented 8 months ago

I think not stale?

yiyixuxu commented 8 months ago

yeah, not stale. I'm going to add a from_pipe method to DiffusionPipeline

yiyixuxu commented 6 months ago

completed with #7241
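
For readers landing here later, the switch discussed in this thread might then look roughly like the sketch below. It assumes a diffusers release that includes the change referenced above and that from_pipe is available as a class method on the target pipeline class. The helper name `make_sag_pipeline` is made up for illustration, and the model id is the one used earlier in the thread.

```python
def make_sag_pipeline(model_id="runwayml/stable-diffusion-v1-5"):
    """Load a text-to-image pipeline once, then reuse its components for SAG."""
    # Imported lazily so that defining this helper does not require diffusers.
    from diffusers import StableDiffusionPipeline, StableDiffusionSAGPipeline

    # Load the checkpoint a single time.
    pipe_sd = StableDiffusionPipeline.from_pretrained(model_id)

    # Assumption: from_pipe builds the target pipeline from the components
    # that are already loaded instead of reading the checkpoint again.
    pipe_sag = StableDiffusionSAGPipeline.from_pipe(pipe_sd)
    return pipe_sd, pipe_sag
```

Calling the helper downloads the checkpoint, so it is left uncalled here. Both returned pipelines are expected to share the same underlying model components.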

alexblattner commented 6 months ago

@yiyixuxu this is nice and all, but something that would be really nice is to be able to modify pipelines easily without creating new ones. For example: the difference between the regular t2i pipeline and the i2i pipeline is a few injected lines of code. Same thing for controlnet and i2i controlnet. There are many functionalities that could be injected or not, like inpainting, pix2pix and many more. Why not make the pipelines easily injectable, without having to recreate a new one entirely?