huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[Community] Add pipeline for CLIPSeg x Stable Diffusion #1188

Closed NielsRogge closed 1 year ago

NielsRogge commented 1 year ago

Model/Pipeline/Scheduler description

We've just added CLIPSeg to the 🤗 Transformers library, making it possible to use CLIPSeg in a few lines of code as shown in this notebook. The model is a minimal extension of CLIP for zero-shot and one-shot image segmentation.

It'd be great to create a new pipeline that leverages it for text-based (prompt) image inpainting. This way, people can just type whatever they want to inpaint in an image with a model like Stable Diffusion.

The idea of leveraging CLIPSeg was proposed here: https://github.com/amrrs/stable-diffusion-prompt-inpainting.

Open source status

Provide useful links for the implementation

CLIPSeg is available here: https://huggingface.co/docs/transformers/main/en/model_doc/clipseg.
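For context, here is a rough sketch of the two-stage idea: CLIPSeg produces a mask from a text prompt, and a Stable Diffusion inpainting pipeline then fills the masked region. The checkpoints, threshold, and file paths below are illustrative assumptions, not the final pipeline design:

import torch
from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from diffusers import StableDiffusionInpaintPipeline

# 1) Segment the region to replace with CLIPSeg, driven by a text prompt.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
segmenter = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

image = Image.open("input.png").convert("RGB").resize((512, 512))  # hypothetical path
inputs = processor(text=["a glass"], images=[image], return_tensors="pt")
with torch.no_grad():
    logits = segmenter(**inputs).logits

# Threshold the sigmoid probabilities into a binary mask and resize to the image.
mask = (torch.sigmoid(logits) > 0.5).float().squeeze()
mask_image = Image.fromarray((mask.numpy() * 255).astype("uint8")).resize(image.size)

# 2) Inpaint the masked region with Stable Diffusion, guided by a second prompt.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
result = pipe(prompt="a cup of coffee", image=image, mask_image=mask_image).images[0]
result.save("output.png")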

dblunk88 commented 1 year ago

nice! This could be combined with inpainting to potentially (more accurately) replace existing things in an image.

WASasquatch commented 1 year ago

I still love the idea of this, but detection and mask creation remain an inherent problem. Sure, it looks nice in your example against a white background, but with anything else the mask is clearly visible in the inpainted diffusion as anomalies along its boundary. This needs some sort of algorithm to create curved points between edges.

nokunato commented 1 year ago

Hi, novice here, but I can't seem to use diffusers when making use of CLIPSeg. Is there any way around this? The command below doesn't run once I add "from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation". Help

device = "cuda" pipe = StableDiffusionInpaintPipeline.from_pretrained( "CompVis/stable-diffusion-v1-4", revision="fp16", torch_dtype=torch.float16, use_auth_token=True ).to(device)

unography commented 1 year ago

Hi @patrickvonplaten, I added a PR for this here: https://github.com/huggingface/diffusers/pull/1250

WASasquatch commented 1 year ago

I wonder if CLIPSeg can be improved with Gaussian blur and SVG tracing, producing a two-tone mask with curved tracing based on Gaussian smoothing. Has this been suggested to the CLIPSeg authors? I really want something like this in diffusers, but I also want it to be complementary to art and fidelity.
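A minimal sketch of the blur-then-rethreshold part of that idea (PIL only, leaving SVG tracing aside; the file names and radius are placeholders):

from PIL import Image, ImageFilter

# Hypothetical hard, blocky mask, e.g. obtained by thresholding CLIPSeg logits.
mask = Image.open("clipseg_mask.png").convert("L")

# Gaussian-smooth the blocky boundary, then re-threshold so the mask stays
# two-tone but with rounded, less visible edges.
smoothed = mask.filter(ImageFilter.GaussianBlur(radius=6))
smoothed = smoothed.point(lambda p: 255 if p > 127 else 0)
smoothed.save("smoothed_mask.png")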

tenghui98 commented 1 year ago

I'm running into the same problem as @nokunato above, with the same code. How do I fix this?

nokunato commented 1 year ago

Not sure; I tried, but I'm getting this new error after trying to download the inpaint model:

new error: "ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> of <class 'diffusers_modules.git.text_inpainting.TextInpainting'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}."

misbahsy commented 1 year ago

I'm getting the same error as commented by @nokunato. Any fix, @NielsRogge?

NielsRogge commented 1 year ago

Have you installed Transformers from source?

misbahsy commented 1 year ago

Have you installed Transformers from source?

Yes, I did. I tried pip install as well as installing from git: git+https://github.com/huggingface/transformers.git

Let me know if I missed something. Thanks.

NielsRogge commented 1 year ago

For me it runs fine, see also the code in app.py of this Space: https://huggingface.co/spaces/nielsr/text-based-inpainting
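For anyone hitting the loading error above, a rough sketch of the pattern used in that Space's app.py (the community "text_inpainting" pipeline, the CIDAS/clipseg-rd64-refined checkpoint, and the argument names are assumptions and may differ across diffusers versions):

from PIL import Image
from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
from diffusers import DiffusionPipeline

# CLIPSeg components that the custom pipeline uses internally to build the mask.
processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
segmenter = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")

# Load the community text_inpainting pipeline on top of an inpainting checkpoint.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting",
    custom_pipeline="text_inpainting",
    segmentation_model=segmenter,
    segmentation_processor=processor,
).to("cuda")

image = Image.open("input.png").convert("RGB").resize((512, 512))  # hypothetical path
# "text" selects what to mask out; "prompt" describes what to paint in its place.
result = pipe(image=image, text="a glass", prompt="a cup of coffee").images[0]
result.save("output.png")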

misbahsy commented 1 year ago

Thanks for the link! Got it working by following the spaces implementation.

WASasquatch commented 1 year ago

This pipeline needs a lot of work.

1) It should not change an input image by filling it in with black to make it 1:1. If it needs to do that to obtain the masks, that's fine, but the result should be returned at the input aspect ratio and size (using the padding as an area to crop from).
2) The entire image is altered by the diffusion, not just the masked areas, so faces, hands, and scenery get messed up and look bad.
3) Masks are interpreted as blocks which do not always align with the subject, or which interfere with the background.
4) Image size is reduced (even with padding), showing a severe loss of quality related to point 2 (such as highly textured clothes being smoothed and muddled out).

Because of these glaring issues, the mask should at least be provided to the end-user for post-processing, so the new addition can be composited back into the original HD image.
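A minimal sketch of that kind of compositing step, assuming the pipeline exposed the mask it used (PIL only; the file names are hypothetical):

from PIL import Image, ImageFilter

# Hypothetical inputs: the untouched original, the pipeline's output, and the mask
# the pipeline used -- assumed to be available as files.
original = Image.open("original.png").convert("RGB")
inpainted = Image.open("pipeline_output.png").convert("RGB").resize(original.size)
mask = Image.open("mask.png").convert("L").resize(original.size)

# Feather the mask edge slightly so the paste boundary is less visible.
mask = mask.filter(ImageFilter.GaussianBlur(radius=8))

# Keep the original pixels everywhere except the masked region, where the
# inpainted pixels are composited in.
result = Image.composite(inpainted, original, mask)
result.save("composited.png")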

Example of an entire image trashed by this pipeline (I have more examples if needed). More than just being a gimmick, this method simply ruins images.

[Image: original_diffusion]

[Image: CLIPSeg_Image]

a-torrano-m commented 1 year ago


Hi, I hope I'm not being redundant, but I was dealing with a similar issue while following the video generation workflow explained at this link: https://pypi.org/project/stable-diffusion-videos/

I cannot apply the Spaces solution NielsRogge suggests, as the diffusers exception happens inside the stable_diffusion_videos library.

The message I received is:

File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/diffusers/pipeline_utils.py:516, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    512 if none_module.startswith(DUMMY_MODULES_FOLDER) and "dummy" in none_module:
    513     # call class_obj for nice error message of missing requirements
    514     class_obj()
--> 516 raise ValueError(
    517     f"The component {class_obj} of {pipeline_class} cannot be loaded as it does not seem to have"
    518     f" any of the loading methods defined in {ALL_IMPORTABLE_CLASSES}."
    519 )
    521 load_method = getattr(class_obj, load_method_name)
    522 loading_kwargs = {}

ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> of <class 'stable_diffusion_videos.stable_diffusion_pipeline.StableDiffusionWalkPipeline'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.

It is in the call to StableDiffusionWalkPipeline.from_pretrained where the exception is raised:

from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

torch.cuda.empty_cache()

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")

Has anybody found a new way around this?

Thank you very much!

Alexis