Closed NielsRogge closed 1 year ago
Nice! This could be combined with inpainting to replace existing things in an image, potentially more accurately.
I still love the idea of this, but detection and mask creation remain an inherent problem. Sure, it looks nice in your example against a white background, but with anything else the masking is clearly visible in the result: the diffusion paints in anomalies along its boundary. This needs some sort of algorithm to create curved points between edges.
Hi, novice here, but I can't seem to use diffusers when making use of this CLIPSeg. Is there any way around this? The code below doesn't run when using `from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation`. Help!

```python
import torch
from diffusers import StableDiffusionInpaintPipeline

device = "cuda"
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    revision="fp16",
    torch_dtype=torch.float16,
    use_auth_token=True,
).to(device)
```
Hi @patrickvonplaten , I added a PR for this here - https://github.com/huggingface/diffusers/pull/1250
I wonder if CLIPSeg can be improved with Gaussian blur and SVG tracing, producing a two-tone mask with curved tracing based on Gaussian smoothing. Has this been suggested to CLIPSeg? I really want something like this in diffusers, but I also want it to be complementary to art and fidelity.
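The Gaussian-smoothing part of that idea can be sketched with Pillow alone: blur a hard-edged binary mask, then re-threshold it so blocky corners come out rounded. This is only an illustration of the suggestion, not anything CLIPSeg does; the blur radius and threshold below are arbitrary choices.

```python
import numpy as np
from PIL import Image, ImageFilter

def smooth_mask(mask: Image.Image, radius: int = 9, threshold: int = 128) -> Image.Image:
    """Blur a binary mask and re-threshold it, rounding off blocky edges."""
    blurred = mask.convert("L").filter(ImageFilter.GaussianBlur(radius))
    # Re-binarize: pixels at or above the threshold become white, the rest black
    return blurred.point(lambda p: 255 if p >= threshold else 0)

# Example: a hard-edged 32x32 square mask centered in a 64x64 image
raw = Image.fromarray(np.uint8(np.pad(np.full((32, 32), 255), 16)))
smoothed = smooth_mask(raw)
```

An SVG-tracing step (e.g. with potrace) could then vectorize `smoothed` into genuinely curved boundaries, but the blur-and-threshold alone already softens the blocky CLIPSeg output.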
How do I fix this?
Not sure; I tried, but I'm getting this new error after trying to download the inpaint model.
New error:

```
ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> of <class 'diffusers_modules.git.text_inpainting.TextInpainting'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.
```
Getting the same error as @nokunato commented. Any fix, @NielsRogge?
Have you installed Transformers from source?
Yes, I did. I tried `pip install` as well as installing from source: `pip install git+https://github.com/huggingface/transformers.git`
Let me know if I missed something. Thanks.
For me it runs fine, see also the code in app.py of this Space: https://huggingface.co/spaces/nielsr/text-based-inpainting
Thanks for the link! Got it working by following the spaces implementation.
This pipeline needs a lot of work.
1. It should not change an input image by filling it in with black to make it 1:1. If it needs to do that to obtain the masks, that's fine, but the output should be returned at the input aspect ratio and size (use the padding as an area to crop from).
2. The entire image is altered by diffusion, not just the masked areas, so faces, hands, and scenery get messed up and look bad.
3. Masks are interpreted as blocks which do not always align with the subject, or which interfere with the background.
4. Image size is reduced (even with padding), showing a severe loss of quality related to point 2 (such as highly textured clothes being smoothed and muddled out).
Because of these glaring issues, the mask should at least be provided to the end user for post-processing, to composite the original HD image back in around the new addition.
Example of an entire image trashed by this pipeline (I have more if needed). As it stands this is little more than a gimmick; the method just ruins images.
Hi, I hope I'm not being redundant, but I was facing a similar issue while following the video-generation example explained here: https://pypi.org/project/stable-diffusion-videos/
I cannot apply the Spaces solution @NielsRogge suggests, as the diffusers exception happens inside the stable_diffusion_videos library.
The message I received is:

```
File /anaconda/envs/azureml_py38/lib/python3.8/site-packages/diffusers/pipeline_utils.py:516, in DiffusionPipeline.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    512 if none_module.startswith(DUMMY_MODULES_FOLDER) and "dummy" in none_module:
    513     # call class_obj for nice error message of missing requirements
    514     class_obj()
--> 516 raise ValueError(
    517     f"The component {class_obj} of {pipeline_class} cannot be loaded as it does not seem to have"
    518     f" any of the loading methods defined in {ALL_IMPORTABLE_CLASSES}."
    519 )
    521 load_method = getattr(class_obj, load_method_name)
    522 loading_kwargs = {}

ValueError: The component <class 'transformers.models.clip.image_processing_clip.CLIPImageProcessor'> of <class 'stable_diffusion_videos.stable_diffusion_pipeline.StableDiffusionWalkPipeline'> cannot be loaded as it does not seem to have any of the loading methods defined in {'ModelMixin': ['save_pretrained', 'from_pretrained'], 'SchedulerMixin': ['save_config', 'from_config'], 'DiffusionPipeline': ['save_pretrained', 'from_pretrained'], 'OnnxRuntimeModel': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizer': ['save_pretrained', 'from_pretrained'], 'PreTrainedTokenizerFast': ['save_pretrained', 'from_pretrained'], 'PreTrainedModel': ['save_pretrained', 'from_pretrained'], 'FeatureExtractionMixin': ['save_pretrained', 'from_pretrained']}.
```
It is in the call to StableDiffusionWalkPipeline.from_pretrained where the exception is raised:
```python
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

torch.cuda.empty_cache()

pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "CompVis/stable-diffusion-v1-4",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")
```
Has anybody found a way around this?
thank you very much!
Alexis
Model/Pipeline/Scheduler description
We've just added CLIPSeg to the 🤗 Transformers library, making it possible to use CLIPSeg in a few lines of code as shown in this notebook. The model is a minimal extension of CLIP for zero-shot and one-shot image segmentation.
It'd be great to create a new pipeline that leverages it for text-based (prompt) image inpainting. This way, people can just type whatever they want to inpaint in an image with a model like Stable Diffusion.
The idea of leveraging CLIPSeg was proposed here: https://github.com/amrrs/stable-diffusion-prompt-inpainting.
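The proposed two-stage flow could be sketched roughly as below: CLIPSeg turns a text prompt into a mask, and the existing inpainting pipeline fills the masked region. This is only an outline under assumptions, not the final pipeline; in particular the `CIDAS/clipseg-rd64-refined` checkpoint name, the 0.5 threshold, and the `logits_to_mask` helper are my choices for illustration.

```python
import torch

def logits_to_mask(logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Binarize CLIPSeg's raw logits into a {0., 1.} mask via sigmoid + threshold."""
    return (torch.sigmoid(logits) > threshold).float()

def text_based_inpaint(image, mask_prompt, inpaint_prompt, device="cuda"):
    """Two-stage flow: CLIPSeg finds the region, Stable Diffusion repaints it."""
    from PIL import Image
    from transformers import CLIPSegProcessor, CLIPSegForImageSegmentation
    from diffusers import StableDiffusionInpaintPipeline

    # Stage 1: segment whatever `mask_prompt` describes
    processor = CLIPSegProcessor.from_pretrained("CIDAS/clipseg-rd64-refined")
    seg_model = CLIPSegForImageSegmentation.from_pretrained("CIDAS/clipseg-rd64-refined")
    inputs = processor(text=[mask_prompt], images=[image], return_tensors="pt")
    with torch.no_grad():
        logits = seg_model(**inputs).logits
    mask_array = (logits_to_mask(logits.squeeze()).numpy() * 255).astype("uint8")
    mask = Image.fromarray(mask_array).resize(image.size)

    # Stage 2: inpaint the masked region with `inpaint_prompt`
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        revision="fp16",
        torch_dtype=torch.float16,
        use_auth_token=True,
    ).to(device)
    return pipe(prompt=inpaint_prompt, image=image, mask_image=mask).images[0]
```

As the thread above shows, the mask quality (blocky edges, misalignment) and the compositing of the result back onto the original image are the parts that need the most care.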
Open source status
Provide useful links for the implementation
CLIPSeg is available here: https://huggingface.co/docs/transformers/main/en/model_doc/clipseg.