Closed: ScottishFold007 closed this issue 1 year ago
Are you by chance using `euler_a`? I had this issue with Stable Diffusion v2.0 with just the regular pipeline. I changed to DPM Solver, as in their example, and it worked fine.
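For reference, the swap looked roughly like this (a minimal sketch; `stabilityai/stable-diffusion-2` is an assumed model id, substitute whatever checkpoint you are loading):

```python
from diffusers import DPMSolverMultistepScheduler, StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2"  # assumed checkpoint
# Load DPM-Solver++ using the scheduler config shipped with the model
scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
```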
Yes, but I also tried `DDIMScheduler`, with the same result...
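For what it's worth, the DDIM swap on the regular pipeline looked roughly like this (a sketch; the model id stands in for my local checkpoint):

```python
from diffusers import DDIMScheduler, StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2"  # stands in for my local checkpoint
# Rebuild a DDIM scheduler from the model's own scheduler config
scheduler = DDIMScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler)
```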
> Are you by chance using `euler_a`? I had this issue with Stable Diffusion v2.0 with just the regular pipeline. I changed to DPM Solver, as in their example, and it worked fine.

I tried `Euler_A` and `DPMSolver` as well; maybe you can run the code I posted and see the results. Best wishes~
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Describe the bug
CLIP-guided generation produces very poor results.
This is my code:
```python
logger = logging.get_logger(__name__)  # pylint: disable=invalid-name


class CLIPGuidedStableDiffusionPipeline(DiffusionPipeline):
    r"""
    Pipeline for text-to-image generation using Stable Diffusion.

    This model inherits from [`DiffusionPipeline`]. Check the superclass documentation for the generic
    methods the library implements for all the pipelines (such as downloading or saving, running on a
    particular device, etc.)

    Args:
        vae ([`AutoencoderKL`]):
            Variational Auto-Encoder (VAE) Model to encode and decode images to and from latent representations.
        text_encoder ([`CLIPTextModel`]):
            Frozen text-encoder. Stable Diffusion uses the text portion of CLIP, specifically the
            clip-vit-large-patch14 variant.
        tokenizer (`CLIPTokenizer`):
            Tokenizer of class CLIPTokenizer.
        unet ([`UNet2DConditionModel`]):
            Conditional U-Net architecture to denoise the encoded image latents.
        scheduler ([`SchedulerMixin`]):
            A scheduler to be used in combination with `unet` to denoise the encoded image latents. Can be
            one of [`DDIMScheduler`], [`LMSDiscreteScheduler`], or [`PNDMScheduler`].
        safety_checker ([`StableDiffusionSafetyChecker`]):
            Classification module that estimates whether generated images could be considered offensive or
            harmful. Please, refer to the model card for details.
        feature_extractor ([`CLIPFeatureExtractor`]):
            Model that extracts features from generated images to be used as inputs for the `safety_checker`.
    """

    _optional_components = ["safety_checker", "feature_extractor"]
```

```python
import torch
from diffusers import DDIMScheduler, EulerDiscreteScheduler
from transformers import CLIPFeatureExtractor, CLIPModel, CLIPTokenizer
model_name = r"D:\stable_diffusion模型合辑\stabilityai_stable-diffusion-2"
clip_model_path = r"D:\CLIP_models\clip-vit-large-patch14-336"

feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_model_path)
clip_model = CLIPModel.from_pretrained(clip_model_path, torch_dtype=torch.float16)
clip_tokenizer = CLIPTokenizer.from_pretrained(clip_model_path)

# Alternatively, use the Euler scheduler instead:
# scheduler = EulerDiscreteScheduler.from_pretrained(model_name, subfolder="scheduler", prediction_type="v_prediction")

# Setup the scheduler and pipeline
scheduler = DDIMScheduler(
    beta_start=0.00085,
    beta_end=0.012,
    beta_schedule="scaled_linear",
    clip_sample=False,
    set_alpha_to_one=False,
    prediction_type="v_prediction",  # <- make sure we are doing v_prediction
)
guided_pipeline = CLIPGuidedStableDiffusionPipeline.from_pretrained(
    pretrained_model_name_or_path=model_name,
    clip_model=clip_model,
    clip_tokenizer=clip_tokenizer,
    feature_extractor=feature_extractor,
    scheduler=scheduler,  # use the v_prediction scheduler configured above
    local_files_only=True,
    safety_checker=None,
    revision="fp16",
    torch_dtype=torch.float16,
)
guided_pipeline.enable_attention_slicing(5)
guided_pipeline = guided_pipeline.to("cuda")
```
```python
# prompt = "fantasy book cover, full moon, fantasy forest landscape, golden vector elements, fantasy magic, dark light night, intricate, elegant, sharp focus, illustration, highly detailed, digital painting, concept art, matte, art by WLOP and Artgerm and Albert Bierstadt, masterpiece"
negative_prompt = """lowres, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, artist name, (((ugly))), (((duplicate))), ((morbid)), ((mutilated)), out of frame, extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((blurry)), ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))), bad proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), (fused fingers), (too many fingers)"""
prompt = "a big fat cat"

generator = torch.Generator(device="cuda").manual_seed(844440)
images = []
for i in range(4):
    image = guided_pipeline(
        prompt,
        negative_prompt=negative_prompt,
        num_inference_steps=65,
        guidance_scale=7.5,
        clip_guidance_scale=100,
        num_cutouts=4,
        use_cutouts=True,
        generator=generator,
    ).images[0]
    images.append(image)  # collect each generated image
```
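For comparison, the stock community version of this pipeline can also be loaded through `custom_pipeline` instead of a local class. A minimal sketch, with hub ids standing in for my local paths:

```python
import torch
from diffusers import DiffusionPipeline
from transformers import CLIPFeatureExtractor, CLIPModel

clip_id = "openai/clip-vit-large-patch14-336"  # stands in for the local CLIP path
feature_extractor = CLIPFeatureExtractor.from_pretrained(clip_id)
clip_model = CLIPModel.from_pretrained(clip_id, torch_dtype=torch.float16)

# Pull the community "clip_guided_stable_diffusion" pipeline from the diffusers examples
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2",  # assumed checkpoint
    custom_pipeline="clip_guided_stable_diffusion",
    clip_model=clip_model,
    feature_extractor=feature_extractor,
    torch_dtype=torch.float16,
).to("cuda")
```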
Reproduction
No response
Logs
No response
System Info
diffusers==0.9.0