huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Is it possible diffusers implement an official support on the increasing or decreasing weight of prompt with () & []? #2431

Closed garyhxfang closed 1 year ago

garyhxfang commented 1 year ago

Is your feature request related to a problem? Please describe.

Currently, AUTOMATIC1111/stable-diffusion-webui supports increasing or decreasing the weight of part of a prompt with () and [], which is not supported by diffusers (e.g. "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes ((happy)) hood japanese_clothes kimono (long_sleeves) red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms").

I am requesting this feature because I found that, for many models on civitai, negative prompts with specific weights are very important for generating a good result, for example (worst quality:2), (low quality:2). I tried for a long time and found it almost impossible to get results of similar quality from the negative prompt without increasing or decreasing the weight. (I tried duplicating "worst quality" a different number of times (2, 3 or 4 times) in my negative prompt, but all of these gave much worse results than (worst quality:2).)

Describe alternatives you've considered

While investigating solutions, I found the community pipeline Long Prompt Weighting Stable Diffusion, which supports this feature. But when I tried it, I found it quite unstable: it often gets stuck for a long time during inference, which means it cannot be used in a production environment. So I think a better alternative is to support this directly in the official StableDiffusionPipeline.

Describe the solution you'd like

An example of how I would like it to work is shown below:

import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", weight_config=True)
pipe = pipe.to("cuda")

prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes ((happy)) hood japanese_clothes kimono (long_sleeves) red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
negative_prompt = "(worst quality:2), (low quality:2)"
image = pipe(prompt=prompt, negative_prompt=negative_prompt).images[0]

I do hope that @patrickvonplaten can take a look at this request. It would be very helpful for us developers to generate images of the same or even better quality than the ones users generate with AUTOMATIC1111/stable-diffusion-webui.

garyhxfang commented 1 year ago

@haofanwang master could also have a check if you are interested in this topic haha.

sayakpaul commented 1 year ago

Do you think it might be possible with SEGA?

https://huggingface.co/docs/diffusers/api/pipelines/semantic_stable_diffusion

garyhxfang commented 1 year ago

@sayakpaul Wow, thanks a lot, let me give it a try. Can it also support img2img inference? I checked the docs and it does not seem to have an API for img2img inference.

sayakpaul commented 1 year ago

You mean you wanted to use SEGA but conditioned on an input image?

garyhxfang commented 1 year ago

Yes, I have two use cases I need to implement: txt2img and img2img.

garyhxfang commented 1 year ago

for the img2img inference I'm currently using StableDiffusionImg2ImgPipeline

sayakpaul commented 1 year ago

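Well, SEGA natively supports the first one, i.e., text2image. For image2image, I believe you could:

  • first obtain an inverted noise using the newly introduced DDIMInverseScheduler. An end-to-end example on how to obtain such an inverted noise map is available here.
  • use the inverted noise to subsequently perform generation with SEGA.

As far as I know @manuelbrack might already have something regarding this. So, ccing them.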

garyhxfang commented 1 year ago

Noted, thanks a lot! Let me have a try.

manuelbrack commented 1 year ago

Yes, there exists a preliminary version of SemanticStableDiffusionImg2ImgPipeline here: https://github.com/ml-research/diffusers/tree/sega_img2img

Just call the invert method on the pipeline and subsequent calls will always reconstruct the original image. You can then use this in combination with SEGA as outlined in the docu.

Skquark commented 1 year ago

That's already in the Long Prompt Weighting (LPW) community pipe, isn't it? That's the one I primarily use since it supports txt2img, img2img, and inpainting all in one. Are there any disadvantages to it versus the standard pipeline or the semantic proposal? I was kinda surprised LPW is still in the example community scripts rather than the diffusers collection, since it seemed like the most practical one.

garyhxfang commented 1 year ago

@Skquark I have tried the Long Prompt Weighting (LPW) community pipe. The results are good, but it's too unstable to be used in a live environment: it often gets stuck when I call the pipeline and I need to restart the process to run it again. Have you encountered the same problem?

Skquark commented 1 year ago

@garyhxfang It's been stable for me, I haven't noticed it getting stuck and I've been using it as my primary for months. I have made minor mods to it, but in general it's been solid and I've been searching for any downsides to it. I got it as the default pipeline in my https://stablediffusiondeluxe.com implementation, and WAS also uses it as primary in his Easy Diffusion. If anyone knows any disadvantages compared to the standard I'd like to know.

garyhxfang commented 1 year ago

Well, SEGA natively supports the first one i.e., text2image. For image2image, I believe you could:

  • first obtain an inverted noise using the newly introduced DDIMInverseScheduler. An end-to-end example on how to obtain such an inverted noise map is available here.
  • use the inverted noise to subsequently perform generation with SEGA.

As far as I know @manuelbrack might already have something regarding this. So, ccing them.

Hi @sayakpaul, I tried SEGA yesterday, but it seems much slower than the StableDiffusionPipeline: it takes almost 3x the processing time to generate an image of the same size, which makes it not a good alternative for use in an application.

The Long Prompt Weighting (LPW) community pipe has similar speed to StableDiffusionPipeline; the only problem is that this community pipeline is quite unstable.

I also get very weird results with the weight config, and I have some questions about edit_weights:

import torch
from diffusers import SemanticStableDiffusionPipeline

pipe = SemanticStableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

out = pipe(
    prompt="a photo of the face of a woman",
    num_images_per_prompt=1,
    guidance_scale=7,
    editing_prompt=[
        "smiling, smile",  # Concepts to apply
        "glasses, wearing glasses",
        "curls, wavy hair, curly hair",
        "beard, full beard, mustache",
    ],
    reverse_editing_direction=[False, False, False, False],  # Direction of guidance i.e. increase all concepts
    edit_warmup_steps=[10, 10, 10, 10],  # Warmup period for each concept
    edit_guidance_scale=[4, 5, 5, 5.4],  # Guidance scale for each concept
    edit_threshold=[
        0.99,
        0.975,
        0.925,
        0.96,
    ],  # Threshold for each concept. Threshold equals the percentile of the latent space that will be discarded. I.e. threshold=0.99 uses 1% of the latent dimensions
    edit_momentum_scale=0.3,  # Momentum scale that will be added to the latent guidance
    edit_mom_beta=0.6,  # Momentum beta
    edit_weights=[1, 1, 1, 1, 1],  # Weights of the individual concepts against each other
)

In the example provided in the docs, editing_prompt is an array of length 4, but edit_weights has length 5. Why is there an extra element in edit_weights? Also, does the weight in SEGA have a similar effect on the image to the weight in the Long Prompt Weighting (LPW) community pipe and AUTOMATIC1111/stable-diffusion-webui? An example of weights in the webui is described below:

Cheat sheet:

a (word) - increase attention to word by a factor of 1.1
a ((word)) - increase attention to word by a factor of 1.21 (= 1.1 * 1.1)
a [word] - decrease attention to word by a factor of 1.1
a (word:1.5) - increase attention to word by a factor of 1.5
a (word:0.25) - decrease attention to word by a factor of 4 (= 1 / 0.25)
a \(word\) - use literal () characters in prompt
With (), a weight can be specified like this: (text:1.4). If the weight is not specified, it is assumed to be 1.1. Specifying weight only works with () not with [].

If you want to use any of the literal ()[] characters in the prompt, use the backslash to escape them: anime_\(character\).
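To make the cheat sheet above concrete, here is a minimal sketch of a parser for this syntax. It is a simplification written for illustration, not the webui's actual implementation: the name parse_weights is made up, unclosed brackets are simply ignored, and the result is a list of (text, weight) pairs.

```python
import re

# Token kinds, in order: escaped bracket, bare bracket,
# ":<number>)" closing a weighted group, plain text, stray ":" or "\".
_TOKEN = re.compile(
    r"\\[()\[\]\\]|[()\[\]]|:\s*([+-]?\d*\.?\d+)\s*\)|[^\\()\[\]:]+|[\\:]"
)


def parse_weights(prompt):
    """Split a prompt into (text, weight) pairs following the cheat sheet."""
    pieces = []          # mutable [text, weight] entries
    round_starts = []    # index into pieces where each open "(" began
    square_starts = []   # index into pieces where each open "[" began

    def scale(start, factor):
        # Multiply the weight of everything added since the bracket opened.
        for piece in pieces[start:]:
            piece[1] *= factor

    for match in _TOKEN.finditer(prompt):
        token = match.group(0)
        if len(token) == 2 and token[0] == "\\":
            pieces.append([token[1], 1.0])                    # \( becomes a literal (
        elif token == "(":
            round_starts.append(len(pieces))
        elif token == "[":
            square_starts.append(len(pieces))
        elif match.group(1) is not None and round_starts:
            scale(round_starts.pop(), float(match.group(1)))  # (text:1.5)
        elif token == ")" and round_starts:
            scale(round_starts.pop(), 1.1)                    # (text)
        elif token == "]" and square_starts:
            scale(square_starts.pop(), 1 / 1.1)               # [text]
        else:
            pieces.append([token, 1.0])

    # Merge neighbours that ended up with the same weight.
    merged = []
    for text, weight in pieces:
        if merged and merged[-1][1] == weight:
            merged[-1][0] += text
        else:
            merged.append([text, weight])
    return [(text, weight) for text, weight in merged]


print(parse_weights("a (word:1.5) and ((another)) and [less]"))
```

Nested brackets multiply, which is how ((word)) ends up at 1.1 * 1.1.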

garyhxfang commented 1 year ago

@garyhxfang It's been stable for me, I haven't noticed it getting stuck and I've been using it as my primary for months. I have made minor mods to it, but in general it's been solid and I've been searching for any downsides to it. I got it as the default pipeline in my https://stablediffusiondeluxe.com implementation, and WAS also uses it as primary in his Easy Diffusion. If anyone knows any disadvantages compared to the standard I'd like to know.

@sayakpaul I guess the instability when I run the Long Prompt Weighting (LPW) community pipeline is due to the bad network connection when requesting GitHub from China, because I am facing the same issue when I call the checkpoint merge pipeline. Is it true that when I call a community pipeline in this way (shown below), it will always make a request to GitHub, no matter whether I am calling the pipeline for the first time in a given process?

from diffusers import DiffusionPipeline
import torch

pipe = DiffusionPipeline.from_pretrained(
    local_model_path, custom_pipeline="lpw_stable_diffusion", torch_dtype=torch.float16
)
pipe = pipe.to("cuda")

prompt = "best_quality (1girl:1.3) bow bride brown_hair closed_mouth frilled_bow frilled_hair_tubes frills (full_body:1.3) fox_ear hair_bow hair_tubes happy hood japanese_clothes kimono long_sleeves red_bow smile solo tabi uchikake white_kimono wide_sleeves cherry_blossoms"
neg_prompt = "lowres, bad_anatomy, error_body, error_hair, error_arm, error_hands, bad_hands, error_fingers, bad_fingers, missing_fingers, error_legs, bad_legs, multiple_legs, missing_legs, error_lighting, error_shadow, error_reflection, text, error, extra_digit, fewer_digits, cropped, worst_quality, low_quality, normal_quality, jpeg_artifacts, signature, watermark, username, blurry"

pipe.text2img(prompt, negative_prompt=neg_prompt, width=512, height=512, max_embeddings_multiples=3).images[0]

If so, is there any way to call the community pipeline without requesting GitHub?

Skquark commented 1 year ago

The way I did mine is to copy it as pipeline.py in my HuggingFace models, then while calling pretrained I set custom_pipeline="AlanB/lpw_stable_diffusion_mod" and it'll come from there instead. That might fix your issue, so long as HF works better than github in China..

garyhxfang commented 1 year ago

The way I did mine is to copy it as pipeline.py in my HuggingFace models, then while calling pretrained I set custom_pipeline="AlanB/lpw_stable_diffusion_mod" and it'll come from there instead. That might fix your issue, so long as HF works better than github in China..

Thanks a lot master! let me have a try.

sayakpaul commented 1 year ago

Is it true that when I call a community pipeline in this way (shown below), it will always make a request to GitHub, no matter whether I am calling the pipeline for the first time in a given process?

It should not as from what I see you're loading the pipeline from local files.
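One way to rule out a network request for the pipeline code itself is to load it from disk. This is a minimal sketch, assuming the documented behavior that custom_pipeline also accepts a path to a local directory containing a file named pipeline.py. The helper names stage_local_pipeline and load_offline are hypothetical, and the source file (for example a previously downloaded copy of lpw_stable_diffusion.py) is assumed to exist locally already.

```python
import shutil
from pathlib import Path


def stage_local_pipeline(source_file, target_dir):
    """Copy a community pipeline file to <target_dir>/pipeline.py.

    Returns the directory path to pass as custom_pipeline.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_file, target / "pipeline.py")
    return str(target)


def load_offline(local_model_path, pipeline_dir):
    """Load model weights and pipeline code from local paths only."""
    # Imported lazily so the staging helper works without diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    return DiffusionPipeline.from_pretrained(
        local_model_path,
        custom_pipeline=pipeline_dir,  # a local directory instead of a community name
        torch_dtype=torch.float16,
    )
```

A possible usage would be load_offline(local_model_path, stage_local_pipeline("lpw_stable_diffusion.py", "./lpw_pipeline")), where both paths are placeholders.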

patrickvonplaten commented 1 year ago

There have been lots of issues about this now, so linking them here:

As said a couple of times before we don't want to add too many high level features to diffusers but at the same time we also don't want to block important use cases and given the popularity of this feature this is an important use case. But it's already supported in diffusers! I think what's missing is maybe simply some good documentation.

@damian0815 made a very nice library that works well with diffusers, called Compel: https://github.com/damian0815/compel. With this library, it should be very easy to have prompt weighting for every diffusers pipeline that has the prompt_embeds input.

I'll open a PR to add a doc page about it since it seems to be such an important feature, but I'd really like to rely on compel here to stay true to the diffusers==toolbox philosophy.

patrickvonplaten commented 1 year ago

Opened two PRs to Compel to make them a bit more user-friendly for diffusers in general. If the author of compel is happy with those, we can advertise them better in diffusers and also run some general tests for this functionality: See:

One can already use the library though very nicely as follows:

import sys

import torch
from compel import Compel
from diffusers import StableDiffusionPipeline
from diffusers.schedulers.scheduling_unipc_multistep import UniPCMultistepScheduler

# Path or Hub id of the checkpoint, passed on the command line
path = sys.argv[1]

pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16)
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)

# Compel turns weighted prompt strings into embeddings using the pipeline's own tokenizer and text encoder
compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder)

pipe = pipe.to("cuda")

# The same prompt with "ball" upweighted (++), unweighted, and downweighted (--)
prompts = ["a cat playing with a ball++ in the forest", "a cat playing with a ball in the forest", "a cat playing with a ball-- in the forest"]

prompt_embeds = torch.cat([compel.build_conditioning_tensor(prompt) for prompt in prompts])

# One seeded generator per prompt so the three images differ only by the weighting
generator = [torch.Generator(device="cuda").manual_seed(0) for _ in range(prompt_embeds.shape[0])]
images = pipe(prompt_embeds=prompt_embeds, generator=generator, num_inference_steps=15).images

Note the ++ and -- syntax. While the ( ) and [ ] syntax was there first, I do agree that ++ and -- also just make more sense. Would be trivial to swap between the two syntaxes though.

patrickvonplaten commented 1 year ago

PR with docs opened here: https://github.com/huggingface/diffusers/pull/2574

Ephil012 commented 1 year ago

@patrickvonplaten The concern I have is that by using a third party library like Compel it's sort of pushing the problem further down the line.

Right now, diffusers recommends using a community pipeline maintained by a third-party / the community. Now diffusers is switching it for a library maintained by another third-party. It's essentially the same approach, but with a library instead of pipeline. I'd argue this still poses an issue because of the instability of this approach. Let me elaborate below.

The problem the community pipeline faced is that it became unstable because it wasn't properly maintained / improved by the third-party (or community in this case). It's possible that Compel will run into this exact same problem. Right now, it seems Damian is maintaining the library mostly by themselves. If Damian gets busy or if they decide to move on then the library will go unsupported. I do know Damian is a contributor to InvokeAI and that Invoke recently added references to Damian's library in their code. So it seems unlikely that Invoke will just let the library die given their dependency. However, the risk is still there. Damian owns the repo for Compel and if Damian stops monitoring the repo, then the library dies and Invoke will have to move off of it (albeit maybe there's another owner of the repo that I missed). As such, there's a bit of an inherent risk here by relying on this library.

I think the solution to this problem is the following: Right now, we already have the long prompt weighting (LPW) community pipeline. But it's not maintained. So why not just transition it to become an official pipeline? It would still be a separate pipeline from the base one. So, it wouldn't be breaking with the philosophy of diffusers by making the base one more complicated. The only difference would be that LPW has more support. I know I've made this case before in the other thread. But, the only alternative would just be to hope that Compel stays well supported, which isn't guaranteed.

Really the fundamental issue here is lack of support, and it won't be fixed until diffusers makes a support commitment to important features like this. It can make that commitment however it likes so that it doesn't break the philosophy of diffusers = toolkit. Either that or it needs to absolutely make sure that the third party tools it recommends are well supported. But regardless, there needs to be some sort of support.

patrickvonplaten commented 1 year ago

Hey @Ephil012,

Thanks for voicing your concerns here, I understand where you're coming from. Besides not being in line with our philosophy, the big problem here is maintainability. We don't have the time and people to maintain higher-level use cases. If we add prompt weighting as a core functionality, we open the door to add more and more UI/UX features.

Now since this is a highly requested feature, I think adding both:

We have a very high level of stability. If damian decides to stop maintaining the library, we still have a solid compel==0.18.0 version.

Just adding a new pipeline is not an option because then we're closing the door for all use cases of prompt weighting for other pipelines. Instead we now have a robust system of:

I don't see a problem here for the community at all tbh - as you can see in the doc here: https://github.com/huggingface/diffusers/blob/176d85cb55d6908c003dff12ef4e2d077aafd1c7/docs/source/en/using-diffusers/weighted_prompts.mdx it's now a three-liner of code to do prompt weighting with compel and diffusers

patrickvonplaten commented 1 year ago

Also note that compel is light-weight (not many dependencies) and with a very active author (both my PRs were merged within a day: https://github.com/damian0815/compel/pulls?q=is%3Apr+is%3Aclosed).

damian0815 commented 1 year ago

fwiw @Ephil012 Compel also supports long prompts as of v0.1.10 (released yesterday) which i'd expect makes the LPW pipeline pretty much redundant. as the maintainer of Compel i'm closely involved with the development of InvokeAI, which uses Compel for prompt weighting, so you've got the benefit of two professional business orgs backing Compel-driven code.

hipsterusername commented 1 year ago

I'll comment here echoing confidence in compel. @damian0815 has done an excellent job building a flexible and streamlined prompt syntax, and I've been able to watch it develop first-hand.

Invoke is building our platform to become a foundation for professional usage/development in the ecosystem, with a more sustainable codebase supported by commercial offerings - As @patrickvonplaten has noted, compel will have multiple orgs using it at this point, and I'm confident that its criticality in the ecosystem will keep it well maintained.

cmdr2 commented 1 year ago

FWIW, as of today we've started using compel (and diffusers) in the latest beta version of Easy Diffusion (cmdr2 UI) - https://github.com/cmdr2/stable-diffusion-ui . It's running well, and will move to the main branch eventually.

We do a decent number of installs per day, with a fairly active user and developer community ( https://discord.com/invite/u9yhsFmEkB ). And I have a high degree of confidence in damian and Invoke's team, having collaborated with them in the past.

Ephil012 commented 1 year ago

Ah okay, if a bunch of people are vouching for it then it eases my concerns. I was just a bit concerned about being dependent on a third party lib. But if a bunch of people are using it already then I think that helps ease a lot of the worries around it.

duongna21 commented 1 year ago

Hi @patrickvonplaten @damian0815. Thank you for your hard work to bring weighted prompt into diffusers!

[Update 7/4] I'm sorry for using the porny prompts below.

I tested compel with Realistic Vision 2.0 and got inconsistent results between diffusers and A1111 webui. Specifically, using the prompt below (from civitai) I got these oversaturated, bad outputs.

from compel import Compel
from diffusers import StableDiffusionPipeline, EulerAncestralDiscreteScheduler
import torch

path = "SG161222/Realistic_Vision_V2.0"
pipe = StableDiffusionPipeline.from_pretrained(path, torch_dtype=torch.float16, safety_checker=None).to("cuda")
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)

compel = Compel(tokenizer=pipe.tokenizer, text_encoder=pipe.text_encoder, truncate_long_prompts=False)

prompt = "highly detailed RAW Instagram (elegant sitting full body pose)1.4 photo of beautiful mature 26 years old (French medieval period nobility)1.4 woman, (highly detailed very long beautiful wavy hair)1.3, inside monastery's corridor background, (look at viewer)1.4, (skin pores, skin imperfections)++, (beautiful moles, freckles)--, highly detailed body, highly detailed face, (realistic sun lighting)0.4, shadows, 8k high definition, insanely detailed, intricate, masterpiece, highest quality, (angular face, slightly masculine feature face)++"
negative_prompt = "(panties)++, (bra)++, (pierced belly button)++, (body piercing)++, (3d)1.6, (3d render)1.6, (3dcg)1.6, (cropped head)+, (deformed, deformed body, deformed glasses, deformed legs)1.3, bad nipples, ugly nipples, draft, drawing, duplicate, error, extra arms, extra breasts, extra calf, extra digit, extra ears, extra eyes, extra feet, extra heads, extra knee, extra legs, extra limb, extra limbs, extra shoes, extra thighs, extra limb, failure, fake, fake face, fewer digits, floating limbs, grainy, gross, gross proportions, short arm, head out of frame, illustration, image corruption, irregular, jpeg artifacts, long body, long face, long neck, long teeth, long feet, lopsided, low, low quality, low res, low resolution, low res, lowres, malformed, messy drawing, misshapen, monochrome, more than 1 left hand, more than 1 right hand, more than 2 legs, more than 2 nipples, more than 2 thighs, more than two shoes, mosaic, multiple, multiple breasts, mutated, mutation, mutilated, no color, normal quality, (out of focus)++, (out of frame)++, oversaturated, surreal, twisted, , unappealing, uncoordinated body, uneven, unnatural, unnatural body, unprofessional, weird colors, worst, worst quality, (penis, dick, penetration)1.3, (fake skin, porcelain skin)1.3, (bad feet, wrong feet)1.3, (bad hands, wrong hands)1.3, (deformed iris, deformed pupils, semi-realistic, CGI, 3d, render, sketch, cartoon, drawing, blur, anime)1.6, (:blurry background)1.6"

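# Assumed reconstruction: the snippet as posted never defines prompt_embeds or
# negative_prompt_embeds. With truncate_long_prompts=False the two tensors can
# differ in length, so they are padded to the same length before use.
prompt_embeds = compel.build_conditioning_tensor(prompt)
negative_prompt_embeds = compel.build_conditioning_tensor(negative_prompt)
[prompt_embeds, negative_prompt_embeds] = compel.pad_conditioning_tensors_to_same_length([prompt_embeds, negative_prompt_embeds])

num_images = 4
generator = [torch.Generator(device="cuda").manual_seed(i) for i in range(num_images)]
images = pipe(prompt_embeds=prompt_embeds, negative_prompt_embeds=negative_prompt_embeds, generator=generator, num_images_per_prompt=num_images, num_inference_steps=20).images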

Meanwhile, I got expected results with the A1111 webui using the converted equivalent prompts.

prompt = "highly detailed RAW Instagram (elegant sitting full body pose:1.4) photo of beautiful mature 26 years old (French medieval period nobility:1.4) woman, (highly detailed very long beautiful wavy hair:1.3), inside monastery's corridor background, (look at viewer:1.4), (skin pores, skin imperfections:1.2), (beautiful moles, freckles:0.8), highly detailed body, highly detailed face, (realistic sun lighting:0.4), shadows, 8k high definition, insanely detailed, intricate, masterpiece, highest quality, (angular face, slightly masculine feature face:1.2)"
negative_prompt = "(panties:1.2), (bra:1.2), (pierced belly button:1.2), (body piercing:1.2), (3d:1.6), (3d render:1.6), (3dcg:1.6), (cropped head), (deformed, deformed body, deformed glasses, deformed legs:1.3), bad nipples, ugly nipples, draft, drawing, duplicate, error, extra arms, extra breasts, extra calf, extra digit, extra ears, extra eyes, extra feet, extra heads, extra knee, extra legs, extra limb, extra limbs, extra shoes, extra thighs, extra limb, failure, fake, fake face, fewer digits, floating limbs, grainy, gross, gross proportions, short arm, head out of frame, illustration, image corruption, irregular, jpeg artifacts, long body, long face, long neck, long teeth, long feet, lopsided, low, low quality, low res, low resolution, low res, lowres, malformed, messy drawing, misshapen, monochrome, more than 1 left hand, more than 1 right hand, more than 2 legs, more than 2 nipples, more than 2 thighs, more than two shoes, mosaic, multiple, multiple breasts, mutated, mutation, mutilated, no color, normal quality, (out of focus:1.2), (out of frame:1.2), oversaturated, surreal, twisted, , unappealing, uncoordinated body, uneven, unnatural, unnatural body, unprofessional, weird colors, worst, worst quality, (penis, dick, penetration:1.3), (fake skin, porcelain skin:1.3), (bad feet, wrong feet:1.3), (bad hands, wrong hands:1.3), (deformed iris, deformed pupils, semi-realistic, CGI, 3d, render, sketch, cartoon, drawing, blur, anime:1.6), (:blurry background:1.6)"

When I removed the weights (parentheses and coefficients) from the prompts, diffusers and A1111 webui gave similarly good results.

Do you have any idea on the difference between diffusers and A1111 in case of weighted prompts?

[Update April 6]: I tested with runwayml/stable-diffusion-v1-5 and got similar inconsistency.

cmdr2 commented 1 year ago

@duongna21 Maybe better to open an issue on compel's repository? https://github.com/damian0815/compel/issues

Thanks

patrickvonplaten commented 1 year ago

@duongna21 @damian0815 I'm not 100% sure that the ( ) should be used for compel?

E.g. should there be all the parentheses such as ( and ) here:

prompt = "highly detailed RAW Instagram (elegant sitting full body pose)1.4 photo of beautiful mature 26 years old (French medieval period nobility)1.4 woman, (highly detailed very long beautiful wavy hair)1.3, inside monastery's corridor background, (look at viewer)1.4, (skin pores, skin imperfections)++, (beautiful moles, freckles)--, highly detailed body, highly detailed face, (realistic sun lighting)0.4, shadows, 8k high definition, insanely detailed, intricate, masterpiece, highest quality, (angular face, slightly masculine feature face)++"

@damian0815 any ideas?

patrickvonplaten commented 1 year ago

Very curious to find out what could be the reason here. Also cc @apolinario

hipsterusername commented 1 year ago

Parens are fine w/ Compel - They simply group terms for weighting - You can (do this)++ and the result is effectively do++ this++. Or (this+ (and this))++ which would be this+++ and++ this++.

A + is upweighted by a factor of 1.1, so ++ is (1.1)(1.1) - not exactly the same as 1.2. Although I'm not entirely sure that it would be so different as to be "inconsistent".
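The arithmetic above can be sketched in a few lines. The 1.1 factor per + is stated in this thread; the 0.9 factor per - is what the Compel documentation describes, as far as I know. The function name compel_suffix_weight is made up for illustration and is not part of Compel.

```python
def compel_suffix_weight(suffix):
    """Weight implied by a run of '+' or '-' characters such as '++' or '--'."""
    if not suffix:
        return 1.0
    if set(suffix) == {"+"}:
        return 1.1 ** len(suffix)   # each + multiplies by 1.1
    if set(suffix) == {"-"}:
        return 0.9 ** len(suffix)   # each - multiplies by 0.9 (per the Compel docs)
    raise ValueError(f"expected only '+' or only '-', got {suffix!r}")


# Compare with the weights used in the converted webui prompts (1.2 and 0.8).
for suffix, converted in [("++", 1.2), ("--", 0.8)]:
    print(suffix, round(compel_suffix_weight(suffix), 4), "vs", converted)
```

So (text)++ corresponds to roughly 1.21 and (text)-- to roughly 0.81, close to but not the same as the 1.2 and 0.8 used in the converted prompts.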

damian0815 commented 1 year ago

ahh i missed this detail when responding earlier

When I removed the weights (parentheses and coefficients) from the prompts, diffusers and A1111 webui gave similarly good results.

yeah this is expected. the weighting in compel and a111 works differently - among other things auto111 does a lerp against 0,0,0,... for each of the terms and then normalizes, whereas compel lerps against an embedding produced from the empty string ("") and does not normalize. i.e. - compel preserves the positional encoding, auto111 does not.
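The two approaches described above can be illustrated with a toy sketch on plain Python lists. This follows only the description in this comment; real implementations work on tensors and have details not shown here. The function names and the tiny two-dimensional "embeddings" are made up for illustration.

```python
from statistics import fmean


def weight_toward_zero(token_embs, weights):
    """Scale each token embedding (a lerp against the zero vector), then
    rescale everything so the overall mean matches the unweighted mean."""
    scaled = [[w * x for x in emb] for emb, w in zip(token_embs, weights)]
    original_mean = fmean(x for emb in token_embs for x in emb)
    scaled_mean = fmean(x for emb in scaled for x in emb)
    factor = original_mean / scaled_mean
    return [[factor * x for x in emb] for emb in scaled]


def weight_toward_empty(token_embs, empty_embs, weights):
    """Lerp each token embedding away from the empty-prompt embedding at the
    same position, with no renormalization afterwards."""
    return [
        [e + w * (x - e) for x, e in zip(emb, empty)]
        for emb, empty, w in zip(token_embs, empty_embs, weights)
    ]


tokens = [[1.0, 2.0], [3.0, 4.0]]   # toy embeddings for two tokens
empty = [[0.5, 0.5], [1.0, 1.0]]    # toy embeddings of the empty prompt
weights = [1.0, 2.0]                # upweight only the second token

print(weight_toward_zero(tokens, weights))
print(weight_toward_empty(tokens, empty, weights))
```

In the first variant the unweighted token also changes because of the rescaling; in the second it stays exactly as it was. That is one reason the same weights can give different images.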

the difference in normalizing behaviour you can probably compensate by lowering the CFG

damian0815 commented 1 year ago

also - you may be used to the Karras scheduling in auto111 which makes a big difference to output quality if step counts are low. @patrickvonplaten afaik the diffusers StableDiffusionPipeline schedulers don't support Karras timestep scheduling, is that correct?

duongna21 commented 1 year ago

More context: I used Euler-a with 20 steps (same as diffusers setting) to produce these outputs.

damian0815 commented 1 year ago

@duongna21 can you try Euler-a with 50 steps, and also DDIM, and compare with diffusers? please make sure you have karras scheduling disabled (i don't know how auto111 works, i assume you can do this)

patrickvonplaten commented 1 year ago

We've very recently also added support for Karras sigmas here: https://github.com/huggingface/diffusers/blob/8c6b47cfdea1962e23d3407f034b3b00dda8f2d6/src/diffusers/schedulers/scheduling_euler_discrete.py#L125
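For readers unfamiliar with the term, here is a minimal sketch of the noise-level spacing from Karras et al. (2022) that "Karras sigmas" refers to. It is a standalone illustration, not the diffusers code; the function name and the example sigma range are assumptions. In diffusers the feature is, as far as I can tell, switched on with the scheduler option use_karras_sigmas=True.

```python
def karras_sigmas(sigma_min, sigma_max, num_steps, rho=7.0):
    """Return num_steps noise levels from sigma_max down to sigma_min,
    interpolated linearly in sigma ** (1 / rho) space."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    min_inv_rho = sigma_min ** (1 / rho)
    max_inv_rho = sigma_max ** (1 / rho)
    return [
        (max_inv_rho + (i / (num_steps - 1)) * (min_inv_rho - max_inv_rho)) ** rho
        for i in range(num_steps)
    ]


# Illustrative range only; the real values come from the scheduler's config.
for sigma in karras_sigmas(0.03, 14.6, 5):
    print(round(sigma, 4))
```

Compared with evenly spaced timesteps, this puts more of the steps at low noise levels, which matters most when the step count is small.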

patrickvonplaten commented 1 year ago

Will try to debug this this week

damian0815 commented 1 year ago

We've very recently also added support for Karras sigmas here:

https://github.com/huggingface/diffusers/blob/8c6b47cfdea1962e23d3407f034b3b00dda8f2d6/src/diffusers/schedulers/scheduling_euler_discrete.py#L125

ahhha brilliant

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

hipsterusername commented 1 year ago

As a note on compel, Vlad's fork of Auto1111 has adopted compel as an option.

What are open items in consideration here?

patrickvonplaten commented 1 year ago

Still need to test some things here

krNeko9t commented 1 year ago

My opinion is that the A1111-like AI generation community has grown to the point that it cannot be ignored, and neither can its users' habits. We can't build an application that tells the user: "sorry, you need to learn a new prompting syntax to use our service." I also find it annoying that, when playing with my own diffusers scripts, every time I copy some fantastic showcase parameters from online I have to convert them to my version, and even then I sometimes cannot reproduce the result. Often it's because of inconsistency in prompting. So I wish the necessity and the best practice of prompting like "(1girl:1.3)" could be carefully considered. Thanks for your contributions anyway.

patrickvonplaten commented 1 year ago

Hmm, yeah we really quite heavily want to rely on Compel here... @damian0815 do you think it could make sense to parse A1111-like syntax to Compel syntax to make it easier for the community? It shouldn't be too difficult I guess no?

patrickvonplaten commented 1 year ago

Related: https://github.com/huggingface/diffusers/issues/3980

hipsterusername commented 1 year ago

There are prompt converter utilities out there, but (personally) I think attempting to maintain a A1111 converter is outside the scope of what Compel is intended to do. There are different functions, implementation perspectives, etc. - Not everything Compel does can be mapped to A1111 syntax, and vice versa.

I'm not sure the notion of "I need to be able to replicate everything across apps 100%" is realistic or desirable.

krNeko9t commented 1 year ago

There are prompt converter utilities out there, but (personally) I think attempting to maintain a A1111 converter is outside the scope of what Compel is intended to do. There are different functions, implementation perspectives, etc. - Not everything Compel does can be mapped to A1111 syntax, and vice versa.

I'm not sure the notion of "I need to be able to replicate everything across apps 100%" is realistic or desirable.

It's more about user habits, because there are millions of users (who are not skilled developers) out there who are quite familiar with the A1111 webui and its way of prompting. Since there is neither official nor third-party support for prompting like that, I think these users have no choice but to stay with A1111 and leave diffusers.

damian0815 commented 1 year ago

i do not have the bandwidth to maintain a converter from auto to compel syntax, but if someone wants to contribute one and offer to maintain it i'd have no objection to a converter being a pre-processing step built into the compel library.

note however @krNeko9t it would be misleading because using diffusers+compel with a cat playing with a (ball)1.1 in the forest will always produce a different image to running auto with a cat playing with a (ball:1.1) in the forest which would mean an increase of the number of users complaining that the prompt they copy/pasted doesn't produce the same image.
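For illustration only, a purely syntactic rewrite of the explicit-weight form could look like the sketch below. The function name a1111_to_compel is hypothetical and not part of Compel or diffusers. It only handles (text:number) groups and leaves bare ( ) and [ ] untouched. As noted above, converting the syntax does not make the two tools produce the same image.

```python
import re

# Matches an unescaped "(text:number)" group with no nested brackets inside.
_EXPLICIT_WEIGHT = re.compile(r"(?<!\\)\(([^()]*?):\s*(\d*\.?\d+)\s*\)")


def a1111_to_compel(prompt):
    """Rewrite "(text:1.3)" as "(text)1.3"; everything else is left as is."""
    return _EXPLICIT_WEIGHT.sub(r"(\1)\2", prompt)


print(a1111_to_compel("a cat playing with a (ball:1.1) in the forest"))
```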
