ljleb / prompt-fusion-extension

auto1111 webui extension for all sorts of prompt interpolations!
MIT License

cannot reproduce example results #37

Closed vladmandic closed 1 year ago

vladmandic commented 1 year ago

i've tried the extension and while it's clearly enabled and active, i cannot reproduce any of the test results. what are the correct settings?

for example, [lion:bird:girl: , 6, 9] with Euler a and 20 steps would result in 90% bird photos, with some transforms towards lion in later steps and no influence from girl

i've tried experimenting and found that a) transforms work in the opposite direction (from the second term towards the first), and b) i cannot get more than 2 terms to do anything.

automatic webui version 02/01 commit hash 226d840e84c5f306350b0681945989b86760e616

ljleb commented 1 year ago

The examples were compiled using 30 steps. Can you try setting your sampler step count to 30? Alternatively, I think something like [lion:bird:girl: , .24, .34] should work independently of the step count, but it gives better results with 30+ steps.

It will not always give good results afaik, but you should be able to find something like the examples with these settings. Let me know if it fixes anything, or if it still behaves the same way.
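
In case it helps, here is roughly how I think about the fractional step numbers, as a simplified Python sketch (resolve_step is a made-up helper for illustration, not the extension's actual code):

```python
def resolve_step(value: float, total_steps: int) -> int:
    # made-up helper to illustrate the idea, not the extension's actual code:
    # values below 1 are treated as a fraction of the total step count,
    # values of 1 or more as absolute step numbers
    return round(value * total_steps) if value < 1 else round(value)

# at 30 steps, [lion:bird:girl:, .24, .34] lands near [lion:bird:girl:, 7, 10]
print(resolve_step(.24, 30), resolve_step(.34, 30))  # 7 10
```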

ljleb commented 1 year ago

I want to add that it is more likely to resemble the readme examples and be skewed towards girl if you put all interpolation step numbers between step -1 and step .5 or earlier.

vladmandic commented 1 year ago

i'm not sure i understand. i've tried [lion:bird:girl: , 6, 9], which is exactly the README example, and i'm still getting mostly bird with some transforms towards lion near the end and no girl at all.

ljleb commented 1 year ago

Have you set the number of steps to 30 for this example? I suggested using decimal numbers since they scale with the total number of steps, but fixing the step count to 30 for the first example prompt should give somewhat similar results.

Note that the exact results you get will depend on the seed and other settings. I can share the exact settings I used for the dragon, since I generated the second example, but we will need @John-WL to help with the settings for the first example.

PladsElsker commented 1 year ago

I'd assume the model can have a large impact on the generation process. When I made the image, I was mostly fiddling with the prompt. I later found out that the prompt itself was giving mostly images of birds as well, but this specific seed gave a cool result. You might wanna play a lot with the parameters to get more consistent results. I can post the girl image with metadata here later so you can have a better look at it.

PladsElsker commented 1 year ago

I think it might also help to understand what's going on if you could share a gif of the whole generation. You can make gifs of all the steps with this extension, for example: https://github.com/AlUlkesh/sd_save_intermediate_images

PladsElsker commented 1 year ago

Prompt:

[lion:bird:girl:, 7, 10]
Negative prompt: lowres,
(ugly:1.1), monster, humpbacked, (mutation:1.2), (bad proportions:1.2), (bad anatomy:1.2), long body, long neck,
(double head:1.1), (extra head:1.1), (poorly drawn face:1.1), (deformed face:1.2), extra face,
(extra limbs:1.2),
(mutated hands and fingers:1.3), (bad hands:1.3), missing finger,
(extra feet:1.2),
extra digits, fewer digits,
text, logo,
cropped, blurry, worst quality, low quality, normal quality, jpeg, (jpeg artifacts:1.2)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3889540205, Size: 704x960, Model hash: 147c7bd9c6, Model: Protothing50, ENSD: 31337

Image:

21660-3889540205- lion_bird_girl_, 7, 10

Note

This was done with an earlier version of fusion, where indexes were off by one. The step numbers in the readme are one less so that they match the latest implementation. Protothing50 is a merge of protogen2.2 and anything3.0, weighted sum 50%.

PladsElsker commented 1 year ago

I just tried reproducing it, and it looks like I can't reproduce it either actually. This is embarrassing haha.

We'll investigate that.

PladsElsker commented 1 year ago

Okay, found the issue on my side. I was using clip skip 2 instead of 1. I can reproduce it now.

ljleb commented 1 year ago

I've updated the readme to use the correct step numbers that were used to generate the image. Note that the animation on the right does not display properly (the first linear interpolation between lion and bird is lacking an intermediate point).

vladmandic commented 1 year ago

I just tried using almost the same settings:

Prompt: [lion: bird: girl:, 7, 10]
Negative prompt: (your long prompt)
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 5.5, Seed: 3889540205

I've used Model: Protogen v22 as I don't have Protothing at hand, but that's close enough, and it's reproducible with any model on my system anyhow.

And the results are exactly as I originally reported - it starts as a bird (the second term) and morphs into a lion (the first term) and never goes in the direction of a girl (maybe the first few steps are the girl? hard to tell).

ezgif-4-89e64c84bd

(and i've tried one more time with all other scripts/extensions disabled, same result. plus updated prompt-fusion to latest as i noticed there is a fresh update)

vladmandic commented 1 year ago

and a few more tests - i've disabled xformers and all cross-attention optimizations (--disable-opt-split-attention) and the results are vastly different - now with the same config it travels bird -> girl (so in the right direction, although there's no lion at the start).

then i switched the sampler from DPM++ 2M Karras to DPM2 Karras and it's the opposite - it clearly travels girl -> lion (no bird as a middle step). this is extremely finicky and i cannot get a working combo.

ljleb commented 1 year ago

> When I made the image, I was mostly fiddling with the prompt. I later found out that the prompt itself was giving mostly images of birds as well, but this specific seed gave a cool result. You might wanna play a lot with the parameters to get more consistent results.

I think this might be a reasonable explanation of the situation you are facing. The step numbers of the first example were cherry-picked based on prompt settings like seed, model, etc. I'd be very surprised if what you are experiencing was a problem with the code. Again, I suggest trying to reproduce the image exactly, as it may be easier to start from there and make sense of the effects of interpolation on the sampler.

As I suggested earlier, you could also try to play with the step numbers to get the results you are expecting. By introducing girl earlier while keeping the ratio between the step numbers close to what it is right now, you should eventually get an output that looks like what you want.
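
For instance, scaling both step numbers by the same factor introduces girl earlier while keeping the ratio intact (just an illustration of the idea; the exact numbers are untested):

```python
# scale both interpolation points by the same factor to introduce girl
# earlier while keeping the 7:10 ratio; e.g. a factor of 0.7 suggests
# trying [lion:bird:girl:, 5, 7] instead of [lion:bird:girl:, 7, 10]
steps = [7, 10]
factor = 0.7
print([round(s * factor) for s in steps])  # [5, 7]
```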

We could replace the first example of the readme with a prompt that has a better chance of generating similar results in general. However, so far, in my own little experiments, it has almost always been more useful to add interpolations as an after-the-fact tweak to get a result closer to what I want, as opposed to starting with a prompt that uses interpolations.

You can get interesting and unexpected results by starting with interpolation in a prompt, but it's unclear to me how easy this is to control. In the end, I think one of the best analogies to prompt interpolation is a kind of smoother prompt editing.

One workflow that may be a good fit for this: once you have some prompt editing in place in your prompt, you can bring in interpolation for finer editing control.

ljleb commented 1 year ago

For what it's worth, I just got 1 or 2 good outputs in 10 after 15 minutes of prompting, using close-up, face of a [lion: bird: girl:, .2, .4] as a prompt and keeping the seed random.

Here's one of the results I got:

ezgif com-gif-maker(5)

close-up, face of a [lion: bird: girl:, .15, .4]
Negative prompt: art by bad-image-v2-11000
Steps: 30, Sampler: DPM++ 2M Karras, CFG scale: 7, Seed: 430583269, Size: 512x512, Model hash: 0873291ac5, ENSD: 31337

In case you intend to reproduce this image: the model I used for this one is AbyssOrangeMix2_nsfw. You will find the bad-image-v2-11000 TI here: https://huggingface.co/Xynon/models/blob/main/experimentals/TI/bad-image-v2-11000.pt

vladmandic commented 1 year ago

thanks for all the notes. and let's ignore the complex prompts with multiple keywords/subjects. what puzzles me is that depending on a) whether xformers are enabled and b) which sampler is used, i get results that move a -> b or a <- b for the same prompt.

ljleb commented 1 year ago

Xformers are notorious for making the generation process non-deterministic. IIUC you should get more consistent results without them. I'm not sure why you are getting results that go in the opposite direction, I'd love to see the gif play out for this one to be honest 😅

Edit: ah, the gif you shared above actually does show this, excuse my oversight. I'll see if I can reproduce it on my side.

ljleb commented 1 year ago

Alright, so I reproduced your gif. I step-debugged into the code and made sure every intermediate embedding was properly following the expected linear interpolation curve, by looking at the first dimension that was different in all 3 embeddings.
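
Conceptually, the check looked something like this (a simplified sketch with dummy tensors standing in for the real embeddings, not the actual debug code):

```python
import torch

# dummy tensors standing in for the real CLIP conditioning embeddings
lion_cond = torch.randn(77, 768)
bird_cond = torch.randn(77, 768)

t = 0.5  # position between the two interpolation points, in [0, 1]
expected = torch.lerp(lion_cond, bird_cond, t)

# in the debugger, the intermediate embedding at each step was compared
# against this expected value on the first dimension where all 3
# embeddings differ
print(expected[0, 0])
```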

I know which embedding corresponds to which initial prompt because they are all passed together to the original get_learned_conditioning function:

image

Even though I haven't found any issue with the interpolation code, the image still seems to come back to lion after step 10.

My understanding is that by step 8, the intermediate noisy images have passed through enough lion-ish sampler steps that after step 10 (so step 11), it is hard for the model to steer in another direction. For the exact image you used, simply putting the girl step number after 8 (so 9) instead of after 10 (so 11) seems to fix the bias (although the result is not what I'd call pretty).

vladmandic commented 1 year ago

i get what you're saying, but it makes getting any sort of predictable result pretty much trial & error. i was hoping to avoid rendering a separate image per frame to get a quick animation, but atm i don't think i can...

ljleb commented 1 year ago

I understand the issue you have with this behaviour. To be honest, it's a bit annoying for me as well because I'd love to be able to use interpolation in the way you are envisioning.

I just made a test with [[lion : bird : 7] : girl : 11] (i.e. lion until step 7, then bird until step 11, then girl). No interpolations, just basic prompt editing, and I found behaviour similar to the one you were describing earlier (i.e. embedding time travel, memory of earlier embeddings). Not sure if it's a bug in the webui, a feature of the sampler, or something else. I used DPM2 a for this one:

ezgif com-gif-maker(6)

Also, I tried deleting my local prompt fusion extension folder, restarting the ui completely, and then generating this image again. I got the same result.

vladmandic commented 1 year ago

thanks for the detailed investigation. not sure if anything else can be done? if not, feel free to close the issue.