The weights don't need to add up to one. They just need to provide a distinction between the different subprompts in the prompt, so the generator knows which ones to focus on more.
a mountain bear:120. big blue eyes.:0.5
A prompt of this nature tells the generator to focus primarily on "a mountain bear"; once it has that, it applies "big blue eyes" to the mountain bear at a lesser strength.
Here are some variations of the weights:
a mountain bear:120. big blue eyes.:0.5
a mountain bear:120. big blue eyes.:30
a mountain bear:120. big blue eyes.:60
a mountain bear:120. big blue eyes.:100
a mountain bear:120. big blue eyes.:150
a mountain bear:120. big blue eyes.:250
As you can see, the eyes scale up a bit while the bear is retained, until the value crosses 120. The moment it does, "big blue eyes" becomes the primary weight and the image starts to morph into something that looks more natural for that.
So by the time it hits 250, the weight of the big blue eyes is so strong that the mountain bear is an afterthought. And because, apparently, more cats have blue eyes than brown bears do, we get this result.
The generated weights are relative rather than summing to 1; they don't need to add up to anything in particular.
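To make "relative" concrete, here is a minimal sketch of reading such a prompt and computing each subprompt's effective share (the regex and helper names here are illustrative assumptions, not the repo's actual parser):

```python
import re

def parse_weighted_prompt(prompt):
    """Split 'text:weight. text:weight' style prompts into (text, weight) pairs."""
    # hypothetical parsing rule, for illustration only
    pattern = r"([^:]+):(\d+(?:\.\d+)?)"
    return [(text.strip(" ."), float(weight))
            for text, weight in re.findall(pattern, prompt)]

def relative_shares(pairs):
    """Only the ratios between the weights matter, so dividing by the total
    shows each subprompt's effective emphasis without changing anything."""
    total = sum(weight for _, weight in pairs)
    return {text: weight / total for text, weight in pairs}

pairs = parse_weighted_prompt("a mountain bear:120. big blue eyes:0.5")
print(relative_shares(pairs))
# {'a mountain bear': 0.9958..., 'big blue eyes': 0.0041...}
```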
The issue may worsen with the number of weights being used.
complex_machinery_space_station_upper_atmosphere_sci-fi_by_amir_zand_by_yuumei_trending_on_artstation_pastel
complex_machinery:1 space_station:1 upper_atmosphere:1 sci-fi:1 by_amir_zand:1 by_yuumei:1 trending_on_artstation:1 pastel:1
These should essentially be the same prompt, but the one with weights is just garbled.
Underscores change the way the text is tokenized, which means that
complex_machinery_space_station_upper_atmosphere_sci-fi_by_amir_zand_by_yuumei_trending_on_artstation_pastel
and
complex_machinery space_station upper_atmosphere sci-fi by_amir_zand by_yuumei trending_on_artstation pastel
are not the same, though that shouldn't lead to such a big difference.
Also, since more attention is typically given to words earlier in the sentence, giving everything the same weight effectively increases the weight of words later in the sentence, since they would usually have less to begin with. I still believe order matters even in that case.
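The underscore point is easy to check directly. A quick demo with the HuggingFace CLIP tokenizer (an assumption here as a stand-in; the repo uses CLIP's own BPE, which treats underscores the same way):

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

# Underscores are not word separators to the BPE, so the fused string
# is split into different (and generally more obscure) token pieces.
print(tokenizer.tokenize("space_station upper_atmosphere"))
print(tokenizer.tokenize("space station upper atmosphere"))
```

The two token lists differ, which is why the prompts are not equivalent even though they read the same.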
So I swapped the complex machinery from the beginning with the pastel tag at the end, and yeah, it seems to generate things that vaguely show complex machinery, so the order definitely matters. But all the other subprompts are essentially thrown out in cases with this many weights (which isn't really a lot). This is a real bummer; Disco Diffusion allowed weighting far more individual tags while staying focused on everything mentioned. This is what makes me believe this weight system isn't working properly.
I also tried removing the underscores and I'm still getting results similar to those with them, so while it may slightly change the output, I don't think that's the main cause of my issue here.
You just have to adjust the numbers, so if you want less pastel, keep reducing until it's the amount you want.
The problem currently isn't really getting a little less or more of a certain tag; I'm not even there yet. I guess my question would be: how do I apply weights to the weighted prompt so that it equals the non-weighted one (the very first result I posted)? The discrepancy between the two right now is just so far off I have no idea where to begin lol
pastel:1. space station:1. upper atmosphere:1. sci-fi:1. by amir zand:1. by yuumei:1. trending on artstation:1. complex machinery:1.
Prompt weighting is so complex. I am not sure we can actually understand the distribution of the token weights from the single-line prompt... Clearly one does not split into the other with equal weights. I tried to use prompt weighting myself and pretty much gave up, as the interactions are so difficult to understand.
I'll take a look at it. I've been trying to figure out a better way of doing weighting because of this issue; it's very easy for the weighting to ruin the image.
It would be nice if there were a way to get the normalized weight output of a single-line prompt for each token, as used by SD. This would help in understanding how the weights are applied and how to better influence them.
Maybe the other commit showing the tokens used from the prompt could be a starting point. No idea.
EDIT:
@xraxra just saw you are the one that proposed the token usage PR. Great work! Looking forward to the PR merge so I can use it. Knowledge is power.
This was PR #176 and it got pulled into main about 3 hours ago. There are a few suggestions for improving it, so it may have a subsequent PR soon.
OK... I did some experimentation with simple prompt vs weighted prompt... and my conclusion is the following:
A simple one-line prompt "adds" elements to the picture as you describe them. Weighted prompts, to an extent, are "blended" results of each subprompt based on its associated weight, so it is not additive... it is averaging across all the subprompts.
This is why you will NEVER get the same result between the two approaches.
A "red apple, green apple" prompt will tend to show a red apple and a green apple in the same picture... "red apple:1 green apple:1" will tend to create an average, resulting in a more orange apple with more of a painted look...
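To make the "averaging" interpretation concrete, here is a minimal sketch of blending per-subprompt conditionings by normalized weight (the `encode` argument stands in for the CLIP text encoder and is an assumption, not the repo's actual function):

```python
import torch

def blend_conditionings(subprompts, weights, encode):
    """Weighted average of per-subprompt conditioning tensors.
    'encode' maps a prompt string to a [tokens, dim] embedding tensor."""
    total = sum(weights)
    blended = torch.zeros_like(encode(subprompts[0]))
    for text, weight in zip(subprompts, weights):
        blended += encode(text) * (weight / total)
    return blended

# blend_conditionings(["red apple", "green apple"], [1.0, 1.0], clip_encode)
# averages the two embeddings, which is why the result drifts toward a
# single orange-ish apple instead of two apples in one scene.
```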
Hope this helps.
There's a new pull request that affects the weighted subprompts, and I wonder if you'd be willing to be added as a project collaborator and give it a quick review? I'm sending you an invite, just in case.
Lincoln
@Lumoria I've improved the weighting here locally; going to work on merging it after the simplet2i refactoring.
@lstein With how the weighting works now, most of it lives in the FrozenCLIPEmbedder's forward function, within modules.py.
I'm basically applying strengths to the vectors of each token in the prompt, rather than doing separate prompts and blending.
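In outline, the idea is something like the following simplified sketch (an assumption for illustration, not the exact code in modules.py): scale each token's embedding vector by its strength before the transformer layers see it.

```python
import torch

def apply_token_strengths(token_embeddings, strengths):
    """token_embeddings: [batch, n_tokens, dim] from the CLIP embedding layer.
    strengths: one multiplier per token (1.0 = unchanged).
    Scaling a token's vector changes how strongly it influences the final
    conditioning, without splitting the prompt into separate passes."""
    scale = torch.tensor(strengths,
                         dtype=token_embeddings.dtype,
                         device=token_embeddings.device)
    return token_embeddings * scale.view(1, -1, 1)
```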
Using the same prompt above, this is the default (also, if you did :1 after each word here, it would be exactly the same):
"complex machinery, space station, upper atmosphere sci-fi, by amir zand, by yuumei, trending on artstation, pastel" -s20 -W704 -H448 -C16.0 -Ak_euler -S4173332090
Then I tweak some of the weights:
"complex machinery:1.2, :space station:0.5, :upper atmosphere:1.1, sci-fi:1.4, by amir zand:1, by yuumei:1, trending on artstation:1.6, pastel:1" -s20 -W704 -H448 -C16.0 -Ak_euler -S4173332090
And here is just pastel turned up a lot:
"complex machinery, space station, upper atmosphere, sci-fi, by amir zand, by yuumei, trending on artstation:, pastel:1.6" -s20 -W704 -H448 -C16.0 -Ak_euler -S4173332090
Will it still be possible to combine this with the original subprompt weighting? It was a nice feature to be able to blend many prompts by weight. Having the ability to use both would be great; losing the original subprompt behaviour would be a loss.
Yeah, it's technically possible; it would just need a good way to specify what is a subprompt weight and what is token weighting.
Maybe it could work like this, where subprompts are specified by quotation marks. I'll have to see if that would break anything with how the prompt is currently grabbed.
"some: cool:1.5 prompt that: also:1.2 has weighted tokens":1.2 "a second: nice:1.1 prompt":0.2
This would be awesome, as I have been using subprompts as a way to blend multiple long prompts together, as in the HOWTO I wrote on crafting outputs: https://github.com/lstein/stable-diffusion/issues/359
Using quotes would be a nice way to mark subprompts. Actually... this is how I was already writing my long subprompts ;-)
If you can pull this off it will be super powerful! Pushing individual prompt token strengths and blending multiple prompts with token weights inside. Ouff... talk about ultimate power to craft prompts.
I am looking forward to testing this out. Do you have a link to the repo/branch where I could try it and provide feedback?
@bmaltais I have a branch of the WIP version here, https://github.com/xraxra/stable-diffusion/tree/token-strengths
It doesn't have the prompt weighting added back in yet, and I'd like to simplify how the strengths get applied to tokens.
"complex machinery:1.2, :space station:0.5, :upper atmosphere:1.1, sci-fi:1.4, by amir zand:1, by yuumei:1, trending on artstation:1.6, pastel:1" -s20 -W704 -H448 -C16.0 -Ak_euler -S4173332090
Just tried the branch, and I don't think it contains the latest code, as running the prompt for one of the examples you provided above produces a totally unrelated result... much more like what it used to be in the original release:
Prompt: "complex machinery:1.2, :space station:0.5, :upper atmosphere:1.1, sci-fi:1.4, by amir zand:1, by yuumei:1, trending on artstation:1.6, pastel:1" -s20 -W704 -H448 -C16.0 -Ak_euler -S4173332090
Result: [image]
Also... I tried to figure out where the weird token display is coming from when using -t... but commenting out all the locations in the code where this could happen still does not make it go away... not too sure what is going on ;-)
I should have responded to this earlier. I'm a strong believer in making things that do different things look different. I've also had trouble in the past figuring out where the beginning of a weighted subprompt is. Riffing off @xraxra's quotation mark example, a format that uses parentheses seems easier to understand to me:
(some: cool:1.5 prompt that: also:1.2 has weighted tokens):1.2 (a second: nice:1.1 prompt):0.2
This is a step forward, but as a naive user it's unclear to me how to interpret tokens that have colons but no weights. In this example, what is the difference between "some:" which ends in a colon and "prompt" which doesn't? This version seems more natural:
an alien:1.2 spacecraft soaring through (cloudy:1.1 rain-streaked skies):1.5
I'd interpret the subprompt "an alien spacecraft soaring through" as having a weight of 1, the subprompt "cloudy rain-streaked skies" as having a weight of 1.5, and the individual tokens "alien" and "cloudy" having token weights of 1.2 and 1.1 respectively. Semantically, does this make sense?
Another question that arises is whether a whole word always corresponds to a single token. I thought that longer words were broken into multiple tokens by the CLIP tokenizer.
Lincoln
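On the multi-token question: CLIP's BPE does split longer or rarer words into several pieces, so a per-word strength would really need to cover a span of tokens. A quick check (again using the HuggingFace tokenizer as an assumed stand-in for the repo's bundled BPE):

```python
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.tokenize("alien"))          # a common word: one BPE piece
print(tokenizer.tokenize("rain-streaked"))  # a rarer compound: several pieces
```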
All of my generations seem to completely break and become messes of noise once I start adding weights to prompts. I've tried making the weights sum to 1 manually (with and without -x) and they still come out messed up. Furthermore, when weighting each part of the prompt equally with the same exact value, there's noticeably more noise and disarray in the results than with the same prompt and no weights specified.