Open aleksusklim opened 7 months ago
Also, how about a checkbox in the EM tab named "Save converted SD1/SD2/SDXL versions in separate files", that will:
/embeddings/embedding_merge/SDXL/%name%.safetensors with zero G but 768 of L
/embeddings/embedding_merge/SDXL/%name%.safetensors with zero L but 1280 of G
/embeddings/embedding_merge/SD1/%name%.safetensors with 768 of L and /embeddings/embedding_merge/SD2/%name%.safetensors with 1280 of G
This way you would be able to convert between different embeddings by checking this flag and loading a correct base model to switch mode!
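The zero-padding conversions listed above could be sketched roughly as follows. This is only an illustration with plain Python lists, not the extension's actual code; the dictionary keys `clip_l` and `clip_g` and both function names are assumptions made for the example.

```python
L_DIM = 768   # hidden size of CLIP L (SD1, and the L part of SDXL)
G_DIM = 1280  # hidden size of the G part of SDXL

def sd1_to_sdxl(l_vectors):
    """Keep the L vectors and pair them with all-zero G vectors (hypothetical helper)."""
    for vector in l_vectors:
        if len(vector) != L_DIM:
            raise ValueError("expected vectors of width %d" % L_DIM)
    return {
        "clip_l": [list(vector) for vector in l_vectors],
        # One zero vector per token, so both parts have the same token count.
        "clip_g": [[0.0] * G_DIM for _ in l_vectors],
    }

def sdxl_to_sd1(pair):
    """Drop the G part and keep only the 768-wide L vectors (hypothetical helper)."""
    return [list(vector) for vector in pair["clip_l"]]

embedding = [[0.5] * L_DIM, [1.0] * L_DIM]  # a made-up two-token SD1 embedding
converted = sd1_to_sdxl(embedding)
```

The padded G part has the same number of tokens as the L part, which matches the requirement that both halves of an SDXL embedding line up token by token.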
Can anybody explain to me why I get an embedding hidden size of 1024 for SD2 but 1280 for the G part of SDXL? For SD1 and the L part of SDXL it is 768, as expected.
EDIT: realized that | could also be used instead of +, so instead of merging two sentences targeting both CLIPs it merges two sentences that target one CLIP each. That way it will not interfere with the single-quoted part like the example below might do.
What about if just adding | inside single quotes would make it so anything before it targets CLIP L and anything after targets CLIP G, and that it also changes the position of math operations and such to the left side of the single quotes for CLIP L and the right side for CLIP G?
It would probably be good to add a character to the syntax indicating that any character after it that is followed by a numerical value is a math operation, and that ', > and + without an appended numerical value assume their default functions. In the following example, # is the indicator that there are math operations.
The sentence a blue dog acts only on CLIP L and is multiplied by 1.25, while a red dog acts only on CLIP G and is divided by 0.8; then they are merged with a green dog, since the + after the last math operation is not followed by a number, and that sentence targets both CLIPs and is multiplied by 0.5. Does this seem reasonable, @aleksusklim?
<#*1.25'a blue dog|a red dog'#/0.8+'a green dog'#*0.5>
Separated prompts for two different text encoders seem unnecessary. Separated prompts for the base model and refiner may work, but the effects are random, and we refrain from implementing this.
Also, this statement about separately prompting CLIP, written by the Fooocus maintainer, can be dismissed. I have proof that under the right circumstances, separately prompting the CLIP models can provide significant improvement. I have done extensive experiments on this.
<#*1.25'a blue dog|a red dog'#/0.8+'a green dog'#*0.5>
I don't understand this. Firstly, any runtime merge expression ought to start with a single quote, otherwise it won't get parsed (and would mess up other extensions if I tried to interpret it), so the only valid start is <' or <''.
Secondly, you seem to include a control character | inside single quotes. This is wrong, because currently there are no prohibited symbols inside quotes (actually, even the single quote itself can be freely used: to do this you have to double it, for example cat's should be <'cat''s'>; I don't see this documented anywhere in the docs, but it was possible from the very beginning!)
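The quote-doubling rule mentioned above can be illustrated with a small sketch. The helper names are hypothetical and this is not the extension's parser; it only shows the escaping convention of writing a literal single quote as two single quotes.

```python
def quote_literal(text):
    """Wrap text in single quotes, doubling any single quote inside it."""
    return "'" + text.replace("'", "''") + "'"

def unquote_literal(literal):
    """Reverse quote_literal: strip the outer quotes and undouble inner ones."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("literal must be wrapped in single quotes")
    return literal[1:-1].replace("''", "'")

expression = "<" + quote_literal("cat's") + ">"
print(expression)
```

Because doubling is the only escape, every other character, including |, stays an ordinary part of the quoted text.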
| also could be used instead of + so instead of merging two sentences targeting both clip it merges two sentences that target one clip each
Show some examples, and note that I cannot delay multiplication for anything but the directly preceding term, so we cannot have "multiplication from the left" like X*'S', but only 'S'*X
I have done extensive experiments on this.
Where, with what software? (Comfy, Diffusers?)
I realized the | issue inside single quotes, hence the edit. That is why in the edit I hinted towards another method.
<'a blue dog'#*1.25|'a red dog'#/0.8+'a green dog'*0.5>
'a blue dog'#*1.25 would represent the CLIP L part.
| would indicate that the single-quoted text to the left is L and the one to the right is G.
'a red dog'#/0.8 would represent the CLIP G part.
+ would function as normal (in this case the L and G parts on the left, which are merged with different tokens but at the same location in the prompt, will merge with the one on the right, which has the same tokens on both CLIPs).
'a green dog'*0.5 is created with both CLIPs.
The # would indicate a single-CLIP operation, and unless there is a preceding | that CLIP is L. If there is a | preceding the prompt, then it is CLIP G and will be merged as such.
In the case of wanting only one CLIP, with the other padded with zeros, you would just leave that single quote empty, followed by only a #, followed by | if CLIP L, or one of the following if CLIP G: +' if more merges are being done, or > if nothing else. Note that padding should be done to the same token count as the one that is not padded.
<''#|'a red dog'#/0.8>
<'a blue dog'#*1.25|''#>
Confusing.
Couldn't you just use |'string to indicate it as L and #'string to indicate it as G, at that rate?
Confusing.
Couldn't you just use |'string to indicate it as L and #'string to indicate it as G, at that rate?
You are right. I do tend to overcomplicate some things.
As long as |'string, if used alone, also does torch.zeros on G, and #'string, if used alone, also does torch.zeros on L, it should be fine, I suppose.
Give several examples of how you would use this, especially since you said that you already have experience in messing with two separate prompts.
Well, the influence over the image is not equal between the two CLIP models, but this can be overcome by multiplying the magnitude of an embedding that uses only CLIP L, and since CLIP L is the same as the SD 1.5 CLIP, it still has all the OpenAI training. I have already used this, but in a workaround manner, by creating embeddings with an SD 1.5 model and then converting them to work with SDXL by zero-padding G. If you check the Abs parameter when parsing, you can see that the G value is consistently higher than L. Even these out and prompt coherence goes up as well.
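One way to picture "evening out" the two parts is to compare their mean absolute values and derive a compensating multiplier for L. This sketch assumes that the Abs figure is a mean of absolute values, which the thread does not confirm; the keys `clip_l` and `clip_g`, the function names, and the toy numbers are all made up for illustration.

```python
def mean_abs(vectors):
    """Mean of the absolute values over all components of all vectors."""
    values = [abs(x) for vector in vectors for x in vector]
    return sum(values) / len(values)

def balancing_factor(pair):
    """Multiplier for the L part that would make mean |L| equal mean |G|."""
    l_abs = mean_abs(pair["clip_l"])
    g_abs = mean_abs(pair["clip_g"])
    if l_abs == 0:
        raise ValueError("L part is all zeros and cannot be rescaled")
    return g_abs / l_abs

# Toy two-component vectors; real ones would be 768 and 1280 wide.
toy = {"clip_l": [[1.0, -1.0]], "clip_g": [[2.0, -4.0]]}
factor = balancing_factor(toy)
```

Whether such balancing actually improves prompt coherence is the commenter's observation, not something this sketch demonstrates.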
So you actually need a separate multiplication?
Like *L1.7 and *G0.8 instead of just *1.7 and *0.8?
This way, to get pure L you would just do 'string'*G0
Would that be enough?
Yes that sounds great. It makes sense too since if you are going to target only one clip you would want to use multiplication in order to compensate a bit. At least from my own experience.
Also here are three embeddings that were converted from SD 1.5 to SDXL with the padding technique if you want to check them out for effectiveness, parameters and such: xlconverted.zip
By chance, maybe you know why the G part is not compatible with SD2? I thought there is OpenCLIP in both SD2 and SDXL.
Because the OpenCLIP model used by SD 2.0-2.1 is not G. I believe it is H, and the hidden dim size of G is 1280 while that of H is 1024. Below is a screenshot of each text encoder's configuration file.
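The hidden sizes quoted in this thread can be collected into a small lookup that shows which encoder parts line up between model families. The table only restates the numbers given above (with H for SD2 as the commenter believes); the function name is hypothetical.

```python
# Encoder name -> hidden size, per model family, as stated in this thread.
HIDDEN_SIZES = {
    "SD1": {"L": 768},
    "SD2": {"H": 1024},
    "SDXL": {"L": 768, "G": 1280},
}

def shared_parts(model_a, model_b):
    """Encoder parts present in both families with the same hidden size."""
    a, b = HIDDEN_SIZES[model_a], HIDDEN_SIZES[model_b]
    return sorted(name for name in a if name in b and a[name] == b[name])

print(shared_parts("SD1", "SDXL"))
print(shared_parts("SD2", "SDXL"))
```

Under these numbers SD1 and SDXL share the L part, while SD2 and SDXL share nothing, which is why a 1280-wide G vector cannot be loaded as a 1024-wide SD2 embedding.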
I've pushed two changes:
'a cat'/2G, 'test'*1.5L. Only literal uppercase "L" and "G" are allowed, directly after the number. To keep only L vectors you should do *0G
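The suffix rule above (a number optionally followed by an uppercase L or G) might be modeled like this. It is a sketch under assumptions, not the extension's implementation: the regular expression, the keys `clip_l` and `clip_g`, the function name, and the accepted number format are all invented for the example.

```python
import re

# Operator, then a number, then an optional uppercase L or G directly after it.
_SCALE = re.compile(r"^([*/])(\d+(?:\.\d+)?)([LG]?)$")

def apply_scale(pair, op_text):
    """Scale the L part, the G part, or both, e.g. '*1.5L', '/2G', '*0.5'."""
    match = _SCALE.match(op_text)
    if match is None:
        raise ValueError("unsupported operation: %r" % op_text)
    op, number, target = match.groups()
    factor = float(number)
    if op == "/":
        factor = 1.0 / factor
    result = {}
    for part, key in (("L", "clip_l"), ("G", "clip_g")):
        # No suffix means both parts; otherwise only the named part is scaled.
        scale = factor if target in ("", part) else 1.0
        result[key] = [[x * scale for x in vector] for vector in pair[key]]
    return result

toy = {"clip_l": [[2.0, 4.0]], "clip_g": [[1.0, 3.0]]}  # toy widths
only_l = apply_scale(toy, "*0G")  # zero out G, keep L untouched
```

With this model, `*0G` leaves a pure-L embedding, mirroring the "to keep only L vectors" remark.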
The documentation is not updated yet. Can you test everything and make sure it is working as you might expect, and that nothing got broken?
Everything seemed to be working well but at one point, whatever I put in negative prompt became positive instead for some reason. Gonna investigate it some more. Been doing all kinds of crazy stuff though so it does seem to be working overall
So yeah, things are working as they should. One suggestion, though: when saving the converted safetensors embedding, also add a suffix to its name, since without that an SDXL embedding sharing the same name as an SD 1.5 embedding will not show up in extensions such as tag autocomplete, but instead shows just as the SD 1.5 version. I have gotten used to naming mine with suffixes '_xl' and '15', but something like 'vXL' and 'v1' would be more clear.
Why use a suffix if you naturally cannot have both the SD1 and SDXL versions loaded in WebUI at the same time?
Because when an SDXL model is loaded, the extension a1111-sd-webui-tagcomplete is unable to differentiate between the two, since it is only used for aliasing and quick access to embeddings, LoRAs and such through the prompt. So if two embeddings have the same name, it displays it as an SD 1.5 embedding. In the image I have an SDXL model loaded, I am using the extension in the prompt while displaying the actual available SDXL embedding, and you can see that the one with the exact same name is displayed as a v1 Embedding even though there obviously is an XL one available. That is because that extension is not meant to do checks for the loaded model or anything like that. It is just performing aliasing and prompt shortcuts for embeddings and extra networks. You will have to excuse the name, but it is the only one left that I had not suffixed. Hope this explains it. Otherwise I suggest you check out the extension I mentioned so you get first-hand experience.
the extension a1111-sd-webui-tagcomplete is unable to differentiate between the two
And so what? The embedding is there, and it will be used in generation.
It is just performing aliasing and prompt shortcuts for embeddings and extra networks.
That extension should not list embeddings that are not compatible with the current model, because this is a lie that they are usable: WebUI would not throw any errors but instead will take the name literally as text, without substitution.
Showing the wrong type of the embedding because of duplicated name is not a bigger lie!
I have gotten used to naming mine with suffixes '_xl' and '15', but something like 'vXL' and 'v1' would be more clear.
Why rename them, if it would be convenient for prompts to keep general names of embeddings, which would allow you to swap models without changing the prompt?
For example, if your SDXL embedding of a furry dog boy is catgirl1 and you have its L part stored as catgirl1 too, then your prompt would work regardless of what the current model is, SDXL or SD1.
Yeah I will just head over to that extensions repo and ask them to change their entire way of fetching embedding/extra networks names.
I have 2600 embeddings. If I had the same name on both XL and v1 variants, currently it would just display as v1 in that menu, and I would have no way of knowing whether it is one that exists for each architecture or one that I have yet to convert. So no, there is no convenience in having them named the same in that context. I would, however, understand the convenience for casual users who do not use EM for constructing highly complex embeddings through multiple intermediary steps like I do.
Yeah I will just head over to that extensions repo and ask them to change their entire way of fetching embedding/extra networks names.
You may backlink here when you do; meanwhile I will be updating my docs for the new syntax…
The tagcomplete issue has been resolved.
By the way, my PR has been merged to webui dev branch. It is now possible to unlock clip skip option for clip L when using SDXL which can bring some benefits, especially if combined with prompt editing timelines and this extension. Link to the pull request if you want to take a look.
How could the merge expression syntax be enhanced to incorporate independent manipulation of the L (CLIP as in SD1) and G (OpenCLIP) clips of SDXL?
Currently <'cat'*2+'1girl'> will:
What do we want:
L*2 but G*1; or L*0.3 and G*0.7)
What we cannot have:
+/-, or do *, / and : right away, operating only on two internal variables ("left" operand and "right" operand: * does right=right*this and + does left=left+right; right=this;)
A few ideas:
<'use clip'*1.4 | 'this is OpenCLIP'*0.5>
<'this is OpenCLIP'*0.5:G + 'use clip'*1.4:L>
('X':L will zero-fill G-part of 'X'; read as "use L")
Also see https://github.com/klimaleksus/stable-diffusion-webui-embedding-merge/commit/a89dde64440f24a872231259304e148d2258da8b#commitcomment-140709559
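The two-variable scheme described above ("left" and "right" operands) can be sketched with plain numbers standing in for embeddings. Only the rules for * and + come from the description; the handling of / and -, the initial left value of zero, and the final left + right are assumptions added to make the example complete, and the function name is hypothetical.

```python
def evaluate(first, steps):
    """Evaluate a flat expression using only a 'left' and a 'right' operand."""
    left, right = 0.0, first
    for op, value in steps:
        if op == "*":
            right = right * value  # affects only the directly preceding term
        elif op == "/":
            right = right / value  # assumed to mirror '*'
        elif op == "+":
            left, right = left + right, value  # fold the finished term into left
        elif op == "-":
            left, right = left + right, -value  # assumed: add the negated term
        else:
            raise ValueError("unknown operator: %r" % op)
    return left + right  # assumed final fold of the last pending term

# Stand-in values: 'cat' = 3.0 and '1girl' = 5.0, so <'cat'*2+'1girl'> gives 3*2 + 5.
result = evaluate(3.0, [("*", 2.0), ("+", 5.0)])
```

Because a multiplier can only touch the current right operand, a form like X*'S' has nothing to attach to, which is consistent with only 'S'*X being supported.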