Here is the promised additional data. I tested the merge on my AMD system with 32 GB of RAM, same error.
command line:
and here is the error the UI gives in one line on its right side: error message.txt
I'll take a look through the pruning logic. Keys that exist in only one model should be pruned before merging and returned intact afterwards.
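For context, the intended behaviour is roughly this (a minimal sketch with plain dict-of-tensors state dicts and made-up function names, not the actual SD.Next code):

# Hypothetical sketch: keys present in only one model are set aside before the
# merge and put back untouched afterwards.
def split_unique_keys(model_a, model_b):
    shared_keys = model_a.keys() & model_b.keys()
    unique = {k: v for k, v in model_a.items() if k not in shared_keys}
    unique.update({k: v for k, v in model_b.items() if k not in shared_keys})
    return shared_keys, unique

def weighted_sum_merge(model_a, model_b, alpha=0.5):
    shared_keys, unique = split_unique_keys(model_a, model_b)
    merged = {k: (1 - alpha) * model_a[k] + alpha * model_b[k] for k in shared_keys}
    merged.update(unique)  # returned intact, never merged
    return merged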
Update: after testing around to pinpoint the error, I also tried pruning the ingredients. I pruned 2 of the 3 models and after that I could make my planned merge. OK, I thought, found the error, but trying another recipe after that still produced the error.
I am doing a sum_twice and I want to merge the result of that with another model.
Though I noticed something else just now: when merging with sum_twice first, the cmd window shows 3 models being loaded. When I switch to weighted_sum after that, it still shows 3 models being loaded, even though only 2 are chosen. After restarting SD.Next and doing a weighted_sum, it loads 2 models.
Even after restarting the UI and doing the merges one after another, it breaks after the first successful merge. I can't really describe it, but even if there is a slight error somewhere in the model, it shouldn't make the whole merging process break down without any usable information (at least for me).
Also, I don't know if it is connected to the issue, but every time I get 885 keys, the merging breaks.
@Xeltosh Sorry it's taken me so long, but would you be able to pull https://github.com/vladmandic/automatic/pull/2748 to see if that is sufficient to solve your problem?
Code is merged in the dev branch.
At the risk of sounding stupid: I loaded the dev branch via Stability Matrix, do I have to choose something specific? Because all the new code did was change the merged result, though there was another update to it yesterday. Did that somehow overwrite the code? The resulting model is different than before, but the error at 885 keys still happens...
What commit is it showing when you first start the server?
02:38:05-010323 INFO Logger: file="C:\StabilityMatrix\Packages\SD.Next Web UI dev\sdnext.log" level=INFO size=191754 mode=append
02:38:05-012318 INFO Python 3.10.11 on Windows
02:38:07-340443 INFO Version: app=sd.next updated=2024-01-25 hash=e924cc9e
To be fair, at the moment I can't seem to reproduce the error on the dev branch, though I definitely had it once this afternoon; it just doesn't show up right now. After the error appeared, I retried the merges I had done so far and found that the merge is definitely different from the same recipe done on the main branch, and sadly, in my opinion, the resulting model got worse than when done on main. I tried pruning the model on the dev branch and merging again on the main branch; the resulting model is the one I want, but sadly I can't merge anything on top of it, because the error still happens, strangely still at 885 keys. I tried just now, after writing the message above, to get the error again and maybe record a short clip, but the error doesn't appear anymore...
As apparently no one besides me has this error, it is most likely that the model my friend made has some kind of error in it. Would it help if I gave you a link to it? I would like to keep the old merging logic, but also to get my model fixed :/
I will try something different; maybe merging the model with itself on the dev branch could fix the model without changing the content?
Update: merging it with itself on the dev branch doesn't fix merging on main.
What I don't understand is: why can I do 1-3 merges before every merge afterwards breaks?
Another update: after playing around with the merges done on the dev branch, I found out that they work just fine and are as good as the old ones (just different), and that one or more of my embeddings were what "destroyed" the pictures I generated.
Figuring out why there is a difference would still be nice, but it is not that important anymore. Still, thanks for helping me @AI-Casanova
@AI-Casanova OK, another update: my friend trained a new model via kohya and I wanted to merge again. With ReBasin active, I immediately get the aforementioned error again. I tried other models and couldn't merge them either.
BUT as soon as I deactivate ReBasin, it works. When googling, I found someone mentioning that in ComfyUI it is a debug message and can be ignored, though I don't know if it is exactly the same thing: LINK and LINK
Soooo... maybe a handler for that message is missing? Because as described before, the message only appears in the browser and apparently is connected to ReBasin. The console just stops working without saying anything and waits for new input.
Sorry for writing again, but I don't know if you get a notification on a closed issue @AI-Casanova @vladmandic
I tried merging some other models, and when the key count comes up as 883, the merging works like before. I also installed a standalone version of SD.Next, because I thought that maybe Stability Matrix has a bug with ReBasin, but sadly that's not it. Merging the models with themselves, or trying to convert them and fix the CLIP, didn't work either.
We resolved the issue; the problematic tensors were cond_stage_model.logit_scale and cond_stage_model.text_projection.
I made a script in case someone encounters this as well and wants to "fix" their model:
from argparse import ArgumentParser

from safetensors import safe_open
from safetensors.torch import save_file

parser = ArgumentParser()
parser.add_argument("input", type=str, help="Input file")
parser.add_argument("output", type=str, help="Output file")
args = parser.parse_args()

# Load the model
tensors = {}
with safe_open(args.input, framework="pt") as model:
    for key in model.keys():
        tensors[key] = model.get_tensor(key)

# Remove broken tensors
del tensors["cond_stage_model.logit_scale"]
del tensors["cond_stage_model.text_projection"]

# Save the fixed model
save_file(tensors, args.output)
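Usage would be something along these lines (the script name and file names here are placeholders, not part of the original post):

python fix_model.py broken_model.safetensors fixed_model.safetensors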
Yep, can confirm! Tried it on my models: I now get 883 keys while merging and it ran through without error.
My $0.02 without digging real deep: cond_stage_model.text_projection sounds weird to start with, as which encoder is it referring to? IMO it should be something like cond_stage_model.clip_l.text_projection or cond_stage_model.clip_g.text_projection.
We have no idea where these 2 keys come from. They are somehow in there after training; they either come from the base model used for training OR from some buggy script in the training software. I tried merging several models before using Stax's script (without regard to content compatibility). The key count shown in the command line is the best indicator of whether a model has these buggy keys or not.
Some models seem to have them, and only those specific models show the merging bug I encountered with ReBasin. As I shared before, every time I had 885 keys to merge it bugged out; with 883 keys it worked.
Stax's script removes these 2 "faulty" keys, and I can say that so far everything works as intended.
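If you just want to check a checkpoint for the two offending keys without rewriting it, a minimal sketch along these lines should do (same safetensors API as Stax's script; the path is taken from the command line):

import sys
from safetensors import safe_open

BAD_KEYS = ("cond_stage_model.logit_scale", "cond_stage_model.text_projection")

# Open the checkpoint read-only and report the key count plus whether the
# two problematic keys are present.
with safe_open(sys.argv[1], framework="pt") as f:
    keys = set(f.keys())
    print(f"{len(keys)} keys total")
    for key in BAD_KEYS:
        print(f"{key}: {'present' if key in keys else 'missing'}")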
Issue Description
I did some merging of some of my models, and when trying to merge one of my already-merged models, the UI gives me the error mentioned in the title, with an extremely long text (on its right side, which is only readable if copied and pasted into a separate file). The bad part is that no error is visible in the console; as far as the console is concerned, the merging just gets interrupted and stops.
merging setup:
happened in weighted_sum and sum_twice
Weights clip enabled, ReBasin enabled with the standard setting of 5
I will post the rest of the error from the UI after I get home, because I need to do some formatting so it is readable (it's all written in one line).
My suspicion is that there is some kind of minor error in one of the models I use, which only appears after merging, but I don't know how to fix or find it. Could it be an error because of my "low" RAM? I will also try to do the merge later on my AMD system, which has more RAM.
I can provide the model information if needed, though one is an SFW/NSFW model a friend of mine trained. The other one is on Civitai.
Version Platform Description
Windows 10, GTX 3050, 16 GB DDR4 RAM, SD.Next is run via Stability Matrix
Relevant log output
No response
Backend
Original
Branch
Master
Model
SD 1.5
Acknowledgements