@vladmandic I decided to run the fp32 pruned version at iteration 10 without restarting, and it ran without problems; RAM peaked at 31gb. I started with the OS and other programs using 9gb of RAM, and by the time the run ended I was at 5gb, so the OS seems to have correctly pushed other things into a lower memory mode. Thinking about it, this was probably a better stress test anyway. I'm going to try to run off the rest of the 9 iterations tomorrow, as well as play with the -pr (prune) arg on the full model with ema in it, though I'll probably just run that at 10.
@DirtyHamster I'm in touch with axsddlr and I know they are going to work on it. I may cook something up in the meantime, but I'd rather have the GUI separated from the library repository.
@vladmandic Results from the fp32 iteration cycle:
I don't notice as much of a difference in these as I did in the fp16s, which kind of surprises me. It could just be an effect of the extra precision of the model, though. Next up: the prune test and the fp16 x fp32 tests.
@s1dlx I think keeping them separate is fine. No need to trouble yourself, especially if axsddlr is already working on it.
2. not yet. I'm playing with an automatic checker to stop iterating when the weights do not change anymore. I tested at different values (10, 50, 100) and I got a different model every time. They were all OK, so I'm not sure how easy it is to pick a number of iterations. In the paper they reach ~250 iterations.
When I was looking at just 1-10 I thought it would be hard to pick too. Were your runs going to 50 and 100 much different from the runs ending at 10? I like the idea of an automatic checker too.
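Just to sketch what I'm picturing for an automatic checker (purely hypothetical, not how meh actually does it): compare the merged weights between consecutive iterations and stop once nothing moves more than some tolerance.

```python
# Hypothetical auto-stop sketch, not meh's actual implementation:
# stop iterating once the permuted weights stop changing between iterations.
import torch

def weights_converged(prev_state, curr_state, tol=1e-4):
    """True when no tensor moved more than `tol` since the previous iteration."""
    max_delta = 0.0
    for key, curr in curr_state.items():
        prev = prev_state.get(key)
        if prev is None or prev.shape != curr.shape:
            continue  # skip keys that don't line up between iterations
        delta = (curr.float() - prev.float()).abs().max().item()
        max_delta = max(max_delta, delta)
    return max_delta < tol

# usage inside the iteration loop (rebasin_iteration is just a stand-in name):
#     if weights_converged(previous_state, current_state):
#         break
```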
3. It's technically possible but not implemented (PRs are welcome :) )
It was mostly a question of curiosity as I realized how many iterations they had done. I could imagine myself at 245 bluescreening. lol
@vladmandic results from the mixed fp32 x fp16 models outputted as fp16:
I think this might be one of the best runs yet when comparing against what we currently use at increments from 0.1 to 0.9, at least on the cat prompt. I did pull an update for meh, so I'll be double-checking the previously generated merges just to see if that had anything to do with it.
The prune test went fine; I just didn't find it necessary to do outputs for it. It can pass models with their ema that way, not that people should be retaining their models' ema weights after merging. It does get rid of the potential for errors, though.
I still have to do the time tests and keep a closer watch on hardware usage, but I'll do those at iteration 10 at fp16 and fp32 while I do the double-checking I mentioned above.
hmm, why are all the results looking the same? there was one great set of results 5 days ago, since then i'm at a bit of a loss?
@vladmandic That's probably because the last few posted are the git-re-basin iteration outputs, focused on comparing the traditional 0.5:0.5 merge vs the re-basin method at 0.5 across iterations, so all the changes are a lot smaller. You have to take the set you just mentioned, look at model A in full, the 0.5 merge, and model B in full, and then compare that against the iterations of the re-basin merge, which is done at a default 0.5 at rbi {1,2...10}.
I mentioned that it had to be 0.5s here: https://github.com/vladmandic/automatic/issues/1176#issuecomment-1586410540 though I didn't go into great detail. Initially I tried running a 0.9:0.1 weighting and ran off some iterations, then went to 0.8:0.2 per iteration. I also tried smaller increments like 0.95:0.05... Every attempt that was not 0.5:0.5 ended up with model A shown in the first and second images and model B in the rest. Then I re-read the paper and realized it's just trying to get the best 50/50 mix. That's still really useful, though not as variable as I initially thought it would be, and still extremely usable in a merge regimen.
I've been gathering the individual results at the bottom of the first message so they aren't as randomly positioned across the thread. I still have to fill in a few spots that were individually generated with grids, though. I'll bring a copy of the gathered results down here once I have them all together, so it's easier to refer back to in discussion.
ahh, serves me right when i read updates on my phone just before going to bed. thanks for setting me straight. and yes, i do remember previous conversations :)
No worries... I do that all the time; between having multiple projects going and insomnia I can be all over the place some days too. That's also why I want to double-check everything. The last run had me wondering why the cat's arms are less human-like.
This was the mixed fp32 x fp16 set output as fp16 (normal merge):
base-fp32+custom#1-fp16, base-fp32+custom#2-fp16
@vladmandic My to-do list, just so we're in sync on what I think I can get done in the coming week.
Over the weekend + Monday:
- Re-output grids for "a headshot photographic portrait of a woman": base-fp16+custom#1-fp16, base-fp16+custom#2-fp16, base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
- Double-check tests 1 and 2 on the re-basin side.
- Bring results down for discussion.
Coming week:
- Looking for a good hardware logger with selectable fields (mine doesn't have selectable fields for logging); suggestions are appreciated.
- Stopwatch time trials at iteration 10.
- Closer look at VRAM and RAM usage at iteration 10.
Extras & some speculation:
I want to try the following with stopwatch times (these can only get done if I have time to babysit the computer, though): attempt iteration runs at 15, 20, 25, 30 or 20, 30, 40, 50; suggestions are appreciated, though my intent is not to go to 250.
This extra is pretty much just to get a better handle on iterations and the permutation outcomes at values beyond 10. However, I do suspect that current models might be closer together than expected, given the cost of training and the limited number of common training starting points relative to the current hardware requirements. So shorter iteration counts might be more plausible now than they will be once training on a home PC becomes more readily available; iteration counts might need to go up over time as home training becomes more common.
Note 1: meh had an update to 0.7.0, which is part of the reason for some of the double checks. They also added some other interesting options, such as auto presets for block weights; have a look: https://github.com/s1dlx/meh/wiki/Presets @s1dlx Can you give us any more insight into what was implemented in this addition and how?
Note 2: Model architecture: would it be possible, as an option, to add in modular architecture components (inpaint, pix2pix...) as a simple action? From looking at model toolkit https://github.com/arenasys/stable-diffusion-webui-model-toolkit I think these could be stored as component defaults off to the side and just be added in pre-final-save. I'm focusing on wider usability here, looking at what is commonly asked for on model-sharing sites. This is very much an afterthought though, and "out of scope" is a fair answer.
Thoughts and input are very welcome. I always expect delays and am not in a hurry btw.
Looking for a good hardware logger
What's the use-case you need it for?
I want to try the following with stopwatch times (These can only get done if I have time to babysit the computer though.)
Why do you need to babysit the computer through it?
Attempt iteration runs at 15,20,25,30 or 20,30,40,50: Suggestions are appreciated? Though my intent is not to go to 250.
Don't know if this uses linear interpolation or if it's just linear iterations as-is? If it's using interpolation, 99 is pretty much the highest anyhow.
auto presets for block weights
Those look really interesting from the theory perspective.
Would it be possible to do as an option Add in as a simple action modular architecture
Not sure I follow.
Thoughts and input are very welcome. I always expect delays and am not in a hurry btw.
My $0.02 - we should probably define what the exit criteria are?
What's the use-case you need it for?
I figured I could log the runs at each iteration so you could see usage across CPU, RAM, VRAM...
Why do you need to babysit the computer through it?
I currently have no way, unless someone teaches me one, to automatically start and stop the monitoring based on the runtime.
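Roughly what I'd need is something like this (hypothetical sketch using psutil, CPU/RAM only; VRAM would need something like pynvml or GPU-Z's sensors):

```python
# Hypothetical sketch, not something I have running: sample CPU/RAM while the
# merge subprocess is alive and write one CSV row per interval.
import csv
import subprocess
import time

import psutil

def log_while_running(cmd, csv_path, interval=30):
    proc = subprocess.Popen(cmd)
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["elapsed_s", "cpu_percent", "ram_used_gb"])
        start = time.time()
        while proc.poll() is None:  # stops automatically when the merge exits
            writer.writerow([
                round(time.time() - start, 1),
                psutil.cpu_percent(interval=None),
                round(psutil.virtual_memory().used / 1024**3, 2),
            ])
            time.sleep(interval)
    return proc.returncode
```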
Don't know if this uses linear interpolation or if it's just linear iterations as-is? If it's using interpolation, 99 is pretty much the highest anyhow.
As far as I understand, it assesses the weight deviation between the two models at the start and end of each iteration. I wouldn't call the method linear, since each permutation is based on the figures from the previous iteration's output. The paper goes on measuring the weights and estimation methods per iteration. As there is no back-checking across previous iterations, it should either improve or degrade across iterations. A user should be able to keep iterating until stopping manually, or until that estimate reaches a point of no change. There's a very brief chat about an auto-stop function a few posts up: https://github.com/vladmandic/automatic/issues/1176#issuecomment-1588762518
Not sure I follow.
A base model needs 3 parts to work (v1):
SD-v1:
- VAE-v1
  - VAE-v1-SD (baking would replace this)
- CLIP-v1
  - CLIP-v1-SD
- UNET-v1
  - UNET-v1-SD
Things we could also offer to add in: UNET-v1-Inpainting, UNET-v1-Pix2Pix, SD-v1-ControlNet, UNET-v2-Inpainting, UNET-v2-Depth. These are just components that can be added to a base model; I think they could potentially be added in as easily as we do a clip fix.
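To make the idea concrete, a minimal sketch of what I mean (hypothetical; it assumes safetensors checkpoints and an illustrative key prefix, and is not how model toolkit actually implements it):

```python
# Hypothetical sketch, not the toolkit's actual code: graft the extra component
# keys from a donor checkpoint (e.g. an inpainting model) into a merged model
# just before the final save. The key prefix here is an illustrative assumption.
from safetensors.torch import load_file, save_file

def graft_component(merged_path, donor_path, out_path, prefix="model.diffusion_model."):
    merged = load_file(merged_path)
    donor = load_file(donor_path)
    # copy donor keys the merged model is missing, or whose shapes differ
    for key, tensor in donor.items():
        if key.startswith(prefix) and (key not in merged or merged[key].shape != tensor.shape):
            merged[key] = tensor
    save_file(merged, out_path)
```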
My $0.02 - we should probably define what the exit criteria are?
We never defined that, but to me it's OK either way; I'm just not sure where I should pause for a decision. Should this be limited to a better 50:50 merge, or do we want it to be more usable across the user base as a whole? I'm just not sure if we want to cover things like model architecture inputs and the other stuff mentioned above. Weight presets I think are fair... I'm not sure what is too far, if you know what I mean, even though I know what's useful.
Do we have any end-of-scope exit suggestions for this first improvement attempt?
For me it would be to take most of it to a logical end unless extra work would hold back the improvement.
@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing block merge. They are implemented in presets.py
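For illustration only (hypothetical values, not actual entries from presets.py), a preset is just a per-block list like:

```python
# Hypothetical examples - not actual presets from meh's presets.py.
# A block-weight preset is just 26 floats, one per merge block, used in place
# of manually supplied per-block alpha weights.
MY_FLAT_HALF = [0.5] * 26                        # a uniform 50/50 preset
MY_RAMP = [round(i / 25, 2) for i in range(26)]  # 0.0 ... 1.0 ramp across the blocks
```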
regarding logging, meh already logs vram for rebasin
in the dev branch there’s some initial proper logging added. That should make it into release 0.8
also, a lot of the experiments you are doing are also being done in our small discord server. Perhaps you can join and compare results
experiments you are doing are being also done in our small discord server
Sure what's the discord address?
just an idea of exit criteria:
experiments you are doing are being also done in our small discord server
Sure what's the discord address?
I’ll add it to the readme, but it’s the same one we have for Bayesian-merger
just an idea of exit criteria:
- does it work? yes.
- how does it compare to existing basic merge?
- benefits? (from user perspective, this mostly comes down to visual quality?)
- functionality? (is it only 0.5+0.5 or can it do other?)
- performance?
- requirements? (vram)
- what are the best defaults?
- which tunables should be exposed? (that create value for users)
This is fair... I think a lot of these are a yes, but let's look at it after the double checks and result fill-ins. I'm serious enough to say I would use this for most of my own 50 percent merges regardless of whether you add it in. Most of the presets can probably be left out, though it would be nice to have the option to include them.
Sure what's the discord address? https://discord.gg/X8U6ycVk
I'll take a look. I don't normally use discord.
@DirtyHamster presets are simply lists of 26 floats you give to the merger for doing block merge. They are implemented in presets.py regarding logging, meh already logs vram for rebasin in the dev branch there’s some initial proper logging added. That would make it in release 0.8 also, a lot of the experiments you are doing are being also done in our small discord server. Perhaps you can join and compare results
I'll take a closer look, but I want more specs than just VRAM, i.e. RAM, CPU, temps...
Having a pull-down menu with presets is not a problem, no matter how many there are. Having 10+ different checkboxes or number ranges is.
Regarding the logger, if you want low-level, GPU-Z has a built-in logger and nothing beats its sensors. Even if you don't like the built-in logger, I'd suggest searching for something that uses their sensor data.
@vladmandic Will check it out. I'm doing this between a lot of yard work, so I might crap out for a few days coming up; I have around 2 tons of small stone to move that just got dropped off yesterday, and a deck to fix.
All the fields available via the current help listing for meh:
Usage: merge_models.py [OPTIONS]
Options:
  -a, --model_a TEXT
  -b, --model_b TEXT
  -c, --model_c TEXT
  -m, --merging_method [add_difference|distribution_crossover|euclidean_add_difference|filter_top_k|kth_abs_value|multiply_difference|ratio_to_region|similarity_add_difference|sum_twice|tensor_sum|ties_add_difference|top_k_tensor_sum|triple_sum|weighted_subtraction|weighted_sum]
  -wc, --weights_clip
  -p, --precision INTEGER
  -o, --output_path TEXT
  -f, --output_format [safetensors|ckpt]
  -wa, --weights_alpha TEXT
  -ba, --base_alpha FLOAT
  -wb, --weights_beta TEXT
  -bb, --base_beta FLOAT
  -rb, --re_basin
  -rbi, --re_basin_iterations INTEGER
  -d, --device [cpu|cuda]
  -wd, --work_device [cpu|cuda]
  -pr, --prune
  -bwpa, --block_weights_preset_alpha [GRAD_V|GRAD_A|FLAT_25|FLAT_75|WRAP08|WRAP12|WRAP14|WRAP16|MID12_50|OUT07|OUT12|OUT12_5|RING08_SOFT|RING08_5|RING10_5|RING10_3|SMOOTHSTEP|REVERSE_SMOOTHSTEP|2SMOOTHSTEP|2R_SMOOTHSTEP|3SMOOTHSTEP|3R_SMOOTHSTEP|4SMOOTHSTEP|4R_SMOOTHSTEP|HALF_SMOOTHSTEP|HALF_R_SMOOTHSTEP|ONE_THIRD_SMOOTHSTEP|ONE_THIRD_R_SMOOTHSTEP|ONE_FOURTH_SMOOTHSTEP|ONE_FOURTH_R_SMOOTHSTEP|COSINE|REVERSE_COSINE|TRUE_CUBIC_HERMITE|TRUE_REVERSE_CUBIC_HERMITE|FAKE_CUBIC_HERMITE|FAKE_REVERSE_CUBIC_HERMITE|ALL_A|ALL_B]
  -bwpb, --block_weights_preset_beta [GRAD_V|GRAD_A|FLAT_25|FLAT_75|WRAP08|WRAP12|WRAP14|WRAP16|MID12_50|OUT07|OUT12|OUT12_5|RING08_SOFT|RING08_5|RING10_5|RING10_3|SMOOTHSTEP|REVERSE_SMOOTHSTEP|2SMOOTHSTEP|2R_SMOOTHSTEP|3SMOOTHSTEP|3R_SMOOTHSTEP|4SMOOTHSTEP|4R_SMOOTHSTEP|HALF_SMOOTHSTEP|HALF_R_SMOOTHSTEP|ONE_THIRD_SMOOTHSTEP|ONE_THIRD_R_SMOOTHSTEP|ONE_FOURTH_SMOOTHSTEP|ONE_FOURTH_R_SMOOTHSTEP|COSINE|REVERSE_COSINE|TRUE_CUBIC_HERMITE|TRUE_REVERSE_CUBIC_HERMITE|FAKE_CUBIC_HERMITE|FAKE_REVERSE_CUBIC_HERMITE|ALL_A|ALL_B]
  -j, --threads INTEGER
  --help  Show this message and exit.
To give you an idea of my inputs for testing:
Basic re-basin only seems to require the following: merge_models.py -a H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\HD-22-fixclip-noema-fp32.safetensors -b H:\Users\adamf\AI_Progs\AI_Models\Stable_Diffusion\dreamshaper5_Bakedvae_fp16-noema.safetensors -m weighted_sum -p 16 -o H:\Users\adamf\AI_Progs\AI_Models\test5\0-merge000001_10.safetensors -f safetensors -ba 0.5 -bb 0.5 -rb -rbi 10
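For reference, the iteration sweeps were just repeated runs of that same command with -rbi incremented; a rough automation sketch (illustrative paths, using only the flags from the help listing above):

```python
# Illustrative automation sketch (made-up paths); runs the meh CLI once per
# re-basin iteration count, mirroring the 1..10 sweep used in the tests.
import subprocess

MODEL_A = "HD-22-fixclip-noema-fp32.safetensors"          # placeholder path
MODEL_B = "dreamshaper5_Bakedvae_fp16-noema.safetensors"  # placeholder path

for rbi in range(1, 11):
    subprocess.run([
        "python", "merge_models.py",
        "-a", MODEL_A,
        "-b", MODEL_B,
        "-m", "weighted_sum",
        "-p", "16",
        "-o", f"merge_rbi_{rbi:02d}.safetensors",
        "-f", "safetensors",
        "-ba", "0.5",
        "-bb", "0.5",
        "-rb",
        "-rbi", str(rbi),
    ], check=True)
```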
What s1dlx has would give us a lot more options regardless of whether re-basin is better or not, though I think it is better for 50:50s.
On a side note, is it possible to use the default file explorer to select the model rather than a drop-down in the UI when selecting the main models, VAEs, or other components? I have so many that being able to sort them is an issue; I don't know if anyone else is this crazy about collecting them.
I'll take a look at GPU-Z later tonight too. Have to get back to work for a bit.
On a side note, is it possible to use the default file explorer to select the model rather than a drop-down in the UI when selecting the main models, VAEs, or other components?
whatever you select via explorer needs to be validated and matched to some known entry so that exact entry can be used moving forward. for example, server knows about models it enumerated - trying to select something it doesn't already know of is a mess. definitely doable, but non trivial.
regarding params, i just noticed one thing - i need to take a look at the code to see if git-re-basin allows passing an actual device instead of a string (cpu or cuda), as sdnext itself supports quite a few other backends, and if i integrate this, i cannot say "oh, this is a cuda-only feature".
I haven't tried the device swap arg since version 4; at that point I couldn't get the option arg to work. I think by default it's set to cpu.
I found the drop-downs a little messy to find stuff in during testing; it's 10x of any model involved. So it has me thinking about whether we can do sort orders and such. I think I'm beyond normal use-case scenarios, but it's still something to think about UI-wise.
meh cli tool has that many arguments but the library is much simpler. Basically all the presets stuff is not included as those simply override “wa” and “ba” (and the beta ones).
on the cuda/cpu side… I imagine you can swap cuda for “gpu” and get the same result on amd cards
an example of how to use the library is given by the sd-webui-bayesian-merger extension
the idea of the library is to be a generic merging one, not just a tool for rebasin
@s1dlx
meh cli tool has that many arguments but the library is much simpler. Basically all the presets stuff is not included as those simply override “wa” and “ba” (and the beta ones).
I meant that in terms of what fields might be needed UI-wise, not that it was overly complex or anything. I think what you have going on in your repository is much better than the basic one we are using currently.
@vladmandic I'll probably get back to testing tomorrow. Other than the last few remaining add-ins for comparison, the only other option I can test between the current method and meh is add difference. Should I run tests on that too?
Maybe just a quick test to see if anything deeper is really needed?
@vladmandic
Maybe just a quick test to see if anything deeper is really needed?
I'll do light testing on it while I finish up the other stuff. I want to do the time trials too. I haven't found any issues yet that would concern anyone, so it's probably safe to start looking at the code to incorporate it. You kind of make out like a bandit with all the extras packed into the repository.
I'll run this off too:
btw, totally off-topic, since you've already mentioned the lyriel model before (and that's one of my favorites), i'd be curious (and this is just personal) how it merges with something like https://civitai.com/models/18798/meinaunreal
I'll do it as 0.1 to 0.9 same as in the tests. Haven't forgotten about it.
Finished outputting the fp32 merges I had to re-output; about to start working on the fp16s... I really shouldn't have deleted them, lol... Afterthoughts are great, right...
Just figured I'd do a small status update and noticed the time clock from the last merge, as it's still running from last night. Thought you might find it amusing, as it's the only thing I've managed to break...
In about an hour or 2 I'll have the rest of the results out. The extended testing is still to come, though.
@vladmandic Basic testing is done; I'll get on with the extras:
Results so far for discussion if necessary:
Model 1 used: https://huggingface.co/Deltaadams/HD-22 fp32
Model 2 used: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models pruned from full trainable ema models to fp32 no-ema and fp16 no-ema prior to testing.
Testing method sampler and size settings:
Settings: DPM++ 2M Karras @ 20 steps, CFG scale 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts used: a headshot photographic portrait of a woman, a cat as a DJ at the turntables
Testing regimen: (multiplier to be run from 0.1 to 0.9)
base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
base-fp32+custom#1-fp16, base-fp32+custom#2-fp16
Git-re-basin side will be similarly mirrored: (weight value set at 0.5:0.5, iteration value to be run from 1 to 10)
Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}
Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}
Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}
@vladmandic The lyriel x meinaunreal base merge set is finished; do you have a favored prompt you would like me to use with them? I can do them across multiple VAEs too, if you'd like. I still have to finish up the re-basin for them. Just figured I'd ask first before posting outputs.
Looking at https://openhardwaremonitor.org/ and https://www.techpowerup.com/download/techpowerup-gpu-z/ for the hardware testing portion at the moment. Other suggestions are welcome.
@DirtyHamster naah, use whatever you want.
@vladmandic Just figured I'd ask first. I have one in mind.
@vladmandic Outputs from the merge, plus I found a funny bug that I can't seem to replicate.
Prompts and settings on this:
Steps: 32, Sampler: DPM++ 2M, CFG scale: 12.5, Seed: 1897848000, Size: 716x512, Model: {x}, VAE: {y}, Clip skip: 3
prompt: "ultra-high detail (HDR:1)" 8k (realistic:1.5) (masterpiece:1.5) (photorealistic:1.5) (photorealism:1.5), "daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a (realistic "woodland rocky brook with a ("small waterfall")") and fog in background at ((sunset) "colorful sky"),
negative prompt: watermark, signature, "lowres", "bad quality", "low quality", "lowest quality", "worst quality", blurry, pixelated, drawling, sketch, sketched, painted,
For this little bit of side testing I thought I'd use water, because it's such an easily recognized abstraction in its many forms. Also, from some of our other little chatter, I figured I'd run them against all the VAEs I've come across, extracted or otherwise, to see what the change is. Hope you enjoy...
The strange bug: when the server goes to sleep in the browser, it sometimes seems to move the clip skip value regardless of where you have it set in the UI. I have not found any logical behavior for this, but the outputs here are at clip skip 3 while I have it set to 1. I think this was responsible for an earlier error which I quickly just blamed on myself... It's correct in the listing below the output, though.
loving how it clearly shows the progression!
btw, totally off-topic, your usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:
[['ultra-high detail HDR 8k', 1.0], ['realistic masterpiece photorealistic photorealism', 1.5], ['daringly cinematic", "viscerally exaggerated", high quality professional photograph with "good depth of field" of a', 1.0], ['realistic "woodland rocky brook with a', 1.1], ['small waterfall"', 1.21], ['and fog in background at', 1.0], ['sunset', 1.21], ['colorful sky"', 1.1]]
re: clip skip - knowing how it's monkey-patched in the backend for backward compatibility, it's quite possible.
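fyi the multipliers above come from the standard attention syntax: each (...) nesting level multiplies the weight by 1.1 and (text:1.5) sets it explicitly; quotes are just passed through as plain text. a toy illustration of the rule (not the actual parser):

```python
# toy illustration of the attention-weight rule, not the actual prompt parser
def nested_paren_weight(levels, base=1.1):
    """each (...) nesting level multiplies attention by `base`"""
    return round(base ** levels, 2)

print(nested_paren_weight(1))  # one level of parens   -> 1.1
print(nested_paren_weight(2))  # two nested levels     -> 1.21, e.g. ((sunset))
# explicit weights like (masterpiece:1.5) just use the given 1.5
```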
loving how it clearly shows the progression!
I've found that sometimes the merges in between, at the sub-tenth and hundredth ranges, have interesting stuff going on too.
btw, totally off-topic, your usage of quotes in the prompt doesn't really do what you think it does - this is the parsed prompt:
I mostly use quotes for concept groupings, so some of it is just for keeping track of them as modular concepts when cutting and pasting from my notes, as well as moving them around in the prompt. I generally just leave them in, since using them this way doesn't generate errors. Sometimes I do mess with them to see if I can find any difference in complex phrasing. However, the only guidance I've really seen for quotation syntax has been a few lines about it in Prompt S/R, which uses it similarly for grouping in list form, i.e. no spaces between quotes and separating commas.
set SD_PROMPT_DEBUG=1
and you can see the parsed result.
Pardon the delay, I'm just getting back up to speed. I had a bad electrical storm last weekend that took out every switch, router, modem, and cable box in the house, and it took a while to get all of that replaced and fixed; for the most part it's done, but I still have some odds and ends to do.
Which file am I setting the arg: set SD_PROMPT_DEBUG=1 into?
Just do it in the shell before starting webui.
@vladmandic OK, I did one large merge test today at 50 iterations for the hardware test; it ran for 16m30s. I logged the run using Open Hardware Monitor at 30s intervals. The CSV file is the hardware log from that, which should be adequate data for any lesser run.
I used the full unpruned versions, letting them be pruned via meh so it would have a little extra work: dreamshaper_5BakedVae_fp32.safetensors (7.2gb), HD-22-fp32-fixclip.safetensors (7.5gb). Resulting merge file: 8.8gb (it returns everything it pruned out to the model at the end, as stated before).
OpenHardwareMonitorLog-2023-07-09.csv
I'm pruning the output model prior to image grid generation, to be compared with the base models' fp32 merge at 0.5 and meh's merges at iterations 10 and 50. Just a reminder, these should all be fairly similar; this is just to look for differences among the center-point merges.
The differences I can spot are really small: pixel differences at the edges of things, depth of blacks, and small details with more iterations, especially in the collar area of the woman's top. With the prompts used I still think the best run was on the two fp16s, where there was very good detail pickup across the board.
I haven't had a lot of time lately to run these off, or I would have done some more in-between iterations between 10 and 50. Time allowing, I'll get around to doing iterations of 100 and 200 as a follow-up later, since we know that merge method works already. So I think I should probably move on and try the other merge methods, hopefully later in the week. I'm still not sure what would be a good comparison to use for the block_weight presets beyond just checking that they work.
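For the preset comparison I'll probably just diff the grids directly; something like this (hypothetical helper with made-up filenames) makes the tiny pixel differences stand out:

```python
# Hypothetical helper, not part of the test setup: highlight where two output
# grids differ so tiny per-pixel changes (edges, blacks, collar detail) stand out.
import numpy as np
from PIL import Image

def diff_image(path_a, path_b, out_path, gain=8):
    a = np.asarray(Image.open(path_a).convert("RGB"), dtype=np.int16)
    b = np.asarray(Image.open(path_b).convert("RGB"), dtype=np.int16)
    diff = np.abs(a - b)  # per-channel absolute difference
    print("max diff:", diff.max(), "mean diff:", round(float(diff.mean()), 3))
    amplified = np.clip(diff * gain, 0, 255).astype(np.uint8)
    Image.fromarray(amplified).save(out_path)  # brighter = bigger difference

# diff_image("grid_rbi10.png", "grid_rbi50.png", "diff_10_vs_50.png")
```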
i think we should just check if there are any visual differences when using presets to see value of including them or not. doesn't matter if we like them or not, question is are differences even noticeable. other than that, i think we can wrap-up testing and start on the actual pr.
Last few things on my list before testing the presets...
Did a little side test using pix2pix and caught an error; I reported it over on meh's repo. Just noting it here so you're aware of it, or in case you have any insight: https://github.com/s1dlx/meh/issues/38
I need to check inpainting too, as that could cause a similar error since it's also additional model architecture. Checked and filed an error report: https://github.com/s1dlx/meh/issues/42
@DirtyHamster pix2pix and inpainting should be fixed in sd-meh ~0.9.0~ 0.9.4
git re-basin has been added to sdnext. there are some remaining issues that need to be addressed, but they are handled via separate issues.
Feature description
I spent a fair portion of the week reading about the various types of block merging methods for models that are currently available. One paper in particular entitled "GIT RE-BASIN: MERGING MODELS MODULO PERMUTATION SYMMETRIES" really caught my attention and I thought you might find it interesting as well. Paper is available here: https://arxiv.org/abs/2209.04836
Is there a reasonable way to test their method vs what we already use, to see if it would be an improvement? What method are you currently using, so I can better understand what I'd be testing against? I had thought perhaps to test against their own proof of concept, if the original is available. I figured I'd write this up in case someone else might be interested in testing it too.
My thought was this could potentially be added as an "auto" option under interpolation. As the weights are auto-guided, I thought this might follow along with your idea of ease of use. Some of the more manual methods have users setting each of the independent in and out weight values for the model blocks, which would also be nice to be able to do without leaving the core UI.
(conversation went on from there)
I did a little extra searching around from that: Just a simple GUI for the code: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui
GUI used in some of the testing: https://github.com/axsddlr/sd-gui-meh
Code explored: https://github.com/samuela/git-re-basin https://github.com/themrzmaster/git-re-basin-pytorch They have some code in the pull section for dealing with safe tensors partially as well: https://github.com/diStyApps/Merge-Stable-Diffusion-models-without-distortion-gui/pull/1
Code used in testing: https://github.com/s1dlx/meh
Results were brought up from comments below after testing method was agreed on:
Model 1 used: https://huggingface.co/Deltaadams/HD-22 fp32
Model 2 used: dreamshaper_5BakedVae.safetensors via https://huggingface.co/Lykon/DreamShaper
Both models pruned from full trainable ema models to fp32 no-ema and fp16 no-ema prior to testing.
Testing method sampler and size settings:
Settings: DPM++ 2M Karras @ 20 steps, CFG scale 7, Seed: 1897848000, Size: 512x716, CLIP: 4
Prompts used: a headshot photographic portrait of a woman, a cat as a DJ at the turntables
Testing regimen: (multiplier to be run from 0.1 to 0.9)
base-fp16+custom#1-fp16, base-fp16+custom#2-fp16
base-fp32+custom#1-fp32, base-fp32+custom#2-fp32
base-fp32+custom#1-fp16, base-fp32+custom#2-fp16
Git-re-basin side will be similarly mirrored: (weight value set at 0.5:0.5, iteration value to be run from 1 to 10)
Test1: base-fp16+custom#1-fp16, base-fp16+custom#2-fp16 @ weight: .5:.5, iteration {number set...}
Test2: base-fp32+custom#1-fp32, base-fp32+custom#2-fp32 @ weight: .5:.5, iteration {number set...}
Test3: base-fp32+custom#1-fp16, base-fp32+custom#2-fp16 @ weight: .5:.5, iteration {number set...}
Version Platform Description
Latest published version: e04867997e8903b9f44b75d073ef0be8c3159c12 2023-05-25T21:13:56Z