hako-mikan / sd-webui-supermerger

model merge extension for stable diffusion web ui
GNU Affero General Public License v3.0

New method suggestions for additional merge potential (code+output comparisons included) #33

Closed: SwiftIllusion closed this issue 10 months ago

SwiftIllusion commented 1 year ago

While trying to find methods to improve models, one of the things I looked into was merging; hopefully the discoveries below are valuable in helping improve/provide additional options for merging.

Sum merging

Initially I started with inspiration after finding https://github.com/recoilme/losslessmix. However, through ChatGPT (regardless of your feelings about utilizing this or concerns about its accuracy, the outputs below hopefully show the interactions held value), I found that approach to be working only with the vector orientations. I expanded it to also take the magnitude into account and combined the results, which gave the best merging outputs in my comparisons. One of the difficulties with sum merging is that you can lose some things through the merge; below is a comparison, with different prompts and 2 seeds, between a regular merge and the new method.

[grid image: xyz_grid-0019-1234 "detailed jewels-encrusted, display-case"]

You can see below the improved details/depth, especially in the jewelry; in the top girl's background the top bird's twig also connects better, besides the extra details and the improved hands for the guy on the right.

[grid image: xyz_grid-0018-1234 "detailed jewels-encrusted, display-case"]
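For reference, the similarity measure described above (the same computation appears in the merge code later in this post) averages the directional cosine similarity with a magnitude term:

    import torch

    sim = torch.nn.CosineSimilarity(dim=0)

    def combined_similarity(t0, t1):
        a, b = t0.to(torch.float32), t1.to(torch.float32)
        # Orientation: cosine similarity along dim 0 (per-column for 2D weight tensors)
        direction = sim(a, b)
        # Magnitude: dot product of the flattened tensors over the product of their norms
        magnitude = torch.dot(a.view(-1), b.view(-1)) / (torch.norm(a) * torch.norm(b))
        # Average the two so both orientation and magnitude influence the merge ratio
        return (direction + magnitude) / 2.0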

Add difference merging

One of the things that has been most difficult with add merging is the rate at which consecutive merging, in attempts to gain more learning, can lead to burnt/overexposed-like colors and edges. To demonstrate this and what the new method achieves, these are comparisons starting with seekArtMega20, then adding dreamlike and openjourneyV2 (with SDv1.5 as model C for the difference).
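The smoothing itself (shown in context in the code below) boils down to running the B-C difference through a median filter and then a Gaussian filter before it is added:

    import torch
    import scipy.ndimage

    def smooth_difference(diff):
        # Median filter suppresses the isolated extreme deltas that cause 'burning'
        arr = scipy.ndimage.median_filter(diff.to(torch.float32).cpu().numpy(), size=3)
        # Gaussian filter then softens the remaining hard edges
        arr = scipy.ndimage.gaussian_filter(arr, sigma=1)
        return torch.tensor(arr)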

The code

This may be a bit messy as an implementation; I just replaced the existing methods for merging/adding (I don't have the ability or experience to turn this into additional options/a pull request), but here is what I used.

Relevant changes

from inspect import currentframe

mergedmodel=[]
typesg = ["none","alpha","beta (if Triple or Twice is not selected,Twice automatically enable)","alpha and beta","seed", "mbw alpha","mbw beta","mbw alpha and beta", "model_A","model_B","model_C","pinpoint blocks (alpha or beta must be selected for another axis)","elemental","pinpoint element","effective elemental checker"]
types = ["none","alpha","beta","alpha and beta","seed", "mbw alpha ","mbw beta","mbw alpha and beta", "model_A","model_B","model_C","pinpoint blocks","elemental","pinpoint element","effective"]
modes=["Weight" ,"Add" ,"Triple","Twice"]
sevemodes=["save model", "overwrite"]

#type[0:alpha,1:beta,2:seed,3:mbw,4:model_A,5:model_B,6:model_C]

#msettings=[0 weights_a,1 weights_b,2 model_a,3 model_b,4 model_c,5 base_alpha,6 base_beta,7 mode,8 useblocks,9 custom_name,10 save_sets,11 id_sets,12 wpresets]

#id sets "image", "PNG info","XY grid"

hear = False
hearm = False
non3 = [None]*3

def caster(news,hear):
    if hear: print(news)

def casterr(*args,hear=hear):
    if hear:
        names = {id(v): k for k, v in currentframe().f_back.f_locals.items()}
        print('\n'.join([names.get(id(arg), '???') + ' = ' + repr(arg) for arg in args]))

#msettings=[weights_a,weights_b,model_a,model_b,model_c,device,base_alpha,base_beta,mode,loranames,useblocks,custom_name,save_sets,id_sets,wpresets,deep]

def smergegen(weights_a,weights_b,model_a,model_b,model_c,base_alpha,base_beta,mode,useblocks,custom_name,save_sets,id_sets,wpresets,deep,esettings, prompt,nprompt,steps,sampler,cfg,seed,w,h,currentmodel,imggen):

deepprint  = True if "print change" in esettings else False

result,currentmodel,modelid,theta_0 = smerge(weights_a,weights_b,model_a,model_b,model_c,base_alpha,base_beta,mode,useblocks,custom_name,save_sets,id_sets,wpresets,deep,deepprint=deepprint)

if "ERROR" in result: return result, *non3

usemodelgen(theta_0,model_a)

save = True if sevemodes[0] in save_sets else False

result = savemodel(theta_0,currentmodel,custom_name,save_sets,model_a) if save else "Merged model loaded:"+currentmodel

gc.collect()

if imggen :
    images = simggen(prompt,nprompt,steps,sampler,cfg,seed,w,h,currentmodel,id_sets,modelid)
    return result,currentmodel,*images[:4]
else:
    return result,currentmodel

NUM_INPUT_BLOCKS = 12
NUM_MID_BLOCK = 1
NUM_OUTPUT_BLOCKS = 12
NUM_TOTAL_BLOCKS = NUM_INPUT_BLOCKS + NUM_MID_BLOCK + NUM_OUTPUT_BLOCKS
blockid=["BASE","IN00","IN01","IN02","IN03","IN04","IN05","IN06","IN07","IN08","IN09","IN10","IN11","M00","OUT00","OUT01","OUT02","OUT03","OUT04","OUT05","OUT06","OUT07","OUT08","OUT09","OUT10","OUT11"]

def smerge(weights_a,weights_b,model_a,model_b,model_c,base_alpha,base_beta,mode,useblocks,custom_name,save_sets,id_sets,wpresets,deep,deepprint = False):
caster("merge start",hearm)
global hear
global mergedmodel

gc.collect()

# for from file
if type(useblocks) is str:
    useblocks = True if useblocks =="True" else False
if type(base_alpha) == str:base_alpha = float(base_alpha)
if type(base_beta) == str:base_beta  = float(base_beta)

# preset to weights
if wpresets != False and useblocks:
    weights_a = wpreseter(weights_a,wpresets)
    weights_b = wpreseter(weights_b,wpresets)

# mode select booleans
save = True if sevemodes[0] in save_sets else False
usebeta = modes[2] in mode or modes[3] in mode

if not useblocks:
    weights_a = weights_b = ""
#for save log and save current model
mergedmodel =[weights_a,weights_b,
                        hashfromname(model_a),hashfromname(model_b),hashfromname(model_c),
                        base_alpha,base_beta,mode,useblocks,custom_name,save_sets,id_sets,deep].copy()

model_a = namefromhash(model_a)
model_b = namefromhash(model_b)
model_c = namefromhash(model_c)

caster(mergedmodel,False)

if len(deep) > 0:
    deep = deep.replace("\n",",")
    deep = deep.split(",")

#format check
if model_a =="" or model_b =="" or ((not modes[0] in mode) and model_c=="") : 
    return "ERROR: Necessary model is not selected",*non3

#for MBW text to list
if useblocks:
    weights_a_t=weights_a.split(',',1)
    weights_b_t=weights_b.split(',',1)
    base_alpha  = float(weights_a_t[0])    
    weights_a = [float(w) for w in weights_a_t[1].split(',')]
    caster(f"from {weights_a_t}, alpha = {base_alpha},weights_a ={weights_a}",hearm)
    if len(weights_a) != 25:return f"ERROR: weights alpha value must be {26}.",*non3
    if usebeta:
        base_beta = float(weights_b_t[0]) 
        weights_b = [float(w) for w in weights_b_t[1].split(',')]
        caster(f"from {weights_b_t}, beta = {base_beta},weights_a ={weights_b}",hearm)
        if len(weights_b) != 25: return f"ERROR: weights beta value must be {26}.",*non3

caster("model load start",hearm)

print(f"  model A  \t: {model_a}")
print(f"  model B  \t: {model_b}")
print(f"  model C  \t: {model_c}")
print(f"  alpha,beta\t: {base_alpha,base_beta}")
print(f"  weights_alpha\t: {weights_a}")
print(f"  weights_beta\t: {weights_b}")
print(f"  mode\t\t: {mode}")
print(f"  MBW \t\t: {useblocks}")

theta_1=load_model_weights_m(model_b,False,True,save).copy()

if modes[1] in mode:#Add
    theta_2 = load_model_weights_m(model_c,False,False,save).copy()
    for key in tqdm(theta_1.keys()):
        if 'model' in key:
            if key in theta_2:
                t2 = theta_2.get(key, torch.zeros_like(theta_1[key]))
                theta_1[key] = theta_1[key]- t2
            else:
                theta_1[key] = torch.zeros_like(theta_1[key])
    del theta_2

theta_0=load_model_weights_m(model_a,True,False,save).copy()

if modes[2] in mode or modes[3] in mode:#Tripe or Twice
    theta_2 = load_model_weights_m(model_c,False,False,save).copy()

alpha = base_alpha
beta = base_beta

re_inp = re.compile(r'\.input_blocks\.(\d+)\.')  # 12
re_mid = re.compile(r'\.middle_block\.(\d+)\.')  # 1
re_out = re.compile(r'\.output_blocks\.(\d+)\.') # 12

chckpoint_dict_skip_on_merge = ["cond_stage_model.transformer.text_model.embeddings.position_ids"]
count_target_of_basealpha = 0

sim = torch.nn.CosineSimilarity(dim=0)
sims = np.array([], dtype=np.float64)
for key in (tqdm(theta_0.keys(), desc="Stage 0/2")):
    # skip VAE model parameters to get better results
    if "first_stage_model" in key: continue
    if "model" in key and key in theta_1:
        simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
        dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
        magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
        combined_similarity = (simab + magnitude_similarity) / 2.0
        sims = np.append(sims, combined_similarity.numpy())
sims = sims[~np.isnan(sims)]
sims = np.delete(sims, np.where(sims < np.percentile(sims, 1, method='midpoint')))
sims = np.delete(sims, np.where(sims > np.percentile(sims, 99, method='midpoint')))

for key in (tqdm(theta_0.keys(), desc="Stage 1/2") if not False else theta_0.keys()):
    if "model" in key and key in theta_1:
        if usebeta and not key in theta_2:
            continue

        weight_index = -1
        current_alpha = alpha
        current_beta = beta

        if key in chckpoint_dict_skip_on_merge:
            continue

        # check weighted and U-Net or not
        if weights_a is not None and 'model.diffusion_model.' in key:
            # check block index
            weight_index = -1

            if 'time_embed' in key:
                weight_index = 0                # before input blocks
            elif '.out.' in key:
                weight_index = NUM_TOTAL_BLOCKS - 1     # after output blocks
            else:
                m = re_inp.search(key)
                if m:
                    inp_idx = int(m.groups()[0])
                    weight_index = inp_idx
                else:
                    m = re_mid.search(key)
                    if m:
                        weight_index = NUM_INPUT_BLOCKS
                    else:
                        m = re_out.search(key)
                        if m:
                            out_idx = int(m.groups()[0])
                            weight_index = NUM_INPUT_BLOCKS + NUM_MID_BLOCK + out_idx

            if weight_index >= NUM_TOTAL_BLOCKS:
                print(f"ERROR: illegal block index: {key}")
                return f"ERROR: illegal block index: {key}",None,None

            if weight_index >= 0 and useblocks:
                current_alpha = weights_a[weight_index]
                if usebeta: current_beta = weights_b[weight_index]
        else:
            count_target_of_basealpha = count_target_of_basealpha + 1

        if len(deep) > 0:
            skey = key + blockid[weight_index+1]
            for d in deep:
                if d.count(":") != 2 :continue
                dbs,dws,dr = d.split(":")[0],d.split(":")[1],d.split(":")[2]
                dbs,dws = dbs.split(" "), dws.split(" ")
                dbn,dbs = (True,dbs[1:]) if dbs[0] == "NOT" else (False,dbs)
                dwn,dws = (True,dws[1:]) if dws[0] == "NOT" else (False,dws)
                flag = dbn
                for db in dbs:
                    if db in skey:
                        flag = not dbn
                if flag:flag = dwn
                else:continue
                for dw in dws:
                    if dw in skey:
                        flag = not dwn
                if flag:
                    dr = float(dr)
                    if deepprint :print(dbs,dws,key,dr)
                    current_alpha = dr

        if modes[1] in mode:#Add
            caster(f"model A[{key}] +  {current_alpha} + * (model B - model C)[{key}]", hear)

            # Apply median filter to the weight differences
            filtered_diff = scipy.ndimage.median_filter(theta_1[key].to(torch.float32).cpu().numpy(), size=3)

            # Apply Gaussian filter to the filtered differences
            filtered_diff = scipy.ndimage.gaussian_filter(filtered_diff, sigma=1)

            theta_1[key] = torch.tensor(filtered_diff)

            # Add the filtered differences to the original weights
            theta_0[key] = theta_0[key] + current_alpha * theta_1[key]
        elif modes[2] in mode:#Triple
            caster(f"model A[{key}] +  {1-current_alpha-current_beta} +  model B[{key}]*{current_alpha} + model C[{key}]*{current_beta}",hear)
            theta_0[key] = (1 - current_alpha-current_beta) * theta_0[key] + current_alpha * theta_1[key]+current_beta * theta_2[key]
        elif modes[3] in mode:#Twice
            caster(f"model A[{key}] +  {1-current_alpha} + * model B[{key}]*{alpha}",hear)
            caster(f"model A+B[{key}] +  {1-current_beta} + * model C[{key}]*{beta}",hear)
            theta_0[key] = (1 - current_alpha) * theta_0[key] + current_alpha * theta_1[key]
            theta_0[key] = (1 - current_beta) * theta_0[key] + current_beta * theta_2[key]
        else:#Weight
            if current_alpha == 1:
                caster(f"alpha = 0,model A[{key}=model B[{key}",hear)
                theta_0[key] = theta_1[key]
            elif current_alpha !=0:
                caster(f"model A[{key}] +  {1-current_alpha} + * (model B)[{key}]*{alpha}",hear)

                # skip VAE model parameters to get better results
                if "first_stage_model" in key: continue
                if "model" in key and key in theta_0:
                    simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
                    dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
                    magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
                    combined_similarity = (simab + magnitude_similarity) / 2.0
                    k = (combined_similarity - sims.min()) / (sims.max() - sims.min())
                    k = k - current_alpha
                    k = k.clip(min=.0,max=1.)
                    theta_0[key] = theta_0[key] * (1 - k) + theta_1[key] * k

currentmodel = makemodelname(weights_a,weights_b,model_a, model_b,model_c, base_alpha,base_beta,useblocks,mode)

for key in tqdm(theta_1.keys(), desc="Stage 2/2"):
    if key in chckpoint_dict_skip_on_merge:
        continue
    if "model" in key and key not in theta_0:
        theta_0.update({key:theta_1[key]})

modelid = rwmergelog(currentmodel,mergedmodel)

caster(mergedmodel,False)

return "",currentmodel,modelid,theta_0

def load_model_weights_m(model,model_a,model_b,save):
checkpoint_info = sd_models.get_closet_checkpoint_match(model)
sd_model_name = checkpoint_info.model_name

cachenum = shared.opts.sd_checkpoint_cache

if save:        
    if model_a:
        load_model(checkpoint_info)
    print(f"Loading weights [{sd_model_name}] from file")
    return sd_models.read_state_dict(checkpoint_info.filename,"cuda")

if checkpoint_info in checkpoints_loaded:
    print(f"Loading weights [{sd_model_name}] from cache")
    return checkpoints_loaded[checkpoint_info]
elif cachenum>0 and model_a:
    load_model(checkpoint_info)
    print(f"Loading weights [{sd_model_name}] from cache")
    return checkpoints_loaded[checkpoint_info]
elif cachenum>1 and model_b:
    load_model(checkpoint_info)
    print(f"Loading weights [{sd_model_name}] from cache")
    return checkpoints_loaded[checkpoint_info]
elif cachenum>2:
    load_model(checkpoint_info)
    print(f"Loading weights [{sd_model_name}] from cache")
    return checkpoints_loaded[checkpoint_info]
else:
    if model_a:
        load_model(checkpoint_info)
    print(f"Loading weights [{sd_model_name}] from file")
    return sd_models.read_state_dict(checkpoint_info.filename,"cuda")

def makemodelname(weights_a,weights_b,model_a, model_b,model_c, alpha,beta,useblocks,mode):
model_a=filenamecutter(model_a)
model_b=filenamecutter(model_b)
model_c=filenamecutter(model_c)

modes=["Weight" ,"Add" ,"Triple","Twice","Diff"]

if type(alpha) == str:alpha = float(alpha)
if type(beta)== str:beta  = float(beta)

if useblocks:
    if modes[1] in mode:#add
        currentmodel =f"{model_a} + ({model_b} - {model_c}) x alpha ({str(round(alpha,3))},{','.join(str(s) for s in weights_a)}"
    elif modes[2] in mode:#triple
        currentmodel =f"{model_a} x (1-alpha-beta) + {model_b} x alpha + {model_c} x beta (alpha = {str(round(alpha,3))},{','.join(str(s) for s in weights_a)},beta = {beta},{','.join(str(s) for s in weights_b)})"
    elif modes[3] in mode:#twice
        currentmodel =f"({model_a} x (1-alpha) + {model_b} x alpha)x(1-beta)+  {model_c} x beta ({str(round(alpha,3))},{','.join(str(s) for s in weights_a)})_({str(round(beta,3))},{','.join(str(s) for s in weights_b)})"
    else:
        currentmodel =f"{model_a} x (1-alpha) + {model_b} x alpha ({str(round(alpha,3))},{','.join(str(s) for s in weights_a)})"
else:
    if modes[1] in mode:#add
        currentmodel =f"{model_a} + ({model_b} -  {model_c}) x {str(round(alpha,3))}"
    elif modes[2] in mode:#triple
        currentmodel =f"{model_a} x {str(round(1-alpha-beta,3))} + {model_b} x {str(round(alpha,3))} + {model_c} x {str(round(beta,3))}"
    elif modes[3] in mode:#twice
        currentmodel =f"({model_a} x {str(round(1-alpha,3))} +{model_b} x {str(round(alpha,3))}) x {str(round(1-beta,3))} + {model_c} x {str(round(beta,3))}"
    else:
        currentmodel =f"{model_a} x {str(round(1-alpha,3))} + {model_b} x {str(round(alpha,3))}"
return currentmodel

path_root = scripts.basedir()

def rwmergelog(mergedname = "",settings= [],id = 0):
setting = settings.copy()
filepath = os.path.join(path_root, "mergehistory.csv")
is_file = os.path.isfile(filepath)
if not is_file:
    with open(filepath, 'a') as f:


        f.writelines('"ID","time","name","weights alpha","weights beta","model A","model B","model C","alpha","beta","mode","use MBW","plus lora","custum name","save setting","use ID"\n')
with  open(filepath, 'r+') as f:
    reader = csv.reader(f)
    mlist = [raw for raw in reader]
    if mergedname != "":
        mergeid = len(mlist)
        setting.insert(0,mergedname)
        for i,x in enumerate(setting):
            if "," in str(x):setting[i] = f'"{str(setting[i])}"'
        text = ",".join(map(str, setting))
        text=str(mergeid)+","+datetime.datetime.now().strftime('%Y.%m.%d %H.%M.%S.%f')[:-7]+"," + text + "\n"
        f.writelines(text)
        return mergeid
    try:
        out = mlist[int(id)]
    except:
        out = "ERROR: OUT of ID index"
    return out

def draw_origin(grid, text,width,height,width_one):
grid_d= Image.new("RGB", (grid.width,grid.height), "white")
grid_d.paste(grid,(0,0))
def get_font(fontsize):
    try:
        return ImageFont.truetype(opts.font or Roboto, fontsize)
    except Exception:
        return ImageFont.truetype(Roboto, fontsize)
d= ImageDraw.Draw(grid_d)
color_active = (0, 0, 0)
fontsize = (width+height)//25
fnt = get_font(fontsize)

if grid.width != width_one:
    while d.multiline_textsize(text, font=fnt)[0] > width_one*0.75 and fontsize > 0:
        fontsize -=1
        fnt = get_font(fontsize)
d.multiline_text((0,0), text, font=fnt, fill=color_active,align="center")
return grid_d

def wpreseter(w,presets):
if "," not in w and w != "":
    presets=presets.splitlines()
    wdict={}
    for l in presets:
        if ":" in l :
            key = l.split(":",1)[0]
            wdict[key.strip()]=l.split(":",1)[1]
        if "\t" in l:
            key = l.split("\t",1)[0]
            wdict[key.strip()]=l.split("\t",1)[1]
    if w.strip() in wdict:
        name = w
        w = wdict[w.strip()]
        print(f"weights {name} imported from presets : {w}")
return w

def fullpathfromname(name):
if name == "" or name ==[]: return ""
checkpoint_info = sd_models.get_closet_checkpoint_match(name)
return checkpoint_info.filename

def namefromhash(hash):
if hash == "" or hash ==[]: return ""
checkpoint_info = sd_models.get_closet_checkpoint_match(hash)
return checkpoint_info.model_name

def hashfromname(name):
from modules import sd_models
if name == "" or name ==[]: return ""
checkpoint_info = sd_models.get_closet_checkpoint_match(name)
if checkpoint_info.shorthash is not None:
    return checkpoint_info.shorthash
return checkpoint_info.calculate_shorthash()

def simggen(prompt, nprompt, steps, sampler, cfg, seed, w, h,mergeinfo="",id_sets=[],modelid = "no id"):
shared.state.begin()
p = processing.StableDiffusionProcessingTxt2Img(
    sd_model=shared.sd_model,
    do_not_save_grid=True,
    do_not_save_samples=True,
    do_not_reload_embeddings=True,
)
p.batch_size = 1
p.prompt = prompt
p.negative_prompt = nprompt
p.steps = steps
p.sampler_name = sd_samplers.samplers[sampler].name
p.cfg_scale = cfg
p.seed = seed
p.width = w
p.height = h
p.seed_resize_from_w=0
p.seed_resize_from_h=0
p.denoising_strength=None

if type(p.prompt) == list:
    p.all_prompts = [shared.prompt_styles.apply_styles_to_prompt(x, p.styles) for x in p.prompt]
else:
    p.all_prompts = [shared.prompt_styles.apply_styles_to_prompt(p.prompt, p.styles)]

if type(p.negative_prompt) == list:
    p.all_negative_prompts = [shared.prompt_styles.apply_negative_styles_to_prompt(x, p.styles) for x in p.negative_prompt]
else:
    p.all_negative_prompts = [shared.prompt_styles.apply_negative_styles_to_prompt(p.negative_prompt, p.styles)]

processed:Processed = processing.process_images(p)
if "image" in id_sets: processed.images[0] =  draw_origin(processed.images[0], str(modelid),w,h,w)
image = processed.images[0]
if "PNG info" in id_sets:mergeinfo = mergeinfo + " ID " + str(modelid)

infotext = create_infotext(p, p.all_prompts, p.all_seeds, p.all_subseeds)
if infotext.count("Steps: ")>1:
    infotext = infotext[:infotext.rindex("Steps")]

infotexts = infotext.split(",")
for i,x in enumerate(infotexts):
    if "Model:"in x:
        infotexts[i] = " Model: "+mergeinfo.replace(","," ")
infotext= ",".join(infotexts)
images.save_image(image, opts.outdir_txt2img_samples, "",p.seed, p.prompt,shared.opts.samples_format, p=p,info=infotext)
shared.state.end()
return processed.images,infotext,plaintext_to_html(processed.info), plaintext_to_html(processed.comments),p


### Minor suggestion
Not necessary, but it would be appreciated if "save settings" had a "save merge" button, so you didn't need to toggle "save model" and then re-merge (more relevant given the longer time the above methods take to merge).
hako-mikan commented 1 year ago

Thanks for the suggestion. I've been interested in lossless merging, but the complexity of the code has kept me from getting into it. I will think about what you suggested.

As for the Save button, it used to exist in the past, but it has been removed. This is because merging is performed again even when the Save button is pressed: the loaded model is in fp16 format, so the merge needs to be done again.

SwiftIllusion commented 1 year ago

No worries. I'm glad, then, that I was able to share here how I ended up implementing it and how another method could be added too :) . Good luck whenever you might get to it.

Ahh, if that's the case it makes sense; with a button you would expect it to save immediately, but if there's that limitation and it needs to merge again anyway, that button would be confusing. Thanks for clarifying.

hako-mikan commented 1 year ago

Added features. Thanks!

mariaWitch commented 1 year ago

You can actually get a lot of the performance back when using the filters by offloading them to the GPU with CuPy. It shouldn't be too difficult to implement.

SwiftIllusion commented 1 year ago

Very smooth implementation, thank you for the great work :)

However, I found an error in the latest update when trying to save a file specifically as a safetensors file (with both normal and cosine calculation):

Traceback (most recent call last):
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\routes.py", line 337, in run_predict
    output = await app.get_blocks().process_api(
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 1015, in process_api
    result = await self.call_function(
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\gradio\blocks.py", line 833, in call_function
    prediction = await anyio.to_thread.run_sync(
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\to_thread.py", line 31, in run_sync
    return await get_asynclib().run_sync_in_worker_thread(
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 937, in run_sync_in_worker_thread
    return await future
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 867, in run
    result = context.run(func, *args)
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\extensions\sd-webui-supermerger\scripts\mergers\mergers.py", line 72, in smergegen
    result = savemodel(theta_0,currentmodel,custom_name,save_sets,model_a,metadata) if save else "Merged model loaded:"+currentmodel
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\extensions\sd-webui-supermerger\scripts\mergers\model_util.py", line 700, in savemodel
    safetensors.torch.save_file(state_dict, fname, metadata=metadata)
  File "D:\AIdev\AIdiffusion\diffusion\stable-diffusion-webui\venv\lib\site-packages\safetensors\torch.py", line 71, in save_file
    serialize_file(_flatten(tensors), filename, metadata=metadata)
TypeError: argument 'metadata': 'dict' object cannot be converted to 'PyString'

Also, with how smooth this latest implementation is, I was able to add another version of the cosine merge, which mixes the weights separately before calculating the cosine similarity. You can see in the comparison demonstrated below how this can result in favoring the structures of A and the details of B, or the other way around, from modelA vs modelB (calculated in a different sequence; it's not a result you would get by just swapping the A/B models around), so I added and adjusted them as the cosineA and cosineB calculation modes.

I've also added, as a calculation mode, smoothAdd, which is the smoother filtered add-difference method from the original post that was missed here. And thanks to the implementation of the first cosine mode, I was able to implement them 'properly' this time instead of just replacing existing things.

In supermerger.py I replace

                    calcmode = gr.Radio(label = "Calcutation Mode",choices = ["normal", "cosine"], value = "normal")

with

                    calcmode = gr.Radio(label = "Calcutation Mode",choices = ["normal", "cosineA", "cosineB", "smoothAdd"], value = "normal")

Then in mergers.py I replace

            elif calcmode == "cosine":
                # skip VAE model parameters to get better results
                if "first_stage_model" in key: continue
                if "model" in key and key in theta_0:
                    simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
                    dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
                    magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
                    combined_similarity = (simab + magnitude_similarity) / 2.0
                    k = (combined_similarity - sims.min()) / (sims.max() - sims.min())
                    k = k - current_alpha
                    k = k.clip(min=.0,max=1.)
                    caster(f"model A[{key}] +  {1-k} + * (model B)[{key}]*{k}",hear)
                    theta_0[key] = theta_0[key] * (1 - k) + theta_1[key] * k

with

            elif calcmode == "cosineA": #favors modelA's structure with details from B
                # skip VAE model parameters to get better results
                if "first_stage_model" in key: continue
                if "model" in key and key in theta_0:
                    # Normalize the vectors before merging
                    theta_0_norm = nn.functional.normalize(theta_0[key].to(torch.float32), p=2, dim=0)
                    theta_1_norm = nn.functional.normalize(theta_1[key].to(torch.float32), p=2, dim=0)
                    simab = sim(theta_0_norm, theta_1_norm)
                    dot_product = torch.dot(theta_0_norm.view(-1), theta_1_norm.view(-1))
                    magnitude_similarity = dot_product / (torch.norm(theta_0_norm) * torch.norm(theta_1_norm))
                    combined_similarity = (simab + magnitude_similarity) / 2.0
                    k = (combined_similarity - sims.min()) / (sims.max() - sims.min())
                    k = k - current_alpha
                    k = k.clip(min=.0,max=1.)
                    caster(f"model A[{key}] +  {1-k} + * (model B)[{key}]*{k}",hear)
                    theta_0[key] = theta_1[key] * (1 - k) + theta_0[key] * k

            elif calcmode == "cosineB": #favors modelB's structure with details from A
                # skip VAE model parameters to get better results
                if "first_stage_model" in key: continue
                if "model" in key and key in theta_0:
                    simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
                    dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
                    magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
                    combined_similarity = (simab + magnitude_similarity) / 2.0
                    k = (combined_similarity - sims.min()) / (sims.max() - sims.min())
                    k = k - current_alpha
                    k = k.clip(min=.0,max=1.)
                    caster(f"model A[{key}] +  {1-k} + * (model B)[{key}]*{k}",hear)
                    theta_0[key] = theta_1[key] * (1 - k) + theta_0[key] * k

            elif calcmode == "smoothAdd":
                caster(f"model A[{key}] +  {current_alpha} + * (model B - model C)[{key}]", hear)
                # Apply median filter to the weight differences
                filtered_diff = scipy.ndimage.median_filter(theta_1[key].to(torch.float32).cpu().numpy(), size=3)
                # Apply Gaussian filter to the filtered differences
                filtered_diff = scipy.ndimage.gaussian_filter(filtered_diff, sigma=1)
                theta_1[key] = torch.tensor(filtered_diff)
                # Add the filtered differences to the original weights
                theta_0[key] = theta_0[key] + current_alpha * theta_1[key]

and

    if calcmode =="cosine":
        if stopmerge: return "STOPPED", *non4
        sim = torch.nn.CosineSimilarity(dim=0)
        sims = np.array([], dtype=np.float64)
        for key in (tqdm(theta_0.keys(), desc="Stage 0/2")):
            # skip VAE model parameters to get better results
            if "first_stage_model" in key: continue
            if "model" in key and key in theta_1:
                simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
                dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
                magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
                combined_similarity = (simab + magnitude_similarity) / 2.0
                sims = np.append(sims, combined_similarity.numpy())
        sims = sims[~np.isnan(sims)]
        sims = np.delete(sims, np.where(sims < np.percentile(sims, 1, method='midpoint')))
        sims = np.delete(sims, np.where(sims > np.percentile(sims, 99, method='midpoint')))

with

    if calcmode =="cosineA": #favors modelA's structure with details from B
        if stopmerge: return "STOPPED", *non4
        sim = torch.nn.CosineSimilarity(dim=0)
        sims = np.array([], dtype=np.float64)
        for key in (tqdm(theta_0.keys(), desc="Stage 0/2")):
            # skip VAE model parameters to get better results
            if "first_stage_model" in key: continue
            if "model" in key and key in theta_1:
                theta_0_norm = nn.functional.normalize(theta_0[key].to(torch.float32), p=2, dim=0)
                theta_1_norm = nn.functional.normalize(theta_1[key].to(torch.float32), p=2, dim=0)
                simab = sim(theta_0_norm, theta_1_norm)
                sims = np.append(sims,simab.numpy())
        sims = sims[~np.isnan(sims)]
        sims = np.delete(sims, np.where(sims<np.percentile(sims, 1 ,method = 'midpoint')))
        sims = np.delete(sims, np.where(sims>np.percentile(sims, 99 ,method = 'midpoint')))

    if calcmode =="cosineB": #favors modelB's structure with details from A
        if stopmerge: return "STOPPED", *non4
        sim = torch.nn.CosineSimilarity(dim=0)
        sims = np.array([], dtype=np.float64)
        for key in (tqdm(theta_0.keys(), desc="Stage 0/2")):
            # skip VAE model parameters to get better results
            if "first_stage_model" in key: continue
            if "model" in key and key in theta_1:
                simab = sim(theta_0[key].to(torch.float32), theta_1[key].to(torch.float32))
                dot_product = torch.dot(theta_0[key].view(-1).to(torch.float32), theta_1[key].view(-1).to(torch.float32))
                magnitude_similarity = dot_product / (torch.norm(theta_0[key].to(torch.float32)) * torch.norm(theta_1[key].to(torch.float32)))
                combined_similarity = (simab + magnitude_similarity) / 2.0
                sims = np.append(sims, combined_similarity.numpy())
        sims = sims[~np.isnan(sims)]
        sims = np.delete(sims, np.where(sims < np.percentile(sims, 1, method='midpoint')))
        sims = np.delete(sims, np.where(sims > np.percentile(sims, 99, method='midpoint')))

and I added these necessary imports

import torch.nn as nn
import scipy.ndimage
from scipy.ndimage.filters import median_filter as filter

@mariaWitch Thanks for the suggestion. However, though I believe I got the code for that method, it looks to be restricted to CUDA devices, and when trying to pip install it, it wouldn't work (it couldn't find CUDA for some reason) and tried to build itself on my system, which failed. So I'm not sure myself how that would be properly implemented, especially when just the necessary import causes these problems.

mariaWitch commented 1 year ago

> Thanks for the suggestion. However, though I believe I got the code for that method, it looks to be restricted to CUDA devices, and when trying to pip install it, it wouldn't work (it couldn't find CUDA for some reason) and tried to build itself on my system, which failed. So I'm not sure myself how that would be properly implemented, especially when just the necessary import causes these problems.

I actually properly implemented it into Bayesian Merger, which has a very similar code structure to Super Merger. You can check it out here:

github.com/mariaWitch/sd-webui-bayesian-merger/blob/double-diff-cosine/sd_webui_bayesian_merger/merger.py#L250-L263

But essentially I had to convert the tensor that gets passed to SciPy (in this case CuPyX) to a DLPack capsule, then use CuPy to convert that into a CuPy array, and then pass it to the filters. Once that was done, I was able to convert it back into a DLPack capsule and then back into a tensor with from_dlpack. The reason we have to convert the tensor to a DLPack capsule is that this is the only supported way of doing a zero-copy transfer of a tensor into a CuPy array (which is on the GPU, as CuPyX does not support standard NumPy arrays). By doing it this way, we avoid costly memory transfers between system RAM and VRAM that would otherwise decrease performance. It should be noted that CuPy and CuPyX (part of the same package) both have nearly identical functions to their non-CUDA counterparts, so much so that I literally just recast cupyx.scipy as scipy when I imported it.

from torch.utils.dlpack import to_dlpack
from torch.utils.dlpack import from_dlpack
import cupy as cp
import cupyx.scipy as scipy
from cupyx.scipy.ndimage._filters import median_filter as filter

These would be the imports you would bring in if CuPy were installed (imported in place of scipy).
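As a minimal sketch of the round-trip described above (the function name and structure here are illustrative, not taken from her branch; see the linked merger.py for the real implementation):

    import torch
    import cupy as cp
    from torch.utils.dlpack import to_dlpack, from_dlpack
    from cupyx.scipy import ndimage as cupy_ndimage

    def smooth_difference_gpu(diff):
        # torch tensor -> DLPack capsule -> CuPy array (zero-copy once the tensor is on the GPU)
        diff_cp = cp.from_dlpack(to_dlpack(diff.to("cuda", torch.float32)))
        # cupyx.scipy.ndimage mirrors scipy.ndimage's filter API
        filtered = cupy_ndimage.median_filter(diff_cp, size=3)
        filtered = cupy_ndimage.gaussian_filter(filtered, sigma=1)
        # CuPy array -> DLPack capsule -> torch tensor
        return from_dlpack(filtered.toDlpack())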

mariaWitch commented 1 year ago

As for your installation issues, I have had no such bad luck. But I think CuPy has experimental support for ROCm as well. Either way, it can exist as something that works if the script can import it, and otherwise it can just fail over. But I saw a 10x reduction in filtering time just from using CuPy instead of SciPy, so I think it is definitely worth trying to get working.

mariaWitch commented 1 year ago

Also, could you elaborate a little on what you mean by the "structure" of a model and the "details" of a model in the context you used them in? It seems a bit abstract and could mean a lot of different things.

hako-mikan commented 1 year ago

@SwiftIllusion Thanks a lot! I will be implementing the method you described in the next update. I need your help: I am also planning to implement a new calculation method and will create a new README about the calculation methods. Could you please write an explanation of the calculation method you have introduced? https://github.com/hako-mikan/sd-webui-supermerger/blob/ver10/calcmode.md

@mariaWitch Thanks for your advice. Certainly, the faster the calculation the better, so it would be good if we could implement the method you have introduced. On the other hand, methods that depend on the environment cause many problems; users on Google Colab especially seem to have a lot of import problems. Thus, I would want it to work with or without the installation.

SwiftIllusion commented 1 year ago

No worries :) glad to hear.

I don't have the technical wizardry or verbal expertise that some do, but with my own observations from its development/output, alongside ChatGPT, I've tried to provide some more guidance/details below, as I know what it's like to see new tech and have no idea what it's doing or how to take advantage of it. Hope it helps.

normal

Available modes : All

Normal calculation method. Can be used in all modes.

cosineA/cosineB

Available modes : weight sum

The comparison of two models is performed using cosine similarity, centered on the set ratio, and is calculated to eliminate loss due to merging. See below for further details. https://github.com/hako-mikan/sd-webui-supermerger/issues/33 https://github.com/recoilme/losslessmix

The original simple weight mode is the most basic method and works by linearly interpolating between the two models based on a given weight alpha. At alpha = 0, the output is the first model (model A), and at alpha = 1, the output is the second model (model B). Any other value of alpha results in a weighted average of the two models.
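Expressed as code (theta_a, theta_b, and alpha are illustrative names, not identifiers from the extension):

    def weight_sum(theta_a, theta_b, alpha):
        # Plain linear interpolation, applied independently to every key
        return {key: (1 - alpha) * theta_a[key] + alpha * theta_b[key] for key in theta_a}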

One key advantage of the cosine methods over the original simple weight mode is that they take into account the structural similarity between the two models, which can lead to better results when the two models are similar but not identical. Another advantage of the cosine methods is that they can help prevent overfitting and improve generalization by limiting the amount of detail from one model that is incorporated into the other.

In the case of CosineA, we normalize the vectors of the first model (model A) before merging, so the resulting merged model will favor the structure of the first model while incorporating details from the second model. This is because we are essentially aligning the direction of the first model's vectors with the direction of the corresponding vectors in the second model.

Detail-wise, for example, note how above and below there is in all cases more blur preserved in the background compared to the foreground, instead of the linear difference of the original merge.

On the other hand, in CosineB, we normalize the vectors of the second model (model B) before merging, so the resulting merged model will favor the structure of the second model while incorporating details from the first model. This is because we are aligning the direction of the second model's vectors with the direction of the corresponding vectors in the first model.

In summary, the choice between CosineA and CosineB depends on which model's structure you want to prioritize in the resulting merged model. If you want to prioritize the structure of the first model, use CosineA. If you want to prioritize the structure of the second model, use CosineB.

Note also how the second model is more the 'reference point' for the merging (compare alpha 1 against the changes at 0), so the order of models can also change the end result when looking for your desired output.
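Mechanically (condensed from the cosine branches in the code earlier in this thread), the fixed alpha is replaced by a per-key ratio k derived from the similarity statistics collected in Stage 0:

    def cosine_ratio(combined_similarity, sims, alpha):
        # Normalize this key's similarity against the percentile-trimmed Stage 0 statistics
        k = (combined_similarity - sims.min()) / (sims.max() - sims.min())
        # Center on the chosen alpha and keep the ratio in [0, 1]
        return (k - alpha).clip(min=0.0, max=1.0)

    # per key: theta_0[key] = theta_1[key] * (1 - k) + theta_0[key] * k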

smoothAdd

Available modes : Add difference

A method of add difference that mixes the benefits of median and Gaussian filters to add the model differences in a smoother way, trying to avoid the negative 'burning' effect that can be seen when adding too many models this way. This also achieves more than simply adding the difference at a lower value.

SwiftIllusion commented 1 year ago

@mariaWitch Regrettably, without being able to install the requirements for your method's imports, I've been unable to test that here. I also spent hours trying to implement other methods/performance improvements to the filters within the existing scope, but the closest I got was a different method for one of the two filters that produced a completely different/wrong output; the rest of the time was spent on errors, so I've had to consider it beyond the scope of what I can achieve. At least, having also seen the additional prompt about the README, I've now outlined everything better above, to hopefully help you and others understand what I was referring to in the cosine merge methods and how to take advantage of it all.

mariaWitch commented 1 year ago

So "structure" refers to the background and pose, and "details" refer to the actual character details on the subject. That makes it a lot clearer.

hako-mikan commented 1 year ago

@SwiftIllusion Great! Thanks a lot! Your explanation with the poses is very clear.

hako-mikan commented 1 year ago

Updated

recoilme commented 1 year ago

I just checked the A/B cosine and the results are impressive. Thanks a lot!

[image: 00018-4146499890]

SwiftIllusion commented 1 year ago

@recoilme Awesome, I'm really happy to hear that :), thank you very much for the original inspiration. That result is amazing :D.

mariaWitch commented 1 year ago

    theta_0[key] = theta_1[key] * (1 - k) + theta_0[key] * k

@SwiftIllusion Why was this changed from theta_0[key] = theta_0[key] * (1 - k) + theta_1[key] * k to the line above? This seems a bit backwards now.

SwiftIllusion commented 1 year ago

@mariaWitch This was to fix the fact that it was previously actually merging backwards (if you put the weight at 0.75, it would have given 0.75 to modelA instead of modelB). Now, as per the examples in the guide (which was made after this fix), the output correctly goes from A at 0 to B at 1.

mariaWitch commented 1 year ago

Thank you! That is clear enough for me.

SwiftIllusion commented 1 year ago

@hako-mikan After getting the previous merge methods working, I was left with a theory of a theory for tackling a problem I wasn't sure could be solved; eventually I discovered it was possible. I couldn't explain the technicality of it, and GPT never knew what I was trying to do, but I tested it, and after discovering/confirming it properly worked, I have since been working with it to see just how far I could push its potential, and so I could work out a guide/tips for it too, to give people a full head start on how to work with it and what to avoid. I don't know why it works, but it works, and it significantly expands the possibilities of merges and models. It's something I hope can positively evolve how people develop merged models and share trained models.

I hope you can add this whenever you get the opportunity, and I would appreciate it. I've provided the code at the end of this post like previously, adding it into the latest version (commits on Jun 5, 2023) as a new choice for Add difference.

The guide for this new method

trainDifference

Available modes : Add difference

This method, at its simplest, can be thought of as a 'super Lora' for permanent merges: it no longer adds the calculated difference between models (B)-(C) to model (A); it now 'trains' that difference as if it were finetuning it relative to model (A).
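As a rough scalar walk-through of the update implemented in the code at the end of this post (the values here are purely illustrative):

    # One weight from each model: A (working model), B (finetuned model), C (B's base)
    a, b, c, alpha = 0.50, 0.90, 0.30, 1.0

    diff_ab  = b - c                          # finetune delta: 0.60
    dist_bc  = abs(b - c)                     # 0.60
    dist_ba  = abs(b - a)                     # 0.40: how far A already sits from B
    scale    = dist_ba / (dist_bc + dist_ba)  # 0.40: the closer A already is to B, the less is added
    sign     = 1.0 if (b - c) >= 0 else -1.0  # direction of the finetune delta
    new_diff = sign * abs(scale) * abs(diff_ab)  # 0.24
    a_new    = a + new_diff * (alpha * 1.8)   # 0.932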

Comparisons

Usage guidance

Possibilities and general usage
- **Expand a model with new concepts, or reinforce existing concepts (and quality output), instead of mixing.** Sci-Fi Diffusion, as an example (https://civitai.com/models/4404?modelVersionId=4980), was trained on general sci-fi images. You don't have to merge/mix into it anymore: you can use this to practically train sci-fi into your model by trainDifferencing it against SDv1.5, and you aren't limited to generating an approximated LoRA difference for expansion. Another example: you could cosine-similarity merge [Analog Diffusion](https://civitai.com/models/1265/analog-diffusion) and [Timeless Diffusion](https://civitai.com/models/3557?modelVersionId=3936), which are similar in nature (and you wouldn't want to reinforce the negative elements of the photographs too much), then trainDifference [Modelshoot Style](https://civitai.com/models/2147/modelshoot-style) on top of that, which focuses on medium body shots with a stronger photography foundation built by the previous merge. The potential for models, now able in a sense to 'continue training' with broad models like [Surreality](https://civitai.com/models/21666?modelVersionId=25854) and [seek.art MEGA](https://civitai.com/models/1315?modelVersionId=22808) (which gratefully lifted their license restrictions with V2), is much larger than when it was limited to mixing them into models (though of course the utility of styling with different weightings of ins/outs etc. still has its value, and everything depends on your goal). Also, models like [RPG](https://civitai.com/models/1116?modelVersionId=7133), with v5 sounding like it is being developed from SDv1.5 instead of a merge, can with this be trained into models without the heavy NSFW/female bias that many models carry from F222/etc. merges.
- **Direction of trainDifference and style of the difference matters.** It is harder for a model to learn to be realistic than to be stylistic. For example, if building a model that is intended to eventually be stylistic, consider having multiple model branches based on similar styles, and eventually trainDifference the stylistic branch onto the most realistic branch. Generally you should merge anime/cartoon > stylish > realistic, if the styles differ.
- **trainDifference is not always the best solution.** Sometimes, depending on the type/scope of the difference, a cosine similarity merge can provide better results (if the differences aren't from SDv1.5 already, trainDifference both onto SDv1.5, then cosine-similarity merge them from there before you trainDifference the result back onto your working model). Also, if the material is similar but large and varied, the best result can sometimes come from using trainDifference in both directions and then weight-sum merging between those 2 results, as with [waifu diffusion](https://huggingface.co/hakurei/waifu-diffusion-v1-3) and [Acertainty](https://huggingface.co/JosephusCheung/ACertainty).
- **Gain the benefits of a trained model anywhere.** Models like [knollingcase](https://civitai.com/models/1092?modelVersionId=1093) and [Bubble Toys](https://civitai.com/models/23945/bubble-toys-the-model) are cool, but their effort has been limited by the framework they were trained on. Now you can trainDifference them onto any of the newer models people have developed. Additionally, some people who made checkpoints instead of LoRAs mentioned trying a LoRA first without getting valuable results; with trainDifference, their work can still be applied to any model.
Limitations and what to avoid/problems and solutions
- **Knowing and having access to the origin of the model pre-training is required.** A lot of models have some mix of SDv1.4 now. This trainDifference merge is accurate enough that if you were to try and, for example, train 'rev animated' onto 'Sci-Fi Diffusion' with SDv1.5 as model (C), the merge would negatively affect the output (the 'training' would be offset/distorted), because 'rev animated's origin is an unknown ratio between SDv1.4 and SDv1.5 (and a mix of individual in/out weights too). You could, however, trainDifference 'Sci-Fi Diffusion' onto 'rev animated', because it was trained on SDv1.5.
- **After enough time / with similar materials, 'burning'/'over-training' can eventually occur.** You can 'pull back' the model at this point by cosine-similarity merging it with SDv1.5, which helps ground it while keeping more qualities from the training.
- **After enough merges, the 'clip/comprehension' can become heavy, negatively affecting simple prompts.** For example, complex prompts may still look good, but 'female portrait, blue eyes' could spill the 'blue' concept too much. To help avoid this as you make trainDifference merges of large scope, you can use [model toolkit](https://github.com/arenasys/stable-diffusion-webui-model-toolkit) to manipulate the clip. Load the final model into that extension and create 2 different models: 'clipA' importing the clip of your base model, and 'clipB' importing the clip of what you trained into it. Then use a regular weight-sum merge to find the best output/comprehension between those 2 models, to soften out the clip as you expand your model. Sometimes weight-sum merging the final model with a version of it using the SDv1.5 clip can be better than mixing between clipA and clipB.

Practical demonstration

The code for this new method

In supermerger.py replace

                    calcmode = gr.Radio(label = "Calcutation Mode",choices = ["normal", "cosineA", "cosineB", "smoothAdd","tensor"], value = "normal") 

with

                    calcmode = gr.Radio(label = "Calcutation Mode",choices = ["normal", "cosineA", "cosineB", "trainDifference", "smoothAdd","tensor"], value = "normal") 

Then in mergers.py replace

    if MODES[1] in mode:#Add
        if stopmerge: return "STOPPED", *non4
        theta_2 = load_model_weights_m(model_c,False,False,save).copy()
        for key in tqdm(theta_1.keys()):
            if 'model' in key:
                if key in theta_2:
                    t2 = theta_2.get(key, torch.zeros_like(theta_1[key]))
                    theta_1[key] = theta_1[key]- t2
                else:
                    theta_1[key] = torch.zeros_like(theta_1[key])
        del theta_2

with

    if MODES[1] in mode:#Add
        if stopmerge: return "STOPPED", *non4
        if calcmode == "trainDifference":
            theta_2 = load_model_weights_m(model_c,True,False,save).copy()
        else:
            theta_2 = load_model_weights_m(model_c,False,False,save).copy()
            for key in tqdm(theta_1.keys()):
                if 'model' in key:
                    if key in theta_2:
                        t2 = theta_2.get(key, torch.zeros_like(theta_1[key]))
                        theta_1[key] = theta_1[key]- t2
                    else:
                        theta_1[key] = torch.zeros_like(theta_1[key])
            del theta_2

and replace

    if MODES[2] in mode or MODES[3] in mode:#Tripe or Twice
        theta_2 = load_model_weights_m(model_c,False,False,save).copy()
    else:
        theta_2 = {}

with

    if MODES[2] in mode or MODES[3] in mode:#Tripe or Twice
        theta_2 = load_model_weights_m(model_c,False,False,save).copy()
    else:
        if calcmode != "trainDifference":
            theta_2 = {}

and replace

            if usebeta and (not key in theta_2) and (not theta_2 == {}) :
                continue

with

            if calcmode == "trainDifference":
                if key not in theta_2:
                    continue
            else:
                if usebeta and (not key in theta_2) and (not theta_2 == {}) :
                    continue

and between "cosineB" and "smoothAdd" methods, add (note multiplying current_alpha by 1.8 is intentional, I don't understand the maths, but from testing that makes the 'training' amount equivelant to 1:1 when current_alpha is set to 1)

            elif calcmode == "trainDifference":
                # Check if theta_1[key] is equal to theta_2[key]
                if torch.allclose(theta_1[key].float(), theta_2[key].float(), rtol=0, atol=0):
                    theta_2[key] = theta_0[key]
                    continue

                diff_AB = theta_1[key].float() - theta_2[key].float()

                distance_A0 = torch.abs(theta_1[key].float() - theta_2[key].float())
                distance_A1 = torch.abs(theta_1[key].float() - theta_0[key].float())

                sum_distances = distance_A0 + distance_A1

                scale = torch.where(sum_distances != 0, distance_A1 / sum_distances, torch.tensor(0.).float())
                sign_scale = torch.sign(theta_1[key].float() - theta_2[key].float())
                scale = sign_scale * torch.abs(scale)

                new_diff = scale * torch.abs(diff_AB)
                theta_0[key] = theta_0[key] + (new_diff * (current_alpha*1.8))

and after the last "del theta_1" add

    if calcmode == "trainDifference":
        del theta_2
hako-mikan commented 1 year ago

Added trainDifference. Thanks!!

SwiftIllusion commented 1 year ago

No worries, thank you very much for your work on this/implementing it :) .

miasik commented 1 year ago

@SwiftIllusion @mariaWitch I just want to say that @sverfier8807 has implemented multi-threading for smoothAdd and now it works much faster.

#144
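(For a rough idea of what that looks like, the per-key filtering can be spread across threads as in the sketch below; this is illustrative only, see #144 for the actual implementation.)

    from concurrent.futures import ThreadPoolExecutor

    import scipy.ndimage
    import torch

    def smooth_one(item):
        key, tensor = item
        arr = scipy.ndimage.median_filter(tensor.to(torch.float32).cpu().numpy(), size=3)
        arr = scipy.ndimage.gaussian_filter(arr, sigma=1)
        return key, torch.tensor(arr)

    def smooth_all(theta, workers=8):
        # The filter loops run in compiled code, so worker threads can overlap usefully here
        with ThreadPoolExecutor(max_workers=workers) as pool:
            for key, filtered in pool.map(smooth_one, theta.items()):
                theta[key] = filtered
        return theta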

StAlKeR7779 commented 1 year ago

@SwiftIllusion @hako-mikan Maybe here, instead of the 1.8 multiplier, it should be 2? // H = harmonic mean

[equation image]

SwiftIllusion commented 1 year ago

@StAlKeR7779 Sorry, I don't know the value of the math you're displaying, and I appreciate the thought to improve it further, but I did many tests across different merges (e.g. merging models trained on SDv1.4 to SDv1.5, or SDv1.5 to practically the same model, plus different LoRA comparisons, where you can see in the guide how it correlates in strength). Anything beyond 1.8 started to 'burn/over-train' in a way that appeared greater than the original (I tested from 1 to 2). Even 1.9 appeared to be too much, which surprised me at the time, as I was expecting 2 to be the most natural value if it required more than 1; but 1.8 was the most representative, and I've used it extensively since then with that value.