s1dlx / meh

Merging Execution Helper
MIT License

Weights activation #25

Open ljleb opened 1 year ago

ljleb commented 1 year ago

It would be really cool if we had access to the weights' activations for each key during merging. We could use the activations as a saliency map over the weights, telling us which weights contributed most to the generation.
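Roughly what I have in mind, as a sketch: capture per-module activation magnitudes with forward hooks during a generation pass. The module filter, the hypothetical function name, and the mean-abs reduction are all assumptions, not anything sd-meh exposes today.

```python
import torch
import torch.nn as nn

def capture_activation_saliency(model: nn.Module, inputs: torch.Tensor) -> dict:
    """Run one forward pass and record a saliency tensor per module."""
    saliency = {}
    hooks = []

    def make_hook(name):
        def hook(module, inp, out):
            # reduce the activation to one value per output unit (mean |act|
            # over the batch); this is just one possible saliency choice
            saliency[name] = out.detach().abs().mean(dim=0)
        return hook

    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        model(inputs)

    for h in hooks:
        h.remove()
    return saliency
```

For a real UNet you'd hook conv and attention projections too, and accumulate over several prompts instead of a single pass.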

The way I see this working:

Merging multiple times in a row using multiple prompts would make it possible to merge a comprehensive, carefully selected set of weights. Maybe you could even do transfer learning with this.

I am not sure how well this would really fare, and I am also unsure how easy it would be to implement. Maybe instead of making model calls ourselves, we could expose some kind of API that other tools interact with to share the activations of each key. This would make it possible to use comfyui or a1111 with sd-meh.

s1dlx commented 1 year ago

this sounds like a fun idea, not sure it belongs in sd-meh though, unless you want to then use the activations to drive the merge somehow

it's probably easier to implement as a webui extension, where we can reuse all the pipeline code already written

ljleb commented 1 year ago

> unless you want to then use the activations to drive the merge somehow

That's the idea, actually. One way could be to use the activations as a per-weight alpha or beta parameter in any merge method, for example weighted sum. Another idea would be to have a dedicated merge mode.
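For the weighted-sum case, a per-weight alpha would just be an elementwise generalization of the usual scalar alpha, something like this (the function name is hypothetical; sd-meh currently takes a scalar):

```python
import torch

def weighted_sum_per_weight(theta_a: torch.Tensor,
                            theta_b: torch.Tensor,
                            alpha: torch.Tensor) -> torch.Tensor:
    """Elementwise weighted sum: alpha has the same shape as the weights,
    values near 1 favour model B for that individual weight."""
    alpha = alpha.clamp(0.0, 1.0)
    return (1.0 - alpha) * theta_a + alpha * theta_b
```

A scalar alpha is just the special case where the tensor is constant.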

It's possible there is a way to have an extension do all the work of detecting the activations. It would be nice to only have to store the current key's activations plus the results of the residual layers, to minimize the memory footprint.

Maybe there is a way to stream per-weight alphas and betas into sd-meh, and let other code interact with this feature to control merges?
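The "streaming" part could be as simple as the merge loop asking a user-supplied callback for the alpha tensor of each key, so an external tool drives the merge without sd-meh knowing where the alphas come from. The callback signature and function name here are assumptions, just to show the shape of the interface:

```python
from typing import Callable, Dict
import torch

# callback: (state-dict key, weight tensor from model A) -> alpha tensor
AlphaProvider = Callable[[str, torch.Tensor], torch.Tensor]

def merge_with_alpha_stream(theta_a: Dict[str, torch.Tensor],
                            theta_b: Dict[str, torch.Tensor],
                            get_alpha: AlphaProvider) -> Dict[str, torch.Tensor]:
    merged = {}
    for key, wa in theta_a.items():
        wb = theta_b[key]
        alpha = get_alpha(key, wa).clamp(0.0, 1.0)
        merged[key] = (1.0 - alpha) * wa + alpha * wb
    return merged
```

An extension would then implement `get_alpha` on top of whatever activation data it collected.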

ljleb commented 1 year ago

One advantage of having meh do the generation is that you wouldn't need to load the input models more than once. This would take a lot of work to achieve, so it may not be practical. Otherwise, another approach could be to save the activations as a checkpoint using an external extension, and just load them from meh. sd-meh would only have to be able to use these weights as a per-weight alpha or beta.
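The checkpoint route could look like this: the extension dumps one saliency tensor per state-dict key, and the merger loads it back and normalizes it into a usable alpha. The file layout and min-max normalization are assumptions for illustration:

```python
import torch

def save_saliency(saliency: dict, path: str) -> None:
    # one tensor per state-dict key, same layout as a regular checkpoint
    torch.save(saliency, path)

def load_saliency_as_alpha(path: str) -> dict:
    saliency = torch.load(path)
    # rescale each tensor to [0, 1] so it can act directly as a per-weight alpha
    alphas = {}
    for key, t in saliency.items():
        span = t.max() - t.min()
        alphas[key] = (t - t.min()) / span if span > 0 else torch.zeros_like(t)
    return alphas
```

Using safetensors instead of `torch.save` would probably fit the ecosystem better, but the idea is the same.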

ljleb commented 1 year ago

I re-read the git re-basin paper, hoping to grasp a bit more of the content. It seems one approach they suggest is using weight activations to evaluate the matching. So on top of passing the activations as alpha and beta, we could allow passing them to re-basin so that it can determine the relationship between the weights.
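The activation-matching idea from the paper is roughly: correlate the activations of corresponding layers in the two models on the same inputs, then permute model B's units to best line up with model A's. The paper solves an optimal linear assignment; this sketch uses a greedy approximation instead, just to show the shape of it:

```python
import torch

def match_units_by_activation(acts_a: torch.Tensor,
                              acts_b: torch.Tensor) -> torch.Tensor:
    """acts_*: (num_samples, num_units) activations collected on the same
    inputs. Returns perm such that unit i of A matches unit perm[i] of B."""
    # standardize each unit's activations, then correlate A units vs B units
    a = (acts_a - acts_a.mean(0)) / (acts_a.std(0) + 1e-8)
    b = (acts_b - acts_b.mean(0)) / (acts_b.std(0) + 1e-8)
    corr = a.T @ b / acts_a.shape[0]  # (units, units) correlation matrix

    n = corr.shape[0]
    perm = torch.full((n,), -1, dtype=torch.long)
    work = corr.clone()
    for _ in range(n):
        # greedily take the strongest remaining correlation
        flat = torch.argmax(work)
        i, j = flat // n, flat % n
        perm[i] = j
        work[i, :] = float("-inf")
        work[:, j] = float("-inf")
    return perm
```

A real implementation would use an optimal assignment solver (e.g. the Hungarian algorithm) per layer, and then actually apply the permutation to the weights before merging.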