Open · lucasgblu opened this issue 1 month ago
Thank you for your attention to our work! We implement zero-init by setting the weight and bias of the final to_out linear layer of the attention module to 0, so most of the original knowledge is retained. This technique is just a trick for smoother training and will not have a big impact on the performance of the model itself.
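For anyone else wondering, a minimal PyTorch sketch of that idea is below. The `ToyAttention` class and the `to_out` attribute name are illustrative assumptions, not the actual code from this repo (in diffusers-style attention blocks `to_out` is an `nn.Sequential` whose first element is the Linear, which the helper also handles).

```python
import torch.nn as nn

def zero_init_to_out(attn_module):
    """Zero-initialize the final output projection of an attention module.

    Assumes the module exposes its output projection as `to_out`, either a
    plain nn.Linear or an nn.Sequential whose first element is the Linear
    (diffusers-style). Adjust the attribute name for your own codebase.
    """
    to_out = attn_module.to_out
    linear = to_out[0] if isinstance(to_out, nn.Sequential) else to_out
    nn.init.zeros_(linear.weight)
    if linear.bias is not None:
        nn.init.zeros_(linear.bias)

# Hypothetical toy attention block, just to show the helper in use.
class ToyAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.to_out = nn.Linear(dim, dim)

attn = ToyAttention(64)
zero_init_to_out(attn)
# The attention branch now outputs zeros at the start of training, so the
# pretrained behaviour is untouched. to_out still learns, because its weight
# gradient depends on the layer's inputs, not on its (zero) weight values.
print(attn.to_out.weight.abs().sum().item(), attn.to_out.bias.abs().sum().item())  # 0.0 0.0
```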
Thanks for your reply! I thought at first you were doing zero gating like LLaMA-Adapter does.
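For contrast, LLaMA-Adapter-style zero gating would look roughly like the sketch below: a learnable scalar gate initialized to zero scales the new attention branch, instead of zeroing the output projection weights. The class and parameter names here are hypothetical, not code from either project.

```python
import torch
import torch.nn as nn

class ZeroGatedCrossAttention(nn.Module):
    """Zero gating in the spirit of LLaMA-Adapter: a learnable scalar gate,
    initialized to 0, scales the new branch; the projection weights themselves
    are initialized normally. Hypothetical illustration only."""
    def __init__(self, dim):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.to_out = nn.Linear(dim, dim)          # normally initialized
        self.gate = nn.Parameter(torch.zeros(1))   # the zero-init gate

    def forward(self, x, context):
        q, k, v = self.to_q(x), self.to_k(context), self.to_v(context)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5, dim=-1)
        out = self.to_out(attn @ v)
        # At initialization the gate is 0, so the branch adds nothing and the
        # pretrained behaviour is preserved; the gate itself still receives
        # gradients, and once it moves away from zero the branch learns too.
        return x + torch.tanh(self.gate) * out
```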
By the way, the arXiv link for your paper on the homepage mistakenly points to another of your papers @hrz2000
Thanks for your reminder~~ I will correct it now hahaha
Hi! Congrats on this wonderful work. After reading your paper, I'm really curious about one technique that you use.
In the paper, you said:
How do you do the zero-initialization? Is it a zero conv like ControlNet, do you just force the output to be zeros, or do you set the weights of Q, K, or V to zero so that the outcome is zero? Once you do this, will the model still learn to absorb the knowledge from the condition, or does it stay unlearned because zeros provide only minor gradients?
Finally, how good is this technique? Does it greatly or visibly improve the quality?
Congrats again