Hi! Thank you very much for making your implementation publicly available. I want to use ROME on LMs and datasets other than those you tried in the paper, and I was wondering which hyperparameters are model- or data-dependent and whether you have an intuition/strategy for finding good values for them. Thanks!
Hi, this is a great question! Looking at the GPT-J hparams, `clamp_norm_factor` is perhaps the most important. It is a hard constraint that determines how large $v_*$'s norm can grow relative to the norm of the original hidden representation. If it's too high, bleedover will be high (the update becomes unnecessarily large); if it's too low, the update will be too constrained to take effect.
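To make the constraint concrete, here is a minimal sketch of that clamp, assuming it simply rescales $v_*$ whenever its norm exceeds `clamp_norm_factor` times the original hidden state's norm (illustrative, not the repo's exact code):

```python
import torch

def clamp_v_star(v_star: torch.Tensor,
                 original_hidden: torch.Tensor,
                 clamp_norm_factor: float) -> torch.Tensor:
    # Hard constraint: ||v*|| <= clamp_norm_factor * ||original_hidden||.
    max_norm = clamp_norm_factor * original_hidden.norm()
    if v_star.norm() > max_norm:
        v_star = v_star * (max_norm / v_star.norm())
    return v_star

# With a clamp factor of 4, the optimized v* may grow to at most 4x
# the norm of the original hidden representation.
v_star = clamp_v_star(torch.randn(1600), torch.randn(1600), clamp_norm_factor=4.0)
```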
Other soft constraints, like weight decay and the KL-divergence penalty, should also be tuned. A good rule of thumb is to start with non-constraining values (e.g., no weight decay, no KL loss, a high clamp factor) and make sure the maximum-degrees-of-freedom update works at all. Then gradually tighten the constraints to eliminate bleedover effects; a sketch of this loop follows below.
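Concretely, that tuning loop can be as simple as editing the model's hparams JSON between runs. A sketch, assuming GPT-J's file lives at `hparams/ROME/EleutherAI_gpt-j-6B.json` and uses the field names from the shipped hparams files (verify both against your checkout):

```python
import json
from pathlib import Path

# Assumed path and field names; check them against your copy of the repo.
path = Path("hparams/ROME/EleutherAI_gpt-j-6B.json")
hp = json.loads(path.read_text())

# Step 1: non-constraining values, so the maximum-DOF update can work.
hp["clamp_norm_factor"] = 10.0  # loose hard constraint on the update norm
hp["v_weight_decay"] = 0.0      # disable weight decay
hp["kl_factor"] = 0.0           # disable the KL penalty

path.write_text(json.dumps(hp, indent=4))

# Step 2 (once the edit reliably lands): tighten gradually, e.g.
#   hp["clamp_norm_factor"] = 4.0
#   hp["v_weight_decay"]    = 0.5
#   hp["kl_factor"]         = 0.0625
# until bleedover disappears without breaking the edit.
# (These step-2 numbers are examples, not recommended GPT-J settings.)
```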
The ROME notebook (`notebooks/rome.ipynb`) is an excellent place to experiment with these values. The hparams files are hot-reloaded on every run of the execution cell, so iteration is fast.
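For reference, that execution cell looks roughly like the sketch below. The request format and the `demo_model_editing` helper are from the repo's demo utilities, but treat the exact import path and signature as assumptions to verify against your checkout:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from experiments.py.demo import demo_model_editing  # repo helper used by the notebook

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").cuda()
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

request = [{
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}]
generation_prompts = ["LeBron James plays for the"]

# Because the hparams JSON is re-read on every run, you can tweak the
# constraint values and simply re-run this cell to see their effect.
model_new, orig_weights = demo_model_editing(
    model, tok, request, generation_prompts, alg_name="ROME"
)
```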
If you have any model-specific questions, I'd be happy to take a look when I get a moment. Let me know!
Thank you very much.