Hi! Thank you very much for making your implementation publicly available. I want to use ROME on LMs and datasets other than those you tried in the paper, and I was wondering which hyperparameters are model- or data-dependent and whether you have an intuition/strategy for finding good values for them. Thanks!
Hi, this is a great question! Looking at the GPT-J hparams, `clamp_norm_factor` is perhaps the most important. It is a hard constraint that determines how large $v_*$'s norm can grow relative to the norm of the original hidden representation. If it's too high, bleedover will be high (the update becomes unnecessarily large); if it's too low, the update will be too constrained to take effect.
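To make the constraint concrete, here is a minimal sketch of that clamp, assuming it simply rescales $v_*$ whenever its norm exceeds `clamp_norm_factor` times the original hidden state's norm (illustrative, not the repo's exact code):

```python
import torch

def clamp_v_star(v_star: torch.Tensor,
                 original_hidden: torch.Tensor,
                 clamp_norm_factor: float) -> torch.Tensor:
    # Hard constraint: ||v*|| <= clamp_norm_factor * ||original_hidden||.
    max_norm = clamp_norm_factor * original_hidden.norm()
    if v_star.norm() > max_norm:
        v_star = v_star * (max_norm / v_star.norm())
    return v_star

# With a clamp factor of 4, the optimized v* may grow to at most 4x
# the norm of the original hidden representation.
v_star = clamp_v_star(torch.randn(1600), torch.randn(1600), clamp_norm_factor=4.0)
```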
Other soft constraints, like weight decay and the KL-divergence penalty, should also be tuned. A good rule of thumb is to start with non-constraining values (e.g., no weight decay, no KL loss, a high clamp factor) and make sure the maximum-degrees-of-freedom update works at all. Then gradually tighten the constraints to eliminate bleedover effects; a sketch of this loop follows below.
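Concretely, that tuning loop can be as simple as editing the model's hparams JSON between runs. A sketch, assuming GPT-J's file lives at `hparams/ROME/EleutherAI_gpt-j-6B.json` and uses the field names from the shipped hparams files (verify both against your checkout):

```python
import json
from pathlib import Path

# Assumed path and field names; check them against your copy of the repo.
path = Path("hparams/ROME/EleutherAI_gpt-j-6B.json")
hp = json.loads(path.read_text())

# Step 1: non-constraining values, so the maximum-DOF update can work.
hp["clamp_norm_factor"] = 10.0  # loose hard constraint on the update norm
hp["v_weight_decay"] = 0.0      # disable weight decay
hp["kl_factor"] = 0.0           # disable the KL penalty

path.write_text(json.dumps(hp, indent=4))

# Step 2 (once the edit reliably lands): tighten gradually, e.g.
#   hp["clamp_norm_factor"] = 4.0
#   hp["v_weight_decay"]    = 0.5
#   hp["kl_factor"]         = 0.0625
# until bleedover disappears without breaking the edit.
# (These step-2 numbers are examples, not recommended GPT-J settings.)
```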
The ROME notebook (`notebooks/rome.ipynb`) is an excellent place to experiment with these values. The hparams files are hot-reloaded on every run of the execution cell, so iteration is fast.
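For reference, that execution cell looks roughly like the sketch below. The request format and the `demo_model_editing` helper are from the repo's demo utilities, but treat the exact import path and signature as assumptions to verify against your checkout:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from experiments.py.demo import demo_model_editing  # repo helper used by the notebook

model = AutoModelForCausalLM.from_pretrained("EleutherAI/gpt-j-6B").cuda()
tok = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

request = [{
    "prompt": "{} plays the sport of",
    "subject": "LeBron James",
    "target_new": {"str": "football"},
}]
generation_prompts = ["LeBron James plays for the"]

# Because the hparams JSON is re-read on every run, you can tweak the
# constraint values and simply re-run this cell to see their effect.
model_new, orig_weights = demo_model_editing(
    model, tok, request, generation_prompts, alg_name="ROME"
)
```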
If you have any model-specific questions, I'd be happy to take a look when I get a moment. Let me know!
Thank you very much.