locuslab / tofu


Hyperparameter issues in configs #11

Closed by molereddy 6 months ago

molereddy commented 6 months ago

I'm not sure the hyperparameters in the config files match the ones reported in the paper.

  1. For example, https://github.com/locuslab/tofu/blob/main/config/forget.yaml#L9 uses the 1% forget set, but https://github.com/locuslab/tofu/blob/main/config/forget.yaml#L20C23-L20C44 in the same file uses the 90% retain set.
  2. https://github.com/locuslab/tofu/blob/main/config/forget.yaml#L13 uses 10 epochs, while the paper says all experiments are done with 5 epochs. (The fields I mean are sketched just after this list.)
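
For reference, here is roughly what I'm looking at; this is a paraphrase of config/forget.yaml, so the key names and paths are my approximation rather than a verbatim copy:

```yaml
# Approximate view of config/forget.yaml (key names and paths paraphrased)
split: forget01                   # L9: forget set is 1% of the data
retain_data_path: .../retain90    # ~L20: evaluation points at the 90% retain set
num_epochs: 10                    # L13: the paper reports 5 epochs for all experiments
lr: 1e-5                          # paper value; the README says 2e-5 (see below)
```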

I also question whether it makes sense to use the same number of epochs for every kind of loss. These losses have very different natures and stabilities, so a single schedule would, imo, lead to unnecessary overfitting for some methods.

molereddy commented 6 months ago

The README has different hyperparameters as well (lr = 2e-5 instead of 1e-5 as in the paper and elsewhere in the code). Also, the data/ directory has JSON splits only for weight decay 0.01, but 0.0 is also tried. I would also appreciate a clarification of what these splits are for and how they are used.

pratyushmaini commented 6 months ago

Hi @molereddy

  1. Epochs: The experiments reported in the paper were run for 5 epochs, since that gave better performance; 10 epochs generally leads to catastrophic forgetting.
  2. Using different parameter values for different methods: Totally agree with you here. We did do a reasonable hyperparameter search for our work, but the goal was not to get the best models, rather to benchmark all methods over a reasonable search space. We are super excited to see researchers such as yourself improve upon these simple baselines.
  3. "in the same file uses a 90% retain set": All retain-set evaluations are done on the 90% retain set, irrespective of the forget set. This was a conscious choice so that the retain data stays consistent across all three unlearning challenges. To be clear, the 90% retain set is a subset of the 99% retain set.
  4. We evaluated with both wd = 0 and wd = 0.01; we generally saw better results with wd = 0.01.
  5. The finetuning learning rate chosen for Phi is 2e-5. The best-performing forgetting learning rate was 1e-5. (These settings are summarized in the sketch below.)
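
To put those numbers in one place, here is a rough summary of the paper's settings. The key names and grouping are illustrative rather than the actual config keys, and the split identifiers (forget01 / retain90, etc.) are shorthand for the 1% forget and 90% retain splits:

```yaml
# Illustrative summary of the paper's hyperparameters (not verbatim config keys)
finetune:
  lr: 2e-5                # finetuning learning rate for Phi
forget:
  lr: 1e-5                # best-performing forgetting learning rate
  num_epochs: 5           # paper setting; 10 epochs tends toward catastrophic forgetting
  weight_decay: 0.01      # generally gave better results than 0.0
  split: forget01         # or forget05 / forget10, depending on the challenge
eval:
  retain_split: retain90  # retain evaluation always uses the 90% retain set,
                          # which is a subset of the 99% retain set
```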
molereddy commented 6 months ago

Thank you for the clarifications!