Closed: molereddy closed this issue 6 months ago
The README also lists different hyperparameters (lr=2e-5 instead of 1e-5, as in the paper and elsewhere in the code). Also, the data/ directory has JSON splits only for wd=0.01, but wd=0.0 is also tried. I would additionally appreciate a clarification about what these splits are for and how they are used.
Hi @molereddy
> in the same file uses a 90% retain set

All the retain set evaluations are done on the 90% retain set, irrespective of the forget set. This was a conscious choice so that we can keep the retain data consistent across all 3 unlearning challenges. To be clear, the 90% retain set is a subset of the 99% retain set.

Thank you for the clarifications!
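To make the split relationship concrete, here is a minimal sketch (illustrative only, not the repo's actual loading code; the variable names are assumptions): the forget sets are nested prefixes of the data, the retain sets are their complements, and the smallest retain set is therefore contained in the larger ones.

```python
# Illustrative sketch of the forget/retain split relationship.
# Stand-in for the full fine-tuning dataset (indices only).
data = list(range(100))

# Nested forget splits of 1%, 5%, and 10%; each retain set is the complement.
forget01, forget05, forget10 = data[:1], data[:5], data[:10]
retain99, retain95, retain90 = data[1:], data[5:], data[10:]

# The 90% retain set is a subset of every larger retain set, so evaluating
# on it gives a consistent retain metric across all three challenges.
assert set(retain90) <= set(retain95) <= set(retain99)
print(len(retain90), len(retain95), len(retain99))  # 90 95 99
```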
I'm not sure the hyperparameters in the config files match the reported ones in the paper.
I also question whether it makes sense to use the same number of epochs for every kind of loss. These losses have very different natures and stabilities, so a shared epoch budget would, imo, lead to unnecessary overfitting for some methods.
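One way to address this without restructuring the configs would be a per-loss epoch budget with a shared fallback. A minimal sketch, assuming hypothetical method names and epoch values (none of these numbers come from the paper or the repo):

```python
# Hypothetical per-method epoch budgets (names and values are assumptions,
# not taken from the repo) instead of one shared epoch count for all losses.
EPOCHS_PER_METHOD = {
    "grad_ascent": 2,  # unstable loss: diverges fast, so stop early
    "grad_diff": 5,
    "KL": 5,
    "idk": 5,          # more stable, fine-tuning-style losses
}

def num_epochs(method: str, default: int = 5) -> int:
    """Return the epoch budget for a method, falling back to a shared default."""
    return EPOCHS_PER_METHOD.get(method, default)

print(num_epochs("grad_ascent"))  # 2
print(num_epochs("some_new_loss"))  # 5 (fallback)
```

The fallback keeps existing behavior for untuned methods while letting unstable losses train for fewer epochs.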