
princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning

MIT License


Are samples used for warmup training and gradient calculation the same? #5

Closed: ZigeW closed this issue 2 months ago.

ZigeW commented 5 months ago

Hi,

I'm trying to run experiments following the instructions in the README.

I noticed that in Step 1 (warmup training), 5% of the samples are randomly selected to train $M_S$. However, in Step 2 (building the gradient datastore), the samples used to calculate gradients appear to be fixed to the first 200 samples of each dataset.

This leaves me unsure whether the samples used for warmup training and for gradient calculation are supposed to be the same. Could you kindly explain?

xiamengzhou commented 5 months ago

Hi, sorry for the late reply!

In the first step, we use 5% of the full dataset for warmup training to obtain the Adam optimizer states. When calculating the gradients in the second step, you should use the full dataset, including the data that was used for warmup training. Let me know if you have more questions!
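To make the relationship between the two steps concrete, here is a minimal sketch of the index bookkeeping described in the reply, assuming a dataset whose samples are indexed `0..n-1`. The helpers `warmup_indices` and `gradient_indices` are hypothetical illustrations, not functions from the LESS repository: Step 1 draws a random 5% subset, and Step 2 covers every sample, so the warmup samples are a subset of the gradient samples rather than a separate set.

```python
import random


def warmup_indices(num_samples: int, fraction: float = 0.05, seed: int = 0) -> list[int]:
    """Hypothetical helper: randomly pick `fraction` of the sample indices for warmup (Step 1)."""
    rng = random.Random(seed)  # fixed seed so the warmup subset is reproducible
    k = max(1, int(num_samples * fraction))
    return sorted(rng.sample(range(num_samples), k))


def gradient_indices(num_samples: int) -> list[int]:
    """Hypothetical helper: Step 2 computes gradients for every sample, warmup samples included."""
    return list(range(num_samples))


n = 1000
warm = warmup_indices(n)
grad = gradient_indices(n)
print(len(warm), len(grad))      # 50 1000
print(set(warm) <= set(grad))    # True: the warmup data is reused in Step 2
```

The point of the sketch is only the set relationship: the warmup run supplies optimizer states, and the gradient pass is then applied to the whole training pool rather than to the warmup subset alone.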