princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
MIT License

What if we have the optim states of specific tasks at the beginning? #2

Closed suhmily closed 9 months ago

suhmily commented 9 months ago

If we already have the optimizer states of the original large model at the beginning, could we use them directly instead of obtaining them through a LoRA warmup?

xiamengzhou commented 9 months ago

Yes, I think it's definitely worth trying! One thing to keep in mind is that the warmup might also serve to switch the model into instruction-tuning mode. So if the data distribution that produced the optimizer states is quite different from the data you are going to select from, reusing those states might be suboptimal too.
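
To make the idea concrete, here is a minimal sketch (not code from this repository) of how existing Adam moment estimates could be combined with a fresh gradient to get an Adam-style update direction. It assumes the role of the warmup in this discussion is to supply Adam's first and second moment estimates. The function name `adam_update_direction`, the toy values, and the omission of bias correction are all assumptions made for illustration.

```python
import math


def adam_update_direction(grad, exp_avg, exp_avg_sq,
                          beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam-style update direction for one gradient, given existing moments.

    Hypothetical helper: `exp_avg` and `exp_avg_sq` stand in for first and
    second moment estimates that were obtained elsewhere (e.g. from an
    earlier training run). Bias correction is omitted in this sketch.
    """
    direction = []
    for g, m, v in zip(grad, exp_avg, exp_avg_sq):
        m_new = beta1 * m + (1.0 - beta1) * g        # updated first moment
        v_new = beta2 * v + (1.0 - beta2) * g * g    # updated second moment
        direction.append(m_new / (math.sqrt(v_new) + eps))
    return direction


# Toy values, purely illustrative.
grad = [0.5, -0.2, 0.0]
exp_avg = [0.1, 0.1, 0.1]
exp_avg_sq = [0.04, 0.04, 0.04]
direction = adam_update_direction(grad, exp_avg, exp_avg_sq)
```

Note that with `beta1 = 0.9` the stored first moment dominates a single gradient (the second component above stays positive even though its gradient is negative), which is one way to see why states from a very different data distribution could skew the result.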

xiamengzhou commented 9 months ago

Closing the issue now, feel free to reopen it if you have more questions!