princeton-nlp / LESS

[ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
MIT License
306 stars, 25 forks

Questions about the influence value #21

Open: simplelifetime opened this issue 2 months ago

simplelifetime commented 2 months ago

I'm wondering what an appropriate influence value is in the LESS setting. I'm reproducing the method, and for some samples the maximum influence value across all subtasks is about 0.1-0.4. Is this range correct, or should the similarity be higher or lower than this (0.1-0.4 for the most similar subtask)?

xiamengzhou commented 2 months ago

Yes, this value is largely within the range of the max gradient similarity for a subtask!
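To see why values of this size are plausible, here is a minimal sketch of a "max gradient similarity over subtasks" score. It is not the LESS repository's code. It assumes that each training example and each validation example has a gradient feature vector per checkpoint, that similarity is cosine similarity, that per-example similarities are averaged within each subtask, and that per-checkpoint weights (for example, learning rates) are renormalized to sum to 1. The function names and the data layout are hypothetical. Averaging the validation gradients first and then taking a single cosine is another reasonable variant.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def normalize(v: Vector) -> List[float]:
    """Scale a (non-zero) gradient feature vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity: dot product of the two unit-normalized vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def influence_score(
    train_grads: List[Vector],
    subtask_val_grads: List[List[List[Vector]]],
    ckpt_weights: List[float],
) -> float:
    """Hypothetical max-over-subtasks influence score.

    train_grads[c]             -- feature of one training example at checkpoint c
    subtask_val_grads[s][c]    -- non-empty list of validation features for
                                  subtask s at checkpoint c
    ckpt_weights[c]            -- assumed per-checkpoint weight (e.g. learning
                                  rate); renormalized here to sum to 1
    """
    total = sum(ckpt_weights)
    weights = [w / total for w in ckpt_weights]

    per_subtask = []
    for val_by_ckpt in subtask_val_grads:
        score = 0.0
        for c, w in enumerate(weights):
            # Mean cosine similarity to this subtask's validation examples.
            sims = [cosine(train_grads[c], v) for v in val_by_ckpt[c]]
            score += w * sum(sims) / len(sims)
        per_subtask.append(score)

    # The reported influence is the similarity to the best-matching subtask.
    return max(per_subtask)


if __name__ == "__main__":
    train = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two checkpoints
    subtasks = [
        [[[1.0, 0.0, 0.0]], [[0.0, 0.0, 1.0]]],  # subtask A
        [[[0.0, 1.0, 0.0]], [[0.0, 1.0, 0.0]]],  # subtask B
    ]
    print(influence_score(train, subtasks, [3.0, 1.0]))  # 0.75
```

Under these assumptions every term is a cosine similarity in [-1, 1] and the checkpoint weights form a convex combination, so the final score is also bounded in [-1, 1]. A maximum of 0.1-0.4 therefore reads as moderate positive alignment with the closest subtask rather than a sign that something is broken.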