why all negative influence when increasing the amount of “anchor” data-points?

jiacheng-ye commented 2 years ago

Hello,

Thanks for your wonderful code and paper! In Appendix C.3, I see that you use a small batch of “anchor” data points to calculate s_test. However, I found that when I increase the number of “anchor” data points, many more training examples get negative scores (i.e., helpful influence). How can I reduce this bias? Any advice would be helpful! Thanks.

HanGuo97 commented 2 years ago

Hi,

Thanks for the nice words!

As for your observation: interesting! I don't have a concrete explanation in mind, but here are some thoughts.

- In our early experiments (on a small-scale model), we noticed that using weight decay during model training helps improve the quality of the influence scores. Please see [1] for more details.
- Empirically, people have noticed that the "values" of influence scores are usually not very accurate, though the "rankings" are more meaningful. In that sense, I would pay more attention to the rankings of the scores than to their signs or magnitudes. Please see [2] for more details.

[1] https://openreview.net/forum?id=xHKVVHGDOEk
[2] https://arxiv.org/abs/1905.13289
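
One way to act on the ranking suggestion is to check whether the ordering of training examples stays stable as the number of anchor points changes, even if the raw scores shift toward negative values. Below is a minimal sketch in plain Python, not code from this repository: the two score lists are made-up numbers for illustration, and `average_ranks` and `spearman` are hypothetical helper names.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# Made-up influence scores for the same five training examples
# (assumption: illustrative values only, not measured results).
scores_few_anchors = [0.8, -0.3, 0.1, -1.2, 0.5]
scores_many_anchors = [-0.1, -0.9, -0.6, -2.0, -0.4]

rho = spearman(scores_few_anchors, scores_many_anchors)
frac_neg_few = sum(s < 0 for s in scores_few_anchors) / len(scores_few_anchors)
frac_neg_many = sum(s < 0 for s in scores_many_anchors) / len(scores_many_anchors)

print(f"Spearman rho: {rho:.3f}")
print(f"Fraction negative: {frac_neg_few:.2f} -> {frac_neg_many:.2f}")
```

In this toy case every score becomes negative with more anchors, yet the ordering of the examples is identical, so the rank correlation is 1. If the correlation on real scores is also high, the shift in sign is probably less of a concern than it looks.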

jiacheng-ye commented 2 years ago

That really helps, thanks a lot!
