pomonam / kronfluence

Influence Functions with (Eigenvalue-corrected) Kronecker-Factored Approximate Curvature
Apache License 2.0

About the range of influence scores in Figure 8. #36

Closed Hanlard closed 1 month ago

Hanlard commented 1 month ago

First, I would like to express my sincere gratitude for your open-source work. I have been studying and attempting to replicate the experiments from this paper (https://arxiv.org/pdf/2308.03296) for the past 6 months. However, I have noticed some discrepancies between my experimental results and those presented in Figure 8, and I have a few questions.

1) The range of my influence scores is [1e+4, 1e+7], whereas the range in Figure 8 is (0, 1). From some of your experimental results (kronfluence/examples/openwebtext/files/scores_raw), I found that your influence scores are also within the range [1e+4, 1e+7]. However, page 27 of the paper states: 'As shown in Figure 8, influence values larger than 0.1 are rare, and none of the 8 queries visualized have any sequences with an influence larger than 1. Because the information content of the completion is much larger than 1 nat, it appears that the examples we have investigated were learned from the collective contributions of many training examples rather than being attributable to just one or a handful of training examples.'

2) We observed that samples corresponding to negative influence scores are clearly related to the query, yet the negative influence values are not adequately discussed in the paper.

pomonam commented 1 month ago

I appreciate your interest in our work! As noted in the technical documentation (please see the last section), influence scores should be divided by the number of training examples (or total tokens). However, the current codebase does not perform this normalization, which is why you observe such large values.

Regarding your second question, negatively influential samples are indeed often related to the query. In the case of image classification, we find that negatively influential training examples often correspond to similar images with different labels (see last figures in this paper). On page 16, our paper mentions that the analysis focuses only on positively influential sequences. Analyzing negative influence scores would also be interesting, but we did not explore them in the paper. Please let me know if you have any other questions :).
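For readers who want to apply the normalization mentioned above, here is a minimal sketch in plain Python. It assumes the raw scores are already available as a list of floats and that the normalizer is the number of training examples (the total token count could be used in the same way). The function name `normalize_influence_scores` and the numbers are hypothetical and for illustration only; none of this is part of the kronfluence API.

```python
def normalize_influence_scores(raw_scores, num_train_examples):
    """Divide each raw influence score by the training-set size.

    Assumption: `num_train_examples` is the normalizer; the total number
    of training tokens could be substituted if that is the intended unit.
    """
    if num_train_examples <= 0:
        raise ValueError("num_train_examples must be positive")
    return [score / num_train_examples for score in raw_scores]


# Hypothetical raw scores and dataset size, chosen only for illustration.
raw_scores = [2.0e4, -5.0e5, 1.0e7]
num_train_examples = 100_000_000

normalized = normalize_influence_scores(raw_scores, num_train_examples)
for raw, norm in zip(raw_scores, normalized):
    print(f"raw={raw:.3e}  normalized={norm:.3e}")
```

Dividing by a positive constant rescales the scores without changing their sign or ranking, so the most positively and most negatively influential examples stay the same. Whether the normalized values land in the range shown in Figure 8 depends on the actual size of the training set.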

pomonam commented 1 month ago

I will close the issue for now; please let me know if you have further questions!