nimarb / pytorch_influence_functions

This is a PyTorch reimplementation of Influence Functions from the ICML2017 best paper: Understanding Black-box Predictions via Influence Functions by Pang Wei Koh and Percy Liang.

Passing config['test_start_index'] to calc_grad_z #9

Open expectopatronum opened 4 years ago

expectopatronum commented 4 years ago

Hi, I am not sure whether I am misunderstanding the parameter or whether it should not be passed to calc_grad_z: https://github.com/nimarb/pytorch_influence_functions/blob/4df5d2ec1baae38d70345740b7eca7466e3b48ef/pytorch_influence_functions/calc_influence_function.py#L111. I assumed it should loop over the whole training set.

Thanks and best regards Verena

andrewsilva9 commented 4 years ago

I believe it does loop over the entire training set: the train_loader is a PyTorch DataLoader that carries all of that information with it. We can see here: https://github.com/nimarb/pytorch_influence_functions/blob/4df5d2ec1baae38d70345740b7eca7466e3b48ef/pytorch_influence_functions/calc_influence_function.py#L133 that it iterates over the dataset that is attached to the DataLoader, which is the entire training set.
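To illustrate the point above without a torch dependency, here is a toy stand-in for a DataLoader (the `ToyLoader` class is hypothetical, not part of the library): iterating over it yields every sample in the dataset it wraps, so a plain `for` loop covers the full training set.

```python
# ToyLoader is a hypothetical minimal stand-in for a PyTorch DataLoader:
# it wraps a dataset and yields every sample when iterated.
class ToyLoader:
    def __init__(self, dataset):
        self.dataset = dataset

    def __iter__(self):
        # Iterating the loader walks the entire underlying dataset.
        return iter(self.dataset)

    def __len__(self):
        return len(self.dataset)


train_loader = ToyLoader(list(range(5)))
seen = [sample for sample in train_loader]  # visits all 5 samples
```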

nimarb commented 4 years ago

Hi, @andrewsilva9 is correct: in line 133 it loops over the entire training dataset, so that one grad_z is calculated per training sample.

With the start argument you can begin at a different point in the training dataset. This is useful if you split the calculation across multiple machines: you calculate samples [0-100] on machine 1, and on machine 2 you pass start=101 to calculate from training sample 101 onwards. An end index x is still missing from the implementation...
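The multi-machine split described above can be sketched as follows. Note this is an illustration, not the library's actual API: `calc_grad_z_range` is a hypothetical helper, and the `end` parameter is exactly the piece the comment says is still missing from the implementation.

```python
# Hypothetical sketch of splitting grad_z computation across machines.
# calc_grad_z_range and its end parameter are assumptions for illustration;
# the real calc_grad_z only accepts a start index.
def calc_grad_z_range(train_samples, start=0, end=None):
    # end defaults to the dataset size, i.e. "run to the last sample".
    end = len(train_samples) if end is None else end
    results = []
    for i in range(start, end):
        # Placeholder for the real per-sample gradient computation.
        results.append(f"grad_z for sample {i}")
    return results


samples = list(range(250))
# Machine 1 handles samples [0, 101); machine 2 picks up at start=101.
machine1 = calc_grad_z_range(samples, start=0, end=101)
machine2 = calc_grad_z_range(samples, start=101)
```

Together the two machines cover the whole training set with no overlap, which is the point of the start (and missing end) argument.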