Closed: tengerye closed this issue 5 years ago

Hi, when I read the code, I noticed that the parameters are only initialized when a model is created. In other words, `retrain()` actually continues training without re-initialization. Is that alright?
Nice catch! This is not clear from the paper; the only mention that they retrain from the already trained parameters is in the last sentence of the caption of Figure 2.
@expectopatronum @kohpangwei But I don't think that is right. If we want to compute the true difference, we should remove that training example and train from scratch, according to my understanding.
You might be right, and that is also what I thought it did. I am just reporting what I found in the paper.
For convex models, it does not matter (in theory) whether we retrain from a warm start or cold start. For non-convex models, it does matter, and in general we can expect that retraining from a cold start would give quite different results (e.g., due to different random initializations). This variance will generally swamp the effect of removing a single training point. To get around this issue, we retrain from the initial learned parameters \tilde{\theta}, as @expectopatronum pointed out.
When removing groups of examples, we might expect that retraining from a cold start might give similar results, but AFAIK this has not been tested systematically. See https://arxiv.org/abs/1810.03611 and https://arxiv.org/abs/1905.13289 for more details if you're interested!
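To make the warm-start procedure concrete, here is a minimal sketch (plain NumPy logistic regression; this is not code from this repository, and the data, hyperparameters, and function names are made up for illustration): train on the full set to get \tilde{\theta}, then drop one point and continue optimizing from \tilde{\theta} instead of from a fresh initialization.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary-classification data (illustrative only).
n, d = 200, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = (X @ true_w + 0.5 * rng.normal(size=n) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w_init, steps=2000, lr=0.1, l2=1e-3):
    """L2-regularized logistic regression by gradient descent, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y) + l2 * w
        w -= lr * grad
    return w

def loss_at(w, x, y_):
    """Cross-entropy loss of model w on a single test point (x, y_)."""
    p = np.clip(sigmoid(x @ w), 1e-12, 1 - 1e-12)
    return -(y_ * np.log(p) + (1 - y_) * np.log(1 - p))

# 1) Train on the full training set from a cold start to get theta_tilde.
theta_tilde = train(X, y, w_init=np.zeros(d))

# 2) Remove training point i and retrain, warm-starting from theta_tilde.
i = 7
keep = np.arange(n) != i
theta_warm = train(X[keep], y[keep], w_init=theta_tilde)

# 3) For comparison: retrain from a cold start (fresh random initialization).
theta_cold = train(X[keep], y[keep], w_init=rng.normal(size=d))

# Change in loss on a fixed test point caused by removing training point i.
x_test, y_test = X[0], y[0]
print("change in test loss (warm-start retrain):",
      loss_at(theta_warm, x_test, y_test) - loss_at(theta_tilde, x_test, y_test))
print("change in test loss (cold-start retrain):",
      loss_at(theta_cold, x_test, y_test) - loss_at(theta_tilde, x_test, y_test))
```

For this convex toy model the warm-start and cold-start retrains land essentially in the same place; for a non-convex network the cold-start run would typically end up somewhere quite different, which is exactly the variance issue described above.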
@kohpangwei Got it. Thank you so much. @expectopatronum Thank you for your kind reply, too.