Hi Shelling,
Yes, it would still be necessary to calibrate, because the formula for updating the classifier to the test distribution via Bayes' rule assumes calibrated probabilities.
Also, in the situation you mention, the ideal thing to do would probably still be to leverage the remaining 80% of the test data to estimate q(y) in a semi-supervised fashion. I suspect the formula for the semi-supervised update would simply alter the M-step of EM to be:

\hat{q}(y=i) = \frac{l_i + \sum_{x \in \text{unlabeled}} \hat{q}(y=i \mid x)}{L + U}

where l_i is the number of examples that your ground-truth labels say are in class i, L is the total number of ground-truth labeled examples you have, and U is the total number of unlabeled examples you have.
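To make that concrete, here is a minimal sketch in Python of what such a semi-supervised M-step could look like (the function and argument names are my own illustration, not code from this repo):

```python
import numpy as np

def semisupervised_m_step(unlabeled_posteriors, labeled_counts):
    """One M-step of the label-shift EM, also using labeled test examples.

    unlabeled_posteriors: (U, K) array of the current estimates q(y=i|x)
        for the U unlabeled test examples (i.e. the calibrated predictions
        reweighted in the E-step).
    labeled_counts: length-K array where labeled_counts[i] = l_i, the number
        of labeled test examples observed in class i (sums to L).
    Returns the updated class-prior estimate q(y) as a length-K array.
    """
    L = labeled_counts.sum()
    U = unlabeled_posteriors.shape[0]
    # Combine hard counts from the labeled sample with soft counts
    # (expected class memberships) from the unlabeled sample.
    return (labeled_counts + unlabeled_posteriors.sum(axis=0)) / (L + U)
```

With labeled_counts set to all zeros, this reduces to the usual unsupervised M-step (the average of the posteriors over the unlabeled examples).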
Does that answer your question?
Thank you for your reply! The proof of Lemma A indicates that calibration is necessary and sufficient for the convergence of EM, but can we simply assume that the class distribution in the sampled set (e.g. 20% of the test set) represents the distribution of the whole test set? i.e.

\hat{q}(y=i) \approx \frac{l_i}{L}

Would this method cause a large error compared with EM when estimating q(y)? Thanks!
Hi Shelling,
If your sample is large enough, then the estimate of q(y) from sampling+labeling will likely be accurate. However, knowing q(y) does not in itself give you the updated predictor q(y|x). When I said you would still need to calibrate, I meant that in order to obtain the predictor q(y|x), you would need calibrated probabilities. This is because the formula for estimating q(y|x) is (Eq. 8 in the paper):

\hat{q}(y=i \mid x) = \frac{\hat{p}(y=i \mid x)\,\frac{\hat{q}(y=i)}{\hat{p}(y=i)}}{\sum_{j} \hat{p}(y=j \mid x)\,\frac{\hat{q}(y=j)}{\hat{p}(y=j)}}

Even if you have accurate estimates of q(y=i) from sampling, the formula above still assumes that \hat{p}(y=i|x) is calibrated. Does that make sense? If you did not want to calibrate, the other solution that has been proposed in the literature is to retrain the model using the ratio \hat{q}(y=i)/\hat{p}(y=i) for importance weighting; however, Byrd & Lipton (https://arxiv.org/abs/1812.03372) observed that even importance weighting does not work that well, so I feel that it is still a good idea to calibrate \hat{p}(y=i|x) and then apply Bayes' rule as shown above.
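To spell out where calibration enters, here is a minimal sketch of applying that Bayes'-rule update to calibrated source-domain probabilities (the names are mine and this is just an illustration, assuming the priors have already been estimated):

```python
import numpy as np

def prior_shift_correction(calibrated_probs, source_priors, target_priors):
    r"""Adapt calibrated predictions to new class priors via Bayes' rule.

    calibrated_probs: (N, K) array of calibrated \hat{p}(y=i|x).
    source_priors:    length-K array of \hat{p}(y=i) on the training set.
    target_priors:    length-K array of \hat{q}(y=i) on the test set
                      (e.g. counted from a labeled sample, or from EM).
    Returns an (N, K) array of adapted probabilities \hat{q}(y=i|x).
    """
    # Reweight each class probability by the prior ratio q(y=i)/p(y=i)...
    weighted = calibrated_probs * (target_priors / source_priors)
    # ...then renormalize so each row sums to one.
    return weighted / weighted.sum(axis=1, keepdims=True)
```

If \hat{p}(y=i|x) is systematically over- or under-confident, that bias is carried straight through the reweighting, which is why calibrating first matters even when target_priors is accurate.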
As an aside, Lemma A is actually just showing that, if your predictor has a systematic bias in its calibration, then a particular way of calculating the source-domain priors \hat{p}(y=i) is needed if you want EM to converge to the source-domain priors in the absence of label shift. It is really an observation on how to cope with poor calibration.
Let me know if you still have questions.
Regards, Avanti
Thanks for all the details! I'm a little confused about why Eq. 8 above requires the assumption that \hat{p}(y=i|x) is calibrated. There may be something I have missed...
Hi Shelling,
Eqn. 8 is derived from Bayes' rule; the derivation is on a slide from my talk.
Bayes' rule is only relevant when we are talking about true probability distributions. Since we don't have infinite data, we don't know the true probability distributions p(y|x), p(y) and q(y); thus, we instead use our estimates from the data, which are written as \hat{p}(y|x), \hat{p}(y) and \hat{q}(y). The inherent assumption is that \hat{p}(y|x), \hat{p}(y) and \hat{q}(y) are reasonably good replacements for the true quantities p(y|x), p(y) and q(y), which is another way of saying that they need to be calibrated. Is it more clear now?
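In case it helps, here is a sketch of that derivation written out (my reconstruction of the argument, using the label shift assumption that q(x|y) = p(x|y), rather than the slide itself). By Bayes' rule in the test domain, q(y=i|x) = q(x|y=i) q(y=i) / q(x). Under the label shift assumption, q(x|y=i) = p(x|y=i), and by Bayes' rule in the training domain, p(x|y=i) = p(y=i|x) p(x) / p(y=i). Substituting,

q(y=i \mid x) = p(y=i \mid x)\,\frac{q(y=i)}{p(y=i)} \cdot \frac{p(x)}{q(x)}

and because p(x)/q(x) is the same for every class, it cancels when normalizing over classes:

q(y=i \mid x) = \frac{p(y=i \mid x)\,\frac{q(y=i)}{p(y=i)}}{\sum_j p(y=j \mid x)\,\frac{q(y=j)}{p(y=j)}}

Eq. 8 is this identity with the estimates \hat{p}(y|x), \hat{p}(y) and \hat{q}(y) plugged in for the true quantities, which is exactly where the calibration requirement comes from.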
Hi, thanks for recommending the video and paper! It seems that q(y=i)/p(y=i) is a ratio that adapts the prediction on the training distribution (i.e. p(y=i|x)) to the real distribution of the test set (i.e. q(y=i|x)). Therefore, the prediction must be calibrated; otherwise, it will introduce error into the estimate of the real distribution (assuming that q(y=i) and p(y=i) are well estimated). Is this intuition correct? Thanks!
Yes, that is correct!
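To illustrate with made-up numbers: suppose for some x the true calibrated probability is p(y=1|x) = 0.5 and the prior ratios are q(y=1)/p(y=1) = 2 and q(y=0)/p(y=0) = 0.5. The correct adapted probability is (0.5·2)/(0.5·2 + 0.5·0.5) = 0.8. If an overconfident model instead reports \hat{p}(y=1|x) = 0.9, the same formula gives (0.9·2)/(0.9·2 + 0.1·0.5) ≈ 0.97, so the miscalibration propagates directly into \hat{q}(y|x) even though the priors are exact.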
Hi, in your paper, an EM procedure is leveraged to estimate the unknown distribution q(y) of the test set. However, if we can sample a portion (e.g. 20%) of the test set, label it, and count each class, can we simply get q(y) that way? If so, would it still be necessary to calibrate the classifier? Thanks!