palm-biaoliu / mime

Which Pseudo Label function should be used for MIME? #1

Open JoakimHaurum opened 6 months ago

JoakimHaurum commented 6 months ago

Hi, I am currently trying to better understand the codebase and paper, and I am a little confused about which of the different pseudo-labeling functions corresponds to Algorithm 1 of the paper:

This version seems to work by thresholding the predictions: https://github.com/palm-biaoliu/mime/blob/main/losses.py#L185

These seem to simply assign pseudo labels to the N_class highest predictions per class: https://github.com/palm-biaoliu/mime/blob/main/losses.py#L280 https://github.com/palm-biaoliu/mime/blob/main/train_mime.py#L112

But they also assume knowledge of the ground truth train labels.

None of them seems to exactly match Algorithm 1, where assignment is based on the difference in loss when assigning a pseudo label (see line 12).

Could you please provide pointers toward which is the correct function and how it relates to Algorithm 1 in the paper?

palm-biaoliu commented 2 months ago


Apologies for the delayed reply, and thanks for your patience! The method in the conference version is indeed based on the difference in loss when assigning a pseudo label; after deriving the formulas, this turns out to be equivalent to thresholding the predictions. As for assigning pseudo labels to the N_class highest predictions per class: this is an extended pseudo-labeling method we tested afterward. It estimates class priors from the validation set and pseudo-labels the N_class highest predictions per class according to those priors. It does not use knowledge of the ground truth train labels.
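To make the equivalence concrete, here is a minimal sketch (not the repo's actual code) assuming a binary cross-entropy loss: for BCE, the loss if we assign a positive pseudo label is -log p and the loss for a negative one is -log(1 - p), so "assign the label whose loss is lower" collapses to a fixed threshold p > 0.5 on the prediction.

```python
import numpy as np

def loss_diff_assign(probs, eps=1e-12):
    """Assign a positive pseudo label when doing so gives the lower BCE loss.

    ell(p, 1) = -log p and ell(p, 0) = -log(1 - p), so
    ell(p, 1) < ell(p, 0)  <=>  p > 0.5.
    """
    loss_pos = -np.log(probs + eps)        # loss if pseudo-labeled positive
    loss_neg = -np.log(1.0 - probs + eps)  # loss if pseudo-labeled negative
    return (loss_pos < loss_neg).astype(int)

def threshold_assign(probs, threshold=0.5):
    """The equivalent thresholding form of the same rule."""
    return (probs > threshold).astype(int)

probs = np.array([0.1, 0.4, 0.6, 0.9])
assert np.array_equal(loss_diff_assign(probs), threshold_assign(probs))
```

The two rules agree on every input, which is the sense in which the loss-difference criterion in Algorithm 1 and the thresholding code are the same method (the actual threshold in the repo may differ depending on the loss used).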

For clarity, we’ve updated the pseudo-labeling method in the main branch to match the original method described in the paper. The extended pseudo-labeling method, which assigns labels based on the N_class highest predictions per class, has been moved to a separate branch.
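For readers comparing the branches, a rough sketch of the extended idea (my own illustration, not the repo's implementation): estimate each class's prior on a validation set, then, per class, take the prior times the batch size as N_class and pseudo-label the samples with the N_class highest scores for that class.

```python
import numpy as np

def topk_per_class_pseudo_labels(probs, priors):
    """Illustrative top-N_class-per-class pseudo labeling.

    probs:  (n_samples, n_classes) predicted scores.
    priors: (n_classes,) class priors estimated on a validation set.
    For each class c, the round(priors[c] * n_samples) samples with the
    highest scores are pseudo-labeled positive for that class.
    """
    n, c = probs.shape
    labels = np.zeros((n, c), dtype=int)
    for j in range(c):
        k = int(round(priors[j] * n))
        if k > 0:
            top = np.argsort(-probs[:, j])[:k]  # indices of the k highest scores
            labels[top, j] = 1
    return labels

probs = np.array([[0.9, 0.1],
                  [0.8, 0.7],
                  [0.2, 0.6],
                  [0.4, 0.3]])
priors = np.array([0.5, 0.25])  # hypothetical validation-set estimates
labels = topk_per_class_pseudo_labels(probs, priors)
```

Note that only the priors come from the validation set; no ground-truth training labels are consulted, consistent with the reply above.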