A question about reproducing the NCE and LEEP results

zhangyikaii commented 2 years ago

Hi, thanks again for your excellent work and clean, friendly code. I ran into some difficulties reproducing the NCE/LEEP results and am confused again, sorry.

I suspect the NCE and LEEP scores in the paper may have been calculated incorrectly. I could only obtain the paper's results by swapping the roles of the two arguments, i.e., passing the ground-truth targets as source_label/pseudo_source_label and the predicted labels as target_label, as in the code below:

score = NCE(source_label=targets.numpy(), target_label=torch.argmax(predictions, dim=1).numpy())
score = LEEP(pseudo_source_label=np.squeeze(np.eye((torch.max(targets) + 1).item())[targets.numpy().reshape(-1)]), target_label=torch.argmax(predictions, dim=1).numpy())
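In the swapped LEEP call, the `np.eye(...)[...]` expression turns the integer ground-truth targets into one-hot rows, so LEEP receives a degenerate one-hot "source distribution" built from the targets instead of the model's softmax outputs. A minimal NumPy sketch of that conversion; the helper name `to_one_hot` is hypothetical and only mirrors the expression above:

```python
import numpy as np


def to_one_hot(labels: np.ndarray) -> np.ndarray:
    """Mirror of np.squeeze(np.eye(C)[labels.reshape(-1)]) with C = max label + 1."""
    num_classes = int(labels.max()) + 1
    # Row k of the identity matrix is the one-hot vector for class k,
    # so fancy-indexing eye() with the labels yields one row per sample.
    return np.squeeze(np.eye(num_classes)[labels.reshape(-1)])


targets = np.array([2, 0, 1, 2])
one_hot = to_one_hot(targets)
print(one_hot.shape)           # (4, 3)
print(one_hot.argmax(axis=1))  # [2 0 1 2]
```

Note that `np.squeeze` would also drop an axis if there were only one sample or only one class, so the result is not guaranteed to stay 2-D.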

where predictions is computed from the model outputs:

features, outputs, targets = forward_pass(self.data_loader_train, model, fc_layer)
predictions = F.softmax(outputs, dim=1)

This incorrect code reproduces the NCE and LEEP results in the paper, namely Section C (Original Results in Figures), Table 5 (Original results in Figure 4).

Based on the NCE and LEEP code, the correct way to pass source_label/pseudo_source_label and target_label should be:

score = NCE(source_label=torch.argmax(predictions, dim=1).numpy(), target_label=targets.numpy())
score = LEEP(pseudo_source_label=predictions.numpy(), target_label=targets.numpy())
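For reference, here is a minimal NumPy sketch of both scores with the argument roles above, written from the published definitions (NCE from Tran et al., 2019; LEEP from Nguyen et al., 2020). It is not the code in this repo; the lowercase names `nce` and `leep` are hypothetical. NCE takes hard source labels (the argmax of predictions), while LEEP takes the full softmax matrix.

```python
import numpy as np


def nce(source_label: np.ndarray, target_label: np.ndarray) -> float:
    """Negative conditional entropy -H(Y | Z).

    source_label: (n,) hard labels Z predicted by the source classifier.
    target_label: (n,) ground-truth target labels Y.
    """
    n = len(source_label)
    # Empirical joint distribution P(z, y) over (source, target) label pairs.
    joint = np.zeros((source_label.max() + 1, target_label.max() + 1))
    np.add.at(joint, (source_label, target_label), 1.0 / n)
    p_z = joint.sum(axis=1, keepdims=True)  # marginal P(z)
    cond = joint / np.where(p_z > 0, p_z, 1.0)  # P(y | z), guarding empty rows
    mask = joint > 0
    # -H(Y|Z) = sum_{z,y} P(z,y) log P(y|z), skipping empty cells.
    return float(np.sum(joint[mask] * np.log(cond[mask])))


def leep(pseudo_source_label: np.ndarray, target_label: np.ndarray) -> float:
    """Log expected empirical prediction.

    pseudo_source_label: (n, C_s) softmax outputs theta(x_i) of the source head.
    target_label: (n,) ground-truth target labels.
    """
    n, c_s = pseudo_source_label.shape
    c_t = target_label.max() + 1
    # Empirical joint P(y, z) = (1/n) * sum_i theta_i(z) * [y_i == y].
    joint = np.zeros((c_t, c_s))
    np.add.at(joint, target_label, pseudo_source_label / n)
    p_z = joint.sum(axis=0, keepdims=True)
    cond = joint / np.where(p_z > 0, p_z, 1.0)  # P(y | z)
    # EEP for the true label: sum_z P(y_i | z) * theta_i(z).
    eep = np.sum(cond[target_label] * pseudo_source_label, axis=1)
    return float(np.mean(np.log(eep)))


# Hypothetical inputs standing in for predictions / targets above.
rng = np.random.default_rng(0)
logits = rng.normal(size=(200, 10))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)  # softmax over source classes
targets = rng.integers(0, 5, size=200)

print(nce(source_label=probs.argmax(axis=1), target_label=targets))
print(leep(pseudo_source_label=probs, target_label=targets))
```

Under these definitions the two arguments are not interchangeable: NCE is not symmetric in its inputs, and feeding LEEP one-hot pseudo-source labels (as in the swapped call) makes it coincide with NCE computed from the same hard labels.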

However, this does not reproduce the results in the paper, and it changes the final weighted tau values.
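The tau value compares the ranking of candidate models by transferability score with their ranking by fine-tuned accuracy, so changing how the scores are computed can reorder models and shift the correlation. As a simplified illustration, the sketch below computes plain (unweighted) Kendall's tau in pure Python; the weighted tau mentioned above additionally gives more weight to the top-ranked models (`scipy.stats.weightedtau` implements one such weighted variant). All numbers are made up for illustration, not results from the paper.

```python
from itertools import combinations


def kendall_tau(x: list[float], y: list[float]) -> float:
    """Plain Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical fine-tuned accuracies of four candidate models.
accuracy = [0.91, 0.88, 0.85, 0.80]
# Hypothetical scores from two different ways of calling a metric.
scores_a = [-0.9, -1.1, -1.3, -1.6]  # same order as accuracy
scores_b = [-1.2, -1.1, -1.3, -1.0]  # a different order

print(kendall_tau(scores_a, accuracy))  # 1.0
print(kendall_tau(scores_b, accuracy))
```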

Thanks again for your great work and your open-source spirit. I'm really sorry to bother you again, and thank you very much.

youkaichao commented 2 years ago

Hi, thanks for your interest. I will look into it carefully and reply in a few days.

youkaichao commented 2 years ago

The LEEP/NCE results in the paper were calculated with older code that contained bugs. In response to a request in this issue, I provided implementations of LEEP and NCE and fixed a bug along the way, so the LEEP/NCE code in this repo should work correctly.

However, I forgot to update the results. New results, calculated with the LEEP/NCE code in this repo, are available here.

Both the paper and the updated results lead to the same conclusion: LEEP and NCE are quite unstable, and their overall performance is inferior to LogME on many datasets.

I will make it clear in the documentation. Sorry for the inconvenience.

zhangyikaii commented 2 years ago

Thank you very much!

All the best.

thuml / LogME, issue #14