jrzaurin / pytorch-widedeep

A flexible package for multimodal-deep-learning to combine tabular data with text and images using Wide and Deep models in Pytorch
Apache License 2.0

Not being able to reproduce BERT results #169

Closed TheLegendAli closed 1 year ago

TheLegendAli commented 1 year ago

Hi, I tried to reproduce the results of my BERT fine-tuning using this library. Unfortunately, I could not replicate them: the F1 score falls by half. I ran into the same issue when trying to reproduce this tutorial as well.

Here is a Colab notebook of what I did.

Am I doing anything wrong?

Thanks in advance

jrzaurin commented 1 year ago

Hey @TheLegendAli, thanks for opening the issue. I will look into it asap 🙂

jrzaurin commented 1 year ago

I think I know what might be happening (bear in mind I looked at this on my phone and it is 6:30am here in London). I think you are passing pred_dim = 2, but this is a binary classification problem, so it has to be equal to 1; look here: https://github.com/jrzaurin/pytorch-widedeep/blob/master/pytorch_widedeep/models/wide_deep.py#L86
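To illustrate, here is a minimal sketch. `BertWrapper` is a hypothetical stand-in for whatever custom deeptext component the notebook builds; the important part is `pred_dim=1`:

```python
import torch
from torch import nn
from transformers import BertModel
from pytorch_widedeep import Trainer
from pytorch_widedeep.models import WideDeep
from pytorch_widedeep.metrics import F1Score


class BertWrapper(nn.Module):
    # Hypothetical custom deeptext component: the library only requires
    # a forward pass returning a 2D tensor plus an `output_dim` attribute,
    # which it uses to attach the prediction head.
    def __init__(self, model_name="bert-base-uncased", freeze_bert=False):
        super().__init__()
        self.bert = BertModel.from_pretrained(model_name)
        if freeze_bert:
            for param in self.bert.parameters():
                param.requires_grad = False
        self.output_dim = self.bert.config.hidden_size

    def forward(self, X):
        # X holds padded token ids; derive the attention mask from them
        attn_mask = (X != 0).long()
        out = self.bert(input_ids=X, attention_mask=attn_mask)
        return out.last_hidden_state[:, 0, :]  # [CLS] hidden state


# pred_dim=1 because the "binary" objective trains with BCEWithLogitsLoss,
# which expects a single logit per sample rather than two class scores
model = WideDeep(deeptext=BertWrapper(), pred_dim=1)
trainer = Trainer(model, objective="binary", metrics=[F1Score])
```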

To be honest, we should be more explicit about this parameter, since I can see how it causes confusion.

Anyway, let me know if this was the issue, and thanks for opening it. I will add a warning in the next release.

TheLegendAli commented 1 year ago

Hi, thanks for the feedback; that definitely improved the results. However, there is still a wide gap in performance, so I think I might be doing something else wrong. And I hope you got some coffee in the morning.

jrzaurin commented 1 year ago

Okay then, I'll check.

jrzaurin commented 1 year ago

Hey @TheLegendAli, so, after a quick look, my main comment would be that you need to make sure you are comparing like with like, which at the moment is not the case.

For example, in one case you are using bert-base-uncased with a certain tokenizer setup, while in the other you are using distilbert-base-uncased. This should not matter much, but it is worth pointing out.
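To rule that out, both pipelines should load the same checkpoint and its matching tokenizer, e.g. (just a sketch):

```python
from transformers import AutoModel, AutoTokenizer

# Pin one checkpoint and use it on both sides of the comparison
MODEL_NAME = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
backbone = AutoModel.from_pretrained(MODEL_NAME)
```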

Perhaps more relevant is the fact that in one case you are not freezing the model weights, so you are fine-tuning the model on the data, whereas when using the library you are completely freezing the weights (by setting freeze_bert=True).
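In terms of the BertWrapper sketch above (the freeze_bert flag is from your notebook, not the library), the two setups amount to:

```python
# Frozen: BERT is a fixed feature extractor and only the prediction head
# trains. This alone can explain a large F1 gap vs full fine-tuning.
frozen_text = BertWrapper(freeze_bert=True)

# Trainable: matches the original fine-tuning run
finetuned_text = BertWrapper(freeze_bert=False)
```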

Let's do something: since you are in the Slack channel, let's close the issue and move the conversation there :)

Thanks for opening it!