vgaraujov / CPC-NLP-PyTorch

Implementation of Contrastive Predictive Coding for Natural Language

Tips for training CPC #4

Open YuffieHuang opened 2 years ago

YuffieHuang commented 2 years ago

Thanks @vgaraujov for providing the code. I've been playing with it for a month and found some tricks that help train the CPC model better. Let me share them here for anyone interested.

  1. **Add a dropout layer after the GRU layer.** Overfitting occurs when training CPC with the default setup. I added a dropout layer right after the GRU to fix the issue. We also need to enlarge the GRU hidden size to improve the generalization of the model. I set the dropout rate to 0.3 and raised the GRU hidden size from 2400 to 4000. It might not be the best combination, but it works.

  2. **Increase the max sentence length and concatenate sentences in the BookCorpus dataset.** The BookCorpus dataset contains many short sentences. I concatenate adjacent sentences and raise the sentence-length limit so that each epoch contains fewer iterations. This greatly increases training speed.
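For point 1, the change can be sketched roughly as follows. This is a minimal illustration, not the repository's actual module; the class and argument names (`CPCAutoregressor`, `input_dim`, etc.) are hypothetical, and only the idea of a `nn.Dropout` applied to the GRU outputs is what I actually changed:

```python
import torch
import torch.nn as nn

class CPCAutoregressor(nn.Module):
    """Hypothetical sketch of the CPC autoregressive part with a
    dropout layer added after the GRU to reduce overfitting."""
    def __init__(self, input_dim=2400, hidden_dim=4000, dropout=0.3):
        super().__init__()
        self.gru = nn.GRU(input_dim, hidden_dim, batch_first=True)
        self.dropout = nn.Dropout(dropout)  # added to curb overfitting

    def forward(self, z):
        # z: (batch, seq_len, input_dim) sequence of encoded sentences
        c, _ = self.gru(z)
        return self.dropout(c)  # context vectors with dropout applied

model = CPCAutoregressor()
z = torch.randn(2, 5, 2400)
c = model(z)
print(c.shape)  # torch.Size([2, 5, 4000])
```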

Before:

image

After:

image

Now I get a training result similar to the one shared here. image

I tested classification on Movie Review using a checkpoint and got an accuracy of 71%, which is not as good as what is stated in the paper (76.9%). I will spend more time on hyperparameter optimization.

vgaraujov commented 2 years ago

Hey @YuffieHuang, thanks for sharing. I plan to revisit this code and model at the end of this month, so any insight is welcome.

One thing I want to test, and maybe you should try, is normalizing the representations produced by the GRU. For instance: `normalized_output = F.normalize(output, dim=1)`. You could also use a temperature parameter in the InfoNCE loss. See equation 1 of this paper.
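Putting both suggestions together, the loss could look something like this. A sketch only, with illustrative tensor names, assuming in-batch negatives where the diagonal of the similarity matrix holds the positive pairs; the repository's actual InfoNCE implementation may be structured differently:

```python
import torch
import torch.nn.functional as F

def info_nce(pred, targets, temperature=0.1):
    """InfoNCE over L2-normalized representations with a temperature.
    pred, targets: (batch, dim); row i of targets is the positive
    for row i of pred, all other rows serve as negatives."""
    pred = F.normalize(pred, dim=1)
    targets = F.normalize(targets, dim=1)
    # Cosine-similarity logits scaled by the temperature;
    # diagonal entries are the positive pairs.
    logits = pred @ targets.t() / temperature
    labels = torch.arange(pred.size(0))
    return F.cross_entropy(logits, labels)

pred = torch.randn(8, 4000)
targets = torch.randn(8, 4000)
loss = info_nce(pred, targets)
print(loss.item())
```

With normalized vectors the logits are bounded cosine similarities, and the temperature controls how sharply the softmax concentrates on the positive pair.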

If you agree, we could keep discussing improvements and then update the model accordingly.

YuffieHuang commented 2 years ago

Hi @vgaraujov. Sure, let me add the normalization first and see how it goes.