why mlp_only_train=True during unsupervised training?

lankuohsing commented 7 months ago

I noticed this suggestion in the README: --mlp_only_train: We have found that for unsupervised SimCSE, it works better to train the model with MLP layer but test the model without it. You should use this argument when training unsupervised SimCSE models.

Given pooler_type=='cls' and mlp_only_train==True, the embedding used for testing during unsupervised training will not include the MLP transformation, as indicated by the code in models.py (lines 262 and 263):

if cls.pooler_type == "cls" and not cls.model_args.mlp_only_train:
    pooler_output = cls.mlp(pooler_output)
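
To make the condition concrete, here is a minimal sketch that enumerates when the MLP would be applied on this path. The helper name `applies_mlp_at_validation` is hypothetical; it only mirrors the boolean condition quoted above and is not part of the repository.

```python
from itertools import product


def applies_mlp_at_validation(pooler_type: str, mlp_only_train: bool) -> bool:
    # Hypothetical helper: mirrors the condition quoted from models.py.
    return pooler_type == "cls" and not mlp_only_train


# Enumerate the combinations discussed in this thread.
for pooler_type, mlp_only_train in product(["cls", "cls_before_pooler"], [False, True]):
    applied = applies_mlp_at_validation(pooler_type, mlp_only_train)
    print(f"pooler_type={pooler_type!r}, mlp_only_train={mlp_only_train}: MLP applied = {applied}")
```

Under this condition, the only combination that applies the MLP is pooler_type 'cls' with mlp_only_train left False.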

However, if I test my model (saved after unsupervised training and converted to a Hugging Face checkpoint by simcse_to_huggingface.py) using evaluation.py, the embedding will include the MLP transformation (given pooler_type=='cls'), as indicated by the code in evaluation.py (lines 119 to 122):

# Apply different poolers
if args.pooler == 'cls':
    # There is a linear+activation layer after CLS representation
    return pooler_output.cpu()

The pooler_output includes the MLP transformation because 'mlp' is renamed to 'pooler' in simcse_to_huggingface.py:

if "mlp" in key:
    key = key.replace("mlp", "pooler")
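
As a rough illustration of what that renaming does to a checkpoint, here is a self-contained sketch on a toy dictionary. The function name `rename_mlp_keys` and the key names in `toy_state_dict` are assumptions made for illustration; they are not read from a real SimCSE checkpoint.

```python
def rename_mlp_keys(state_dict: dict) -> dict:
    """Return a copy in which every "mlp" substring in a key becomes "pooler"."""
    renamed = {}
    for key, value in state_dict.items():
        if "mlp" in key:
            key = key.replace("mlp", "pooler")
        renamed[key] = value
    return renamed


# Toy checkpoint: key names and values are illustrative assumptions only.
toy_state_dict = {
    "mlp.dense.weight": "W",
    "mlp.dense.bias": "b",
    "bert.embeddings.word_embeddings.weight": "E",
}

converted = rename_mlp_keys(toy_state_dict)
print(sorted(converted))
```

After the rename, the trained MLP weights sit under the 'pooler' name, which is why pooler_output ends up carrying the MLP transformation.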

Why do the embeddings used for testing during unsupervised training differ from those used for formal evaluation?

yaoxingcheng commented 7 months ago

Hi, sorry for the confusion. The code in models.py (lines 262 and 263) only affects the validation process during training, as shown in trainer.py. To make sure the MLP transformation is not applied when using evaluation.py, set pooler to 'cls_before_pooler' in the evaluation script, as opposed to 'cls' in the training script.
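
The difference between the two pooler settings can be sketched with a toy example in plain Python. This is only an illustration under stated assumptions: `toy_pooler` and `select_embedding` are hypothetical names, the numbers are made up, and a linear layer followed by tanh stands in for the trained MLP head.

```python
import math


def toy_pooler(cls_vector, weight, bias):
    # Linear layer followed by tanh, standing in for the trained MLP / pooler head.
    return [
        math.tanh(sum(w * x for w, x in zip(row, cls_vector)) + b)
        for row, b in zip(weight, bias)
    ]


def select_embedding(pooler, last_hidden, pooler_output):
    # last_hidden holds per-token vectors for one sentence; index 0 is the [CLS] token.
    if pooler == "cls":
        return pooler_output      # includes the MLP transformation
    if pooler == "cls_before_pooler":
        return last_hidden[0]     # raw [CLS] vector, no MLP applied
    raise ValueError(f"unsupported pooler: {pooler}")


# Toy values: 2 tokens, hidden size 2, identity weights.
last_hidden = [[0.5, -1.0], [0.1, 0.2]]
weight, bias = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
pooler_output = toy_pooler(last_hidden[0], weight, bias)

print(select_embedding("cls", last_hidden, pooler_output))
print(select_embedding("cls_before_pooler", last_hidden, pooler_output))
```

With 'cls_before_pooler', the embedding is the untouched [CLS] vector, which matches what validation during training sees when mlp_only_train is set.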

lankuohsing commented 7 months ago

thanks!

princeton-nlp / SimCSE

why mlp_only_train=True during unsupervised training? #264