luyug / Condenser

EMNLP 2021 - Pre-training architectures for dense retrieval
Apache License 2.0

Have you tried Condenser pretraining on RoBERTa? #17

Open 1024er opened 2 years ago

1024er commented 2 years ago

I pretrained a condenser-roberta-base on the same data and hyperparameters, but the results on downstream tasks were lower than expected.
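
For concreteness, here is a minimal sketch of the kind of Condenser-style head I mean on top of roberta-base, written against the Hugging Face transformers API. The class name, layer counts, and `skip_from` choice are illustrative assumptions, not the exact code in this repo:

```python
import torch
from torch import nn
from transformers import AutoConfig, AutoModelForMaskedLM


class RobertaCondenserSketch(nn.Module):
    """Illustrative Condenser-style head on a RoBERTa backbone (not the repo's exact code)."""

    def __init__(self, model_name="roberta-base", n_head_layers=2, skip_from=6):
        super().__init__()
        self.lm = AutoModelForMaskedLM.from_pretrained(model_name)  # RobertaForMaskedLM
        config = AutoConfig.from_pretrained(model_name)
        # A short stack of extra Transformer layers forms the Condenser head.
        layer_cls = type(self.lm.roberta.encoder.layer[0])
        self.head = nn.ModuleList([layer_cls(config) for _ in range(n_head_layers)])
        self.skip_from = skip_from
        self.loss_fn = nn.CrossEntropyLoss()  # MLM labels use -100 for unmasked positions

    def forward(self, input_ids, attention_mask, labels):
        out = self.lm.roberta(
            input_ids, attention_mask=attention_mask, output_hidden_states=True
        )
        # The late CLS vector forces sequence information through the [CLS] bottleneck...
        cls_late = out.last_hidden_state[:, :1]
        # ...while the remaining token states come from an earlier backbone layer.
        early_tokens = out.hidden_states[self.skip_from][:, 1:]
        hidden = torch.cat([cls_late, early_tokens], dim=1)

        ext_mask = self.lm.get_extended_attention_mask(attention_mask, input_ids.shape)
        for layer in self.head:
            hidden = layer(hidden, ext_mask)[0]

        # MLM loss is computed on the head's output, reusing the backbone's LM head.
        logits = self.lm.lm_head(hidden)
        return self.loss_fn(logits.view(-1, logits.size(-1)), labels.view(-1))
```

The main RoBERTa-specific details are the attribute names (`roberta`, `lm_head` instead of BERT's `bert`, `cls`); the rest is the same bottleneck-plus-MLM idea.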

Have you ever tried Condenser pretraining on RoBERTa-base?

Thank you

luyug commented 2 years ago

Not with the same data. I have trained with OpenWebText (an open reproduction of WebText, part of RoBERTa's training data) using a RoBERTa-base architecture. Compared with the BERT Condenser, it does better on sentence similarity tasks but not on retrieval tasks. As a side note, we previously observed that vanilla RoBERTa-base is typically inferior to vanilla BERT-base on retrieval tasks.

We have just started test runs with condenser-roberta-large, so there is not much to say there yet.