microsoft / SDR

Self-Supervised Document-to-Document Similarity Ranking via Contextualized Language Models and Hierarchical Inference

Training with custom dataset? #9

Closed by puzzlecollector 2 years ago

puzzlecollector commented 2 years ago

@dvirginz Which part of the code should I look at to train the model on my own custom dataset? Also, is it necessary to run the MLM (masked language modeling) training together with the contrastive loss, or would using the contrastive loss alone degrade performance significantly?