nyu-dl / dl4marco-bert

BSD 3-Clause "New" or "Revised" License

Cross Validation #31

bayou3 closed 5 years ago

bayou3 commented 5 years ago

Hi, I read the training section of your paper and found no description of cross-validation. Because my dataset is not as large as MS MARCO, this matters to me. I process my dataset as follows: I have a human-judged file that lists which docids are relevant and which are not. I use these labelled docids to generate the triples file (query, positive_doc, negative_doc). I also have an initial ranked list with the top n docids for each query, which I use to build a dev.tsv file for the prediction phase. Do I need to do cross-validation during the training phase, and if so, how should I modify the training code? Or is my current approach correct?
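For concreteness, the triple-generation step described above might look roughly like the sketch below. This is only an illustration under assumptions: the function name `build_triples`, the `(qid, docid, label)` judgment format, and the `queries` / `docs` lookup dictionaries are hypothetical and are not taken from the repository's scripts.

```python
from collections import defaultdict


def build_triples(judgments, queries, docs):
    """Build (query, positive_doc, negative_doc) text triples.

    judgments: iterable of (qid, docid, label), label > 0 meaning relevant
               (hypothetical format, assumed for this sketch)
    queries:   dict mapping qid -> query text
    docs:      dict mapping docid -> document text
    """
    positives = defaultdict(list)
    negatives = defaultdict(list)
    for qid, docid, label in judgments:
        target = positives if label > 0 else negatives
        target[qid].append(docid)

    triples = []
    for qid, pos_ids in positives.items():
        # Queries without any judged negative yield no triples.
        for pos_id in pos_ids:
            for neg_id in negatives.get(qid, []):
                triples.append((queries[qid], docs[pos_id], docs[neg_id]))
    return triples
```

Pairing every judged positive with every judged negative of the same query is one simple choice; with many judgments per query, sampling a subset of negatives may be preferable.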

rodrigonogueira4 commented 5 years ago

For cross-validation, you should split the training and dev sets at the query level: