aken12 closed this issue 1 year ago.
Hi @aken12, "Tevatron/msmarco-passage" uses BM25 hard negatives, but with 2 additional treatments:
1. If the positive passage does not appear in the top-200 BM25 hits, the example is dropped, which results in ~400k examples rather than the original 500k.

@MXueguang Thank you for your kind response! In examples/coCondenser-marco/get_data.sh, I cannot find the code that selects a positive passage.
Do we need to add a step that selects positive passages, in addition to running get_data.sh? Or is qidpidtriples.train.full.2.tsv.gz already processed? (I guess the latter is not correct.)
Thanks :)
It seems qidpidtriples.train.full.2.tsv.gz only has 400k queries.
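A quick way to check this locally is to count the distinct query ids in the file. This is a minimal sketch that assumes each line is tab-separated as qid, positive pid, negative pid (the usual layout of the MS MARCO triples file); the function names are hypothetical and not part of Tevatron.

```python
import gzip
from typing import Iterable, Set


def unique_query_ids(lines: Iterable[str]) -> Set[str]:
    """Collect distinct query ids from 'qid<TAB>pos_pid<TAB>neg_pid' lines.

    Assumes the query id is the first tab-separated column.
    """
    qids: Set[str] = set()
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            qids.add(line.split("\t", 1)[0])
    return qids


def count_queries(path: str) -> int:
    """Stream a gzip-compressed triples file and count its distinct query ids."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return len(unique_query_ids(f))
```

Calling `count_queries("qidpidtriples.train.full.2.tsv.gz")` streams the file line by line, so memory only has to hold the set of query ids, not the triples themselves.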
Oh, yes, that's true. I understand now, thank you!!
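For anyone who wants to reproduce treatment 1 from raw qrels and a BM25 run instead of relying on the preprocessed file, the drop rule can be sketched as below. This is an illustration under stated assumptions, not Tevatron's code: `filter_by_bm25_depth` is a hypothetical helper, `qrels` maps each query id to its set of positive passage ids, `bm25_run` maps each query id to its ranked BM25 hits, and taking the negatives as the top-`depth` hits minus the positives is an assumption. The second treatment is not described in this thread, so the sketch does not attempt it.

```python
from typing import Dict, List, Set


def filter_by_bm25_depth(
    qrels: Dict[str, Set[str]],
    bm25_run: Dict[str, List[str]],
    depth: int = 200,
) -> List[dict]:
    """Drop queries whose positives never appear in the top-`depth` BM25 hits.

    Hypothetical helper: negatives are assumed to be the top-`depth` hits
    with the positives removed.
    """
    examples = []
    for qid, positives in qrels.items():
        top_hits = bm25_run.get(qid, [])[:depth]
        if not positives.intersection(top_hits):
            # No positive retrieved within the cutoff: drop this example.
            continue
        negatives = [pid for pid in top_hits if pid not in positives]
        examples.append(
            {
                "query_id": qid,
                "positive_passages": sorted(positives),
                "negative_passages": negatives,
            }
        )
    return examples
```

With `depth=200` the cutoff matches the top-200 hits mentioned in item 1; queries with no BM25 hits at all are also dropped.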
Hi :) Thank you for your great work! I read this issue (https://github.com/texttron/tevatron/issues/66#issuecomment-1473047705). In this description,
does it mean that "Tevatron/msmarco-passage" is built from the cleaned MS MARCO? @MXueguang
I would like to know whether it uses plain BM25 hard negatives or hard negatives with some additional treatment.