texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

questions about negative sample #72

Closed. pygongnlp closed this issue 1 year ago

pygongnlp commented 1 year ago

Thanks for the novel work.

I have a question: if I use a custom dataset in the same format as MS MARCO, how should I choose the negative samples for the dev/test set?

For example, suppose I want to find out whether BM25 negatives work better than random negatives in my field. If I use BM25 negatives during training, should the negatives in the dev/test set be produced by BM25 or sampled at random?
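
To make the two options concrete, here is a minimal sketch of what I mean by "BM25 negatives" and "random negatives". It is not Tevatron code: the BM25 scorer is a small pure-Python stand-in, and the names `bm25_scores`, `bm25_negatives`, and `random_negatives`, the toy corpus, and the parameters `k1=1.2`, `b=0.75` are all assumptions made up for illustration.

```python
import math
import random
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer (assumption; a real setup would use a proper analyzer).
    return text.lower().split()


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score every document against the query with a plain Okapi BM25 formula."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def bm25_negatives(query, docs, positive_ids, k):
    """Hard negatives: the top-k BM25-ranked documents that are not positives."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked if i not in positive_ids][:k]


def random_negatives(docs, positive_ids, k, seed=0):
    """Random negatives: k non-positive documents sampled uniformly."""
    rng = random.Random(seed)
    pool = [i for i in range(len(docs)) if i not in positive_ids]
    return rng.sample(pool, min(k, len(pool)))


if __name__ == "__main__":
    # Toy corpus (hypothetical); document 0 is the labeled positive.
    docs = [
        "neural retrieval with dense encoders",
        "bm25 is a sparse retrieval baseline",
        "cooking pasta at home",
        "dense retrieval training with negatives",
    ]
    print("BM25 negatives:", bm25_negatives("dense retrieval", docs, {0}, k=2))
    print("Random negatives:", random_negatives(docs, {0}, k=2))
```

The BM25 variant returns passages that share terms with the query but are not labeled relevant, while the random variant ignores the query entirely. My question is which of these two should be used to build the dev/test negatives.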