I found that I was setting up the dataset incorrectly. I had prepared the corpus from wikipedia-corpus.tar.gz following this [README](https://github.com/texttron/tevatron/blob/7a3a05914cbeb8158b2ab8fe4c5f9990e03ef834/examples/dpr/README.md#alternatives-train-dpr-with-our-self-contained-datasets). Replacing `--encode_in_path $wiki_dir/docs$s.json` with `--dataset_name Tevatron/wikipedia-nq-corpus` fixed it, and the results are now fine.
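For anyone hitting the same issue, a minimal sketch of the corrected per-shard encode call (batch size, max length, and output paths are placeholders rather than my exact values; `$s` is the shard index as in the README):

```bash
# Encode one corpus shard from the self-contained HF dataset instead of
# the locally preprocessed docs$s.json files.
python -m tevatron.driver.encode \
  --output_dir temp \
  --model_name_or_path $MODEL_DIR \
  --fp16 \
  --per_device_eval_batch_size 128 \
  --p_max_len 156 \
  --dataset_name Tevatron/wikipedia-nq-corpus \
  --encoded_save_path corpus_emb.$s.pkl \
  --encode_num_shard 20 \
  --encode_shard_index $s
```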
Hi, I would like to ask about your experimental results on the NQ dataset when using BM25 negative samples.
Sorry, my experiment uses both BM25 negatives and mined hard negatives. The results are close to those reported in the paper (ours on the NQ test set: R@20 = 84.9, R@100 = 89.5).
Thanks for your reply. Was that result obtained by following the two-stage recipe, i.e. training with BM25 negatives in the first stage and with mined hard negatives in the second stage?
Yes, my settings are basically the same as here: https://github.com/texttron/tevatron/tree/main/examples/coCondenser-nq
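For anyone else reproducing this, a rough sketch of the two stages under those settings (hyperparameters and data paths below are illustrative placeholders; the exact values are in the linked README):

```bash
# Stage 1: fine-tune co-condenser-wiki with BM25 negatives.
python -m tevatron.driver.train \
  --output_dir model_nq_s1 \
  --model_name_or_path Luyu/co-condenser-wiki \
  --do_train \
  --fp16 \
  --train_dir nq-train/bm25 \
  --per_device_train_batch_size 8 \
  --train_n_passages 8 \
  --learning_rate 5e-6 \
  --q_max_len 32 \
  --p_max_len 156 \
  --num_train_epochs 40

# Stage 2: continue from the stage-1 checkpoint, now with mined hard
# negatives mixed into the training data.
python -m tevatron.driver.train \
  --output_dir model_nq_s2 \
  --model_name_or_path model_nq_s1 \
  --do_train \
  --fp16 \
  --train_dir nq-train/hn \
  --per_device_train_batch_size 8 \
  --train_n_passages 8 \
  --learning_rate 5e-6 \
  --q_max_len 32 \
  --p_max_len 156 \
  --num_train_epochs 40
```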
Hi, I found that the numbers of BM25 negative samples and hard negative samples provided in the nq-example differ. Is this reasonable?
That's expected: a hard negative is a passage that scores highly against the query despite not being relevant. For some queries no such passage exists (the model already performs well on them), so those queries have no hard negative samples.
Thank you for your answer, but I found that biencoder-nq-train.json has 58880 questions while hn.json has 70076. I don't know where these extra questions come from.
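A quick way to inspect the mismatch (a sketch; it assumes both files follow the DPR schema with a top-level JSON array and a `question` field per entry — adjust the loader if `hn.json` is actually JSON-lines):

```bash
python - <<'EOF'
import json

def load(path):
    # DPR-format files are a single top-level JSON array; fall back to
    # JSON-lines if the file starts differently.
    with open(path) as f:
        head = f.read(1)
        f.seek(0)
        return json.load(f) if head == '[' else [json.loads(l) for l in f]

bm25 = load('biencoder-nq-train.json')
hn = load('hn.json')
q_bm25 = {ex['question'] for ex in bm25}
q_hn = {ex['question'] for ex in hn}
print('bm25 entries:', len(bm25))
print('hn entries:  ', len(hn))
print('hn-only questions:', len(q_hn - q_bm25))
EOF
```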
Hi, @luyug.
Thanks for your awesome work and the detailed guidelines. I reproduced the model following coCondenser-nq's [README](https://github.com/texttron/tevatron/tree/main/examples/coCondenser-nq), but got the following results (evaluated with pyserini).
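(For reference, the scoring step was pyserini's DPR retrieval evaluator, roughly as below; the run-file name is a placeholder for my converted retrieval output:)

```bash
# Score top-k retrieval accuracy of the converted run file.
python -m pyserini.eval.evaluate_dpr_retrieval \
  --retrieval run.nq.test.json \
  --topk 20 100
```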
I think I made a mistake in one of the steps, since the results are lower than the BM25 baseline. I executed the following scripts in sequence to train the model (the `co-condenser-wiki` checkpoint was downloaded from Hugging Face). Is there any parameter I set wrong?
Thanks!