luyug / Condenser

EMNLP 2021 - Pre-training architectures for dense retrieval
Apache License 2.0

reproducing your results on MS MARCO #3

Open · Narabzad opened this issue 3 years ago

Narabzad commented 3 years ago

Hi,

Thank you for your great work! I would like to replicate your results on the MS MARCO passage collection, and I have a question regarding the Luyu/co-condenser-marco model. Is this the final model that you used to retrieve documents, or do I need to train it on MS MARCO relevant query/passage pairs? Could you provide a bit more detail on how I should use your dense toolkit with this model?

Thank you in advance!

luyug commented 3 years ago

Hello,

Please take a look at the coCondenser fine-tuning tutorial. It should answer most of your questions.

We can leave this issue open for now in case you run into other problems.

Narabzad commented 3 years ago

Thank you for the great tutorial! One issue I found: --passage_reps corpus/corpus/'*.pt' should be --passage_reps encoding/corpus/'*.pt' in the index-search step at https://github.com/texttron/tevatron/tree/main/examples/coCondenser-marco#index-search

luyug commented 3 years ago

Thanks for catching that!

Narabzad commented 2 years ago

Hi,

I was able to replicate the MRR@10 that you reported in the paper (0.38), but I was wondering what the difference is between that number and the one reported on the leaderboard (0.44). How do I replicate the leaderboard result? Is it measured on a different set?
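
For reference, MRR@10 on the dev set can be computed directly from a run file and the dev qrels along the lines of the sketch below; the file names and the tab-separated run format (qid, pid, rank) are assumptions for illustration, and this is not the official msmarco_eval.py script.

from collections import defaultdict

def load_qrels(path):
    # MS MARCO qrels format: qid <tab> 0 <tab> pid <tab> relevance
    rels = defaultdict(set)
    with open(path) as f:
        for line in f:
            qid, _, pid, rel = line.split()
            if int(rel) > 0:
                rels[qid].add(pid)
    return rels

def mrr_at_10(run_path, rels):
    # assumed run format: qid <tab> pid <tab> rank, with rank starting at 1
    runs = defaultdict(list)
    with open(run_path) as f:
        for line in f:
            qid, pid, rank = line.split()
            runs[qid].append((int(rank), pid))
    total = 0.0
    for qid, judged in rels.items():
        for rank, pid in sorted(runs.get(qid, []))[:10]:
            if pid in judged:
                total += 1.0 / rank
                break
    return total / len(rels)

rels = load_qrels("qrels.dev.small.tsv")
print("MRR@10:", mrr_at_10("run.dev.txt", rels))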

shunyuzh commented 2 years ago

Hi, @luyug

Thanks for your awesome work. I have a similar question about NQ. Is it possible to give more details on how to reproduce the NQ results in the paper (MRR@5 = 84.3), just like the detailed MS MARCO tutorial?

Or, if that would take some time, could you tell me whether your SOTA model on NQ is trained with mined hard negatives only, or with both BM25 hard negatives and mined hard negatives, as in the DPR GitHub repo?

Thanks.
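
To make the second question concrete: in a DPR-style setup each training example carries a list of negatives, and the two options differ only in whether that list holds the mined negatives alone or the BM25 negatives as well. The sketch below is purely illustrative; the JSONL layout with "query", "positives", and "negatives" fields and the file names are placeholders, not the actual NQ preprocessing in this repo.

import json

def merge_negatives(bm25_path, mined_path, out_path):
    # Both inputs: JSONL, one training example per line with "query",
    # "positives", "negatives" fields (assumed layout for illustration).
    mined = {}
    with open(mined_path) as f:
        for line in f:
            ex = json.loads(line)
            mined[ex["query"]] = ex["negatives"]
    with open(bm25_path) as f, open(out_path, "w") as out:
        for line in f:
            ex = json.loads(line)
            # option (a): train on mined hard negatives only
            # ex["negatives"] = mined.get(ex["query"], [])
            # option (b): concatenate BM25 negatives with mined negatives
            ex["negatives"] = ex["negatives"] + mined.get(ex["query"], [])
            out.write(json.dumps(ex) + "\n")

merge_negatives("nq-train-bm25.jsonl", "nq-train-mined.jsonl", "nq-train-merged.jsonl")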

Yuan0320 commented 1 year ago

Hi @luyug,

Thanks for your great work! I am also confused about the difference between the reported result and the leaderboard number (0.38 vs. 0.44). Is there any update on this?

cadurosar commented 1 year ago

Also interested. From what I remember, the main difference is that a reranker is also applied; would it be possible to get the checkpoint of the reranker?
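
For context, a reranker here typically means a cross-encoder that rescores the top passages returned by the dense retriever. Below is only a generic sketch using Hugging Face transformers with a public MS MARCO cross-encoder as a placeholder model name; it is not the authors' reranker checkpoint being asked for.

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder: a public MS MARCO cross-encoder, not the authors' checkpoint.
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name).eval()

def rerank(query, passages, top_k=10):
    # Score each (query, passage) pair with the cross-encoder and re-sort.
    inputs = tokenizer([query] * len(passages), passages, padding=True,
                       truncation=True, max_length=256, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.squeeze(-1)
    order = scores.argsort(descending=True)[:top_k].tolist()
    return [(passages[i], scores[i].item()) for i in order]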

caiyinqiong commented 1 year ago

Hi, thank you for your great work! I ran into some issues when trying to reproduce the results on the MS MARCO passage collection. I have followed the aforementioned tutorial but still cannot resolve them (the problem seems to be in the hard negative mining step).

First, I run Fine-tuning Stage 1 with

CUDA_VISIBLE_DEVICES=3 python -m tevatron.driver.train \
  --output_dir model_msmarco_s1 \
  --model_name_or_path ../data/co-condenser-marco \
  --save_steps 20000 \
  --train_dir ../data/msmarco-passage/train_dir \
  --data_cache_dir ../data/msmarco-passage-train-cache \
  --fp16 \
  --dataloader_num_workers 2 \
  --per_device_train_batch_size 8 \
  --train_n_passages 8 \
  --learning_rate 5e-6 \
  --q_max_len 16 \
  --p_max_len 128 \
  --num_train_epochs 3 \
  --logging_steps 500

and get MRR@10=0.3596 and R@1000=0.9771 (your reported results are MRR@10=0.357 and R@1000=0.978).

Then, I run hard negative mining, randomly sampling 30 negatives from the top-200 retrieval results of model_msmarco_s1, by modifying scripts/hn_mining.py (following the parameters in build_train_hn.py).
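
In other words, the mining step boils down to: for each training query, take the top-200 passages from the Stage 1 run, drop any judged-positive ones, and randomly keep 30 of the rest. A minimal sketch of that logic is below; the file names and the tab-separated run format are placeholders, not the actual code in scripts/hn_mining.py or build_train_hn.py.

import random
from collections import defaultdict

def mine_hard_negatives(run_path, qrels_path, depth=200, per_query=30, seed=42):
    # qrels format: qid <tab> 0 <tab> pid <tab> relevance
    positives = defaultdict(set)
    with open(qrels_path) as f:
        for line in f:
            qid, _, pid, rel = line.split()
            if int(rel) > 0:
                positives[qid].add(pid)
    # retrieval run (placeholder format): qid <tab> pid <tab> rank
    candidates = defaultdict(list)
    with open(run_path) as f:
        for line in f:
            qid, pid, rank = line.split()
            if int(rank) <= depth:
                candidates[qid].append(pid)
    random.seed(seed)
    hard_negatives = {}
    for qid, pids in candidates.items():
        # exclude judged positives, then subsample the remaining candidates
        pool = [p for p in pids if p not in positives[qid]]
        hard_negatives[qid] = random.sample(pool, min(per_query, len(pool)))
    return hard_negatives

hn = mine_hard_negatives("train.rank.txt", "qrels.train.tsv")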

Next, I run Fine-tuning Stage 2 with

CUDA_VISIBLE_DEVICES=3 python -m tevatron.driver.train \
  --output_dir model_msmarco_s2 \
  --model_name_or_path ../data/co-condenser-marco \
  --save_steps 20000 \
  --train_dir ../data/msmarco-passage/tain_dir_hn_dr_cocondenser200 \
  --data_cache_dir ../data/msmarco-passage-tain_hn_dr_cocondenser200-cache \
  --fp16 \
  --dataloader_num_workers 2 \
  --per_device_train_batch_size 8 \
  --train_n_passages 8 \
  --learning_rate 5e-6 \
  --q_max_len 16 \
  --p_max_len 128 \
  --num_train_epochs 2 \
  --logging_steps 500

and get MRR@10=0.3657 and R@1000=0.9761 (your reported results are MRR@10=0.382 and R@1000=0.984).

There are a few points I would like to confirm:

  1. Is the training data for Fine-tuning Stage 2 only the mined hard negatives, i.e. not concatenated with the BM25 negatives?
  2. Are the initial parameters for Stage 2 loaded from co-condenser-marco, not from the model_msmarco_s1 checkpoint?
  3. What are the intended settings of per_device_train_batch_size, train_n_passages, learning_rate, and num_train_epochs for Fine-tuning Stage 2?

Thank you in advance!