helloSimo closed this issue 1 year ago.
Hi @helloSimo,
please add --passage_field_separator sep when encoding the corpus.
The original coCondenser retriever uses [SEP] between the title and the passage, whereas Tevatron's default separator is ' ' (a single space).
Sorry, I missed this in the example document.
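The effect of the separator can be illustrated with a small shell sketch. It assumes that the corpus preprocessor builds each encoder input as title + separator + text before tokenization (an assumption, not checked against Tevatron's source here); join_passage and the sample strings are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: joins a title and a passage with a given separator,
# assuming the encoder input is built as title + separator + text.
join_passage() {
  local title="$1" sep="$2" text="$3"
  printf '%s%s%s\n' "$title" "$sep" "$text"
}

# Default-style separator (a single space): the title reads as ordinary words.
join_passage "Example title" " " "Example passage text."
# prints: Example title Example passage text.

# BERT-style separator token between the two fields.
join_passage "Example title" "[SEP]" "Example passage text."
# prints: Example title[SEP]Example passage text.
```

With a single space the title simply runs into the passage text, while with [SEP] a BERT tokenizer should recognize the literal string as its special separator token, so the model sees the same field boundary the original retriever was trained with.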
Hi, I followed example_msmarco.md using "Luyu/co-condenser-marco-retriever". My reproduced result is 37 MRR@10 on the MS MARCO dev subset, but the paper reports 38.2. What do I need to do to obtain the paper's result?
Here are my commands:

    CUDA_VISIBLE_DEVICES=0 python -m tevatron.driver.encode \
      --output_dir=temp \
      --model_name_or_path Luyu/co-condenser-marco-retriever \
      --fp16 \
      --per_device_eval_batch_size 1024 \
      --p_max_len 128 \
      --dataset_name Tevatron/msmarco-passage-corpus \
      --encoded_save_path c/corpus_emb.pkl \
      --encode_num_shard 1 \
      --encode_shard_index 0

    CUDA_VISIBLE_DEVICES=0 python -m tevatron.driver.encode \
      --output_dir=temp \
      --model_name_or_path Luyu/co-condenser-marco-retriever \
      --fp16 \
      --per_device_eval_batch_size 1024 \
      --dataset_name Tevatron/msmarco-passage/dev \
      --encoded_save_path temp_out/query_emb.pkl \
      --q_max_len 32 \
      --encode_is_qry

    python -m tevatron.faiss_retriever \
      --query_reps temp_out/query_emb.pkl \
      --passage_reps temp_out/corpus_emb.pkl \
      --depth 100 \
      --batch_size -1 \
      --save_text \
      --save_ranking_to temp_out/rank.txt

    python -m tevatron.utils.format.convert_result_to_marco \
      --input temp_out/rank.txt \
      --output temp_out/rank.txt.marco

    python -m pyserini.eval.msmarco_passage_eval msmarco-passage-dev-subset temp_out/rank.txt.marco
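The score being compared here (37 vs 38.2) is MRR@10 over the dev-subset queries. Below is a minimal awk sketch of that metric, not the official scorer. It assumes the converted run file has tab-separated qid, pid, rank lines (ranks starting at 1), that the qrels file has tab-separated qid, 0, pid, relevance lines, and that the average is taken over all queries in the qrels. mrr_at_10 and the paths in the usage comment are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: MRR@10 from a MARCO-format run and a qrels file.
# Assumed formats (tab-separated):
#   qrels: qid  0  pid  relevance   (every listed pair is treated as relevant)
#   run:   qid  pid  rank
mrr_at_10() {
  local qrels="$1" run="$2"
  awk -F'\t' '
    # First file (qrels): record relevant (qid, pid) pairs and count distinct queries.
    NR == FNR {
      rel[$1, $3] = 1
      if (!($1 in seen)) { seen[$1] = 1; nq++ }
      next
    }
    # Second file (run): keep the lowest rank of a relevant pid within the top 10.
    ($3 + 0) <= 10 && (($1, $2) in rel) {
      r = $3 + 0
      if (!($1 in best) || r < best[$1]) best[$1] = r
    }
    END {
      # Sum reciprocal ranks; queries without a relevant hit in the top 10 add nothing.
      for (q in best) total += 1 / best[q]
      printf "MRR@10: %.4f\n", (nq ? total / nq : 0)
    }
  ' "$qrels" "$run"
}

# Example usage (hypothetical paths):
# mrr_at_10 qrels.dev.small.tsv temp_out/rank.txt.marco
```

Each query contributes the reciprocal rank of its first relevant passage within the top 10, and a query with no relevant passage in the top 10 contributes zero while still counting in the denominator.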