texttron / tevatron

Tevatron - A flexible toolkit for neural retrieval research and development.
http://tevatron.ai
Apache License 2.0

Rankllama: Reproducing DL19 Inference #112

Closed nfrumkin closed 8 months ago

nfrumkin commented 8 months ago

I am unable to reproduce the DL19 NDCG@10 reported in the Rankllama README. I have followed the README instructions to a tee (except for the transformers version; mine is 4.37.0). Here is my result:


ndcg_cut_10 all 0.7489


However, the reported result is 0.7568. I am using the same GPU, the same pre-trained model/tokenizer, and the same repllama source file (downloaded from Dropbox). Are there any other variables I may have missed for reproducing the result? Any help is appreciated!
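In case it is relevant, here is a quick way to dump the versions in my environment (just my guess at the packages that could matter for a small NDCG difference like this):

pip list | grep -E "torch|transformers|peft|tevatron|pyserini"
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.get_device_name(0))"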

The script I used is the same as in the README (copied below):


python prepare_rerank_file.py \
    --query_data_name Tevatron/msmarco-passage \
    --query_data_split dl19 \
    --corpus_data_name Tevatron/msmarco-passage-corpus \
    --retrieval_results run.repllama.psg.dl19.txt \
    --output_path rerank_input.repllama.psg.dl19.jsonl \
    --depth 200

CUDA_VISIBLE_DEVICES=0 python reranker_inference.py \
    --output_dir=temp \
    --model_name_or_path castorini/rankllama-v1-7b-lora-passage \
    --tokenizer_name meta-llama/Llama-2-7b-hf \
    --encode_in_path rerank_input.repllama.psg.dl19.jsonl \
    --fp16 \
    --per_device_eval_batch_size 64 \
    --q_max_len 32 \
    --p_max_len 164 \
    --dataset_name json \
    --encoded_save_path run.rankllama.psg.dl19.txt

python -m tevatron.utils.format.convert_result_to_trec \
    --input run.rankllama.psg.dl19.txt \
    --output run.rankllama.psg.dl19.trec

python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-passage run.rankllama.psg.dl19.trec
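For what it's worth, a quick sanity check on the run file (assuming the standard TREC run layout of qid Q0 docid rank score tag) would be:

head -n 3 run.rankllama.psg.dl19.trec
# number of distinct queries in the run; should match the query count in rerank_input.repllama.psg.dl19.jsonl
awk '{print $1}' run.rankllama.psg.dl19.trec | sort -u | wc -l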


MXueguang commented 8 months ago

Let me try to rerun the code first. Will let you know ASAP.

nfrumkin commented 8 months ago

Hi @MXueguang , thanks so much for the swift response! I actually found a solution/workaround:

It seems that checking out commit 0e93945 solved the problem! I now get:


ndcg_cut_10 all 0.7682
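In case anyone else hits this, the full workaround was roughly the following (repo URL assumed from the project page; the last step assumes an editable install of tevatron):

git clone https://github.com/texttron/tevatron.git
cd tevatron
# check out the commit that resolves the discrepancy for me
git checkout 0e93945
pip install -e .

and then rerunning the reranking and evaluation commands above.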