Closed: nfrumkin closed this issue 8 months ago
Let me try to rerun the code first. I will let you know ASAP.
Hi @MXueguang , thanks so much for the swift response! I actually found a solution/workaround:
It seems that checking out commit 0e93945 solved the problem! I now get:
```
ndcg_cut_10             all     0.7682
```
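For anyone hitting the same discrepancy, the workaround can be sketched as follows. This is a minimal sketch assuming a local editable clone of the tevatron repository; the repository path and the editable reinstall step are assumptions, while the commit hash comes from this thread:

```shell
# Pin the local tevatron checkout to the commit that reproduced the score
# (assumes tevatron was cloned into ./tevatron; adjust the path as needed)
cd tevatron
git checkout 0e93945

# Reinstall in editable mode so the pinned source is actually used
pip install -e .
```

After the checkout, rerun the reranking pipeline from the README unchanged.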
I am unable to reproduce the DL19 NDCG@10 reported in the RankLLaMA README. I have followed the README instructions to a tee (except for the transformers version; mine is 4.37.0). Here is my result:
```
ndcg_cut_10             all     0.7489
```
However, the reported result is 0.7568. I am using the same GPU, the same pre-trained model/tokenizer, and the same repllama source file (downloaded from Dropbox). Are there any other variables I may have missed for reproduction? Any help is appreciated!
The script I used is the same as in the README (copied below):
```shell
python prepare_rerank_file.py \
    --query_data_name Tevatron/msmarco-passage \
    --query_data_split dl19 \
    --corpus_data_name Tevatron/msmarco-passage-corpus \
    --retrieval_results run.repllama.psg.dl19.txt \
    --output_path rerank_input.repllama.psg.dl19.jsonl \
    --depth 200
```
```shell
CUDA_VISIBLE_DEVICES=0 python reranker_inference.py \
    --output_dir=temp \
    --model_name_or_path castorini/rankllama-v1-7b-lora-passage \
    --tokenizer_name meta-llama/Llama-2-7b-hf \
    --encode_in_path rerank_input.repllama.psg.dl19.jsonl \
    --fp16 \
    --per_device_eval_batch_size 64 \
    --q_max_len 32 \
    --p_max_len 164 \
    --dataset_name json \
    --encoded_save_path run.rankllama.psg.dl19.txt
```
```shell
python -m tevatron.utils.format.convert_result_to_trec \
    --input run.rankllama.psg.dl19.txt \
    --output run.rankllama.psg.dl19.trec

python -m pyserini.eval.trec_eval -c -m ndcg_cut.10 dl19-passage run.rankllama.psg.dl19.trec
```