jingtaozhan / DRhard

SIGIR'21: Optimizing DR with hard negatives and achieving SOTA first-stage retrieval performance on TREC DL Track.
BSD 3-Clause "New" or "Revised" License
125 stars 14 forks

Reproduce results #14

Closed laos1984 closed 3 years ago

laos1984 commented 3 years ago

Hi Jingtao,

I tried to reproduce the results shown in the README. The models were downloaded from Google Drive. The transformers version is 2.8.0 for preprocessing and 4.8.2 for inference.

I ran the following commands:

`python ./star/inference.py --data_type passage --max_doc_length 256 --mode dev`
`python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv ./data/passage/evaluate/star/dev.rank.tsv`

And I got the following results:

`Eval Started`
`#####################`
`MRR @10: 0.010382669304589082`
`QueriesRanked: 6980`
`#####################`

Could you help me figure out what I did wrong? Thanks!
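For context on what the second command reports: `msmarco_eval.py` computes MRR@10 over the dev queries. A minimal sketch of that metric is below; the file layouts are simplified assumptions for illustration (qrels as `qid<TAB>pid` pairs, rank file as `qid<TAB>pid<TAB>rank`), not the exact formats the repo's scripts use.

```python
from collections import defaultdict

def mrr_at_10(qrels_path, rank_path):
    """Sketch of MRR@10: mean over all judged queries of 1/rank of the
    first relevant passage, counting only ranks <= 10."""
    # Assumed qrels layout: one "qid<TAB>pid" relevant pair per line.
    relevant = defaultdict(set)
    with open(qrels_path) as f:
        for line in f:
            qid, pid = line.split()[:2]
            relevant[qid].add(pid)
    # Assumed rank-file layout: "qid<TAB>pid<TAB>rank" per line.
    best_rank = {}
    with open(rank_path) as f:
        for line in f:
            qid, pid, rank = line.split()[:3]
            rank = int(rank)
            if rank <= 10 and pid in relevant.get(qid, ()):
                best_rank[qid] = min(best_rank.get(qid, rank), rank)
    # Queries with no relevant passage in the top 10 contribute 0.
    return sum(1.0 / r for r in best_rank.values()) / len(relevant)
```

An MRR@10 near 0.01, as reported above, means relevant passages almost never appear in the top 10, which usually points to an id/offset mismatch between the qrels and the rank file rather than a weak model.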

jingtaozhan commented 3 years ago

I'm not sure why this happens. First, try transformers version 3.4.0 during inference. Second, check whether the generated dev-qrel.tsv is correct by using it to evaluate the provided STAR rank file. After downloading that file, you need to convert its qids and pids to the preprocessed qoffsets and poffsets. It is a little tricky, but you can refer to cvt_back.py, which converts in the opposite direction (offsets to ids). Then run `python ./msmarco_eval.py ./data/passage/preprocess/dev-qrel.tsv convt_download_dev.rank.tsv` and check whether MRR@10 is 0.340. Happy to help you :)
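The id-to-offset conversion described above could be sketched roughly as follows. Note the mapping file names (`qid2offset`/`pid2offset` pickles mapping original ids to preprocessed offsets) and the rank-file layout are assumptions made for illustration; cvt_back.py in the repo shows the actual files and formats, just applied in the reverse direction.

```python
import pickle

def convert_ids_to_offsets(rank_in, rank_out, qid2offset_path, pid2offset_path):
    """Rewrite a downloaded rank file so its qids/pids match the
    preprocessed offsets that dev-qrel.tsv uses."""
    # Assumption: the preprocessing step saved dicts {original_id: offset}.
    with open(qid2offset_path, "rb") as f:
        qid2offset = pickle.load(f)
    with open(pid2offset_path, "rb") as f:
        pid2offset = pickle.load(f)
    # Assumed rank-file layout: "qid<TAB>pid<TAB>rank" per line.
    with open(rank_in) as fin, open(rank_out, "w") as fout:
        for line in fin:
            qid, pid, rank = line.split()[:3]
            fout.write(f"{qid2offset[int(qid)]}\t{pid2offset[int(pid)]}\t{rank}\n")
```

If the converted file then scores ~0.340 MRR@10 against dev-qrel.tsv, the qrels are fine and the problem lies elsewhere (e.g. the transformers version used at inference).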

jingtaozhan commented 3 years ago

No activity. Closing.