ielab / APR

The repo for reproducing the ANCE-PRF training.

KeyError: '1' when I ran "make_train_from_ranking.py" #2

Open XY2323819551 opened 2 years ago

XY2323819551 commented 2 years ago

Hello, thanks for your amazing work. I would really like to reproduce it, but I ran into an issue when running the code. Could you help me?

command line: python make_train_from_ranking.py --ranking-file /home/zhangxy/QA/ANCE-PRF/pyserini/runs/run.msmarco-passage.ance.bf.tsv --model-type ANCE --query-file /home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/queries.train.tsv --collection-file ./data/msmarco_passage/collection/collection.tsv --pair-file /home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/qrels.train.tsv --output data/hard/negative.result --encoder /home/zhangxy/QA/pyserini_for_ance-prf/pyserini/encoders/ance-msmarco-passage

processing:
Load Query: 100%|██████████| 808731/808731 [00:00<00:00, 1140903.16it/s]
Load Collection: 100%|██████████| 8841823/8841823 [00:16<00:00, 521248.96it/s]
Load Q-D Pair: 100%|██████████| 532761/532761 [00:00<00:00, 989247.88it/s]
Load Ranking: 0%| | 0/808731000 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "make_train_from_ranking.py", line 94, in <module>
    rankings, topk = read_ranking(args.ranking_file, pair, args.prf_k, args.from_top)
  File "make_train_from_ranking.py", line 35, in read_ranking
    targets = pair[qid].keys()
KeyError: '1'
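
The last frame of the traceback is a plain dictionary lookup, pair[qid], so the error means the ranking file contains a query id ('1') that has no entry in the data loaded from --pair-file. A minimal sketch of that failure and of a guarded lookup follows. It assumes pair maps each query id to a dict keyed by passage id, which is what calling .keys() on the looked-up value suggests; the sample ids and the helper targets_for are hypothetical and are not part of the repo.

```python
# Hypothetical stand-in for the script's `pair` mapping: qid -> {pid: label}.
# The ids below are made up for illustration.
pair = {"1185869": {"0": 1}}


def targets_for(pair, qid):
    """Return the judged passage ids for qid, or None if qid has no judgments."""
    judged = pair.get(qid)  # .get avoids the KeyError raised by pair[qid]
    return None if judged is None else set(judged.keys())


# Direct indexing reproduces the error from the traceback.
try:
    pair["1"]
except KeyError as err:
    print("query id missing from pair data:", err)

# The guarded lookup lets the caller decide what to do with unjudged queries.
print(targets_for(pair, "1"))
print(targets_for(pair, "1185869"))
```

Whether unjudged queries should be skipped or treated as an error is a choice for the script's author; the sketch only shows where the exception comes from.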

hanglics commented 2 years ago

Hi, sorry about this. You need this file for training: train_query_passage_pair.tsv, passed as the --pair-file arg.

XY2323819551 commented 2 years ago

/home/zhangxy/QA/ANCE-PRF-main/data/marco_raw_data/qrels.train.tsv

I tried the new pair file, but it still failed. I noticed that "queries.train.tsv", which I used for the "--query-file" arg, has 808731 examples, whereas "train_query_passage_pair.tsv" for the "--pair-file" arg has only 532751. I suspect the error is caused by this mismatch between the two files. Could you also share the file you use for the "--query-file" arg? Thank you very much!
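
One way to check the suspected mismatch is to compare the query ids in the two files directly. The sketch below assumes both files are tab-separated with the query id in the first column; that layout is an assumption about these files, not something confirmed in this thread, and the function names are hypothetical.

```python
def first_column_ids(path):
    """Collect the first tab-separated field of every non-empty line."""
    ids = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                ids.add(line.split("\t", 1)[0])
    return ids


def missing_query_ids(query_file, pair_file):
    """Return query ids that appear in query_file but not in pair_file."""
    return first_column_ids(query_file) - first_column_ids(pair_file)
```

Calling missing_query_ids with the paths passed to --query-file and --pair-file returns the ids that would trigger the KeyError. An empty set would mean the two files agree and the cause lies elsewhere.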

XY2323819551 commented 2 years ago

Hi, sorry about this. You need this file for training: train_query_passage_pair.tsv, passed as the --pair-file arg.

I ran into this problem earlier in this issue. I mistakenly thought I had found the correct file, but it seems I hadn't.

hanglics commented 2 years ago

For the --query-file arg, please use this file: train_query_judged.tsv