Order inconsistency of output candidate file with original test.json when testing bertSum Extractive

nlpyang / BertSum

Code for paper Fine-tune BERT for Extractive Summarization

Apache License 2.0


Order inconsistency of output candidate file with original test.json when testing bertSum Extractive #129

Open cece00 opened 2 years ago

cece00 commented 2 years ago

In "test" mode, two files are written: xxx.candidate and xxx.gold. The texts in these two files are in the same order as each other, but that order is not consistent with the original test.json. I have checked that shuffle=False is set in the data loader. So what is wrong? Has anyone encountered the same problem? Can anyone help?

ashokurlana commented 2 years ago

@cece00 Modify line 89 of src/model/data_loader.py. The following code, which sorts the .pt shard files by their numeric index instead of as plain strings, fixed a similar issue for me:

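```python
import glob
import re

def atoi(text):
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # "cnndm.test.10.bert.pt" -> ['cnndm.test.', 10, '.bert.pt'], so shard numbers compare as ints
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# args and corpus_type come from the enclosing function in data_loader.py
pts = sorted(glob.glob(args.bert_data_path + 'cnndm.' + corpus_type + '.[0-9]*.bert.pt'), key=natural_keys)
```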
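A self-contained sketch of why this matters, assuming the shards are named like `cnndm.test.<N>.bert.pt` as in the glob pattern above (the twelve file names below are hypothetical). A plain string sort places shard 10 before shard 2, so the shards, and hence the candidate/gold lines, are read in a different order from the one in which test.json was split, even with shuffle=False. This is one plausible cause, not a confirmed diagnosis.

```python
import re

def atoi(text):
    # Turn digit runs into ints so they compare numerically.
    return int(text) if text.isdigit() else text

def natural_keys(text):
    # Split on digit runs, keeping the digits as separate list items.
    return [atoi(c) for c in re.split(r'(\d+)', text)]

# Hypothetical shard names following the 'cnndm.test.[0-9]*.bert.pt' pattern.
shards = [f"cnndm.test.{i}.bert.pt" for i in range(12)]

lexicographic = sorted(shards)             # plain string order
natural = sorted(shards, key=natural_keys)  # numeric shard order

print(lexicographic[:4])
# -> ['cnndm.test.0.bert.pt', 'cnndm.test.1.bert.pt', 'cnndm.test.10.bert.pt', 'cnndm.test.11.bert.pt']
print(natural[:4])
# -> ['cnndm.test.0.bert.pt', 'cnndm.test.1.bert.pt', 'cnndm.test.2.bert.pt', 'cnndm.test.3.bert.pt']
```

With the natural sort, the shards are loaded in the same numeric order in which they were written, so the .candidate and .gold lines should follow test.json, provided the shards were created from test.json in order (an assumption this snippet does not check).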