nlpyang / PreSumm

code for EMNLP 2019 paper Text Summarization with Pretrained Encoders
MIT License

processed data `bert_data_cnndm_final.zip` not consistent with the code #213

Open hedonihilist opened 3 years ago

hedonihilist commented 3 years ago

After extracting bert_data_cnndm_final.zip, I got files with names like cnndm.train.100.bert.pt, which are not recognized by the following code:

https://github.com/nlpyang/PreSumm/blob/70b810e0f06d179022958dd35c1a3385fe87f28c/src/models/data_loader.py#L84

Changing the code like this can fix the issue:

pts = sorted(glob.glob(os.path.join(args.bert_data_path, 'cnndm.' + corpus_type + '.[0-9]*.pt'))

ghost commented 3 years ago

That line is missing a closing `)`. The fix should instead be:

pts = sorted(glob.glob(os.path.join(args.bert_data_path, 'cnndm.' + corpus_type + '.[0-9]*.pt')))
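
For anyone who wants to check the pattern before patching data_loader.py, here is a minimal sketch that runs the same glob against a temporary directory. The helper names `find_shards` and `demo` are hypothetical, and every file name except cnndm.train.100.bert.pt is an assumption made up for the test; only the pattern itself comes from this thread.

```python
import glob
import os
import tempfile


def find_shards(bert_data_path, corpus_type):
    """Return sorted shard paths matching cnndm.<corpus_type>.<digit>*.pt.

    Hypothetical helper: it wraps the glob pattern proposed in this thread.
    """
    pattern = os.path.join(bert_data_path, 'cnndm.' + corpus_type + '.[0-9]*.pt')
    return sorted(glob.glob(pattern))


def demo():
    """Create empty placeholder files and list the ones the pattern picks up."""
    names = [
        'cnndm.train.0.bert.pt',    # assumed shard name
        'cnndm.train.100.bert.pt',  # name reported in the issue
        'cnndm.valid.0.bert.pt',    # assumed shard of another split
        'cnndm.train.notes.txt',    # assumed unrelated file
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            open(os.path.join(tmp, name), 'w').close()
        return [os.path.basename(p) for p in find_shards(tmp, 'train')]


if __name__ == '__main__':
    print(demo())
```

Note that `sorted` orders the paths lexicographically, so a shard numbered 100 would sort before one numbered 2. That only matters if the loading order of the shards is important to you.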