nlpyang / BertSum

Code for paper Fine-tune BERT for Extractive Summarization
Apache License 2.0

Error in Step 4 of processing #98

Open MughilM opened 4 years ago

MughilM commented 4 years ago

Issue

Hello! Due to external constraints, I am unable to download the preprocessed data from Google Drive. However, I do have the raw .story files on hand, so I went through the steps to preprocess the data myself. To start, I used 250 story files to make sure the steps work. Step 3 worked like a charm, but when running Step 4, although it generates the JSON line files fine, it results in an error at the very end:

Traceback (most recent call last):
  File "preprocess.py", line 63, in <module>
    eval('data_builder.'+args.mode + '(args)')
  File "<string>", line 1, in <module>
  File "..../BertSum/src/prepro/data_builder.py", line 315, in format_to_lines
    with open(pt_file, 'w') as save:
FileNotFoundError: [Errno 2] No such file or directory: '../json_data/cnndm.train.0.json'

Steps to Reproduce

Follow Steps 1-3 of the README for preprocessing the data yourself. The command given in Step 4 then produces the error at the end:

python preprocess.py -mode format_to_lines -raw_path RAW_PATH -save_path JSON_PATH -map_path MAP_PATH -lower

Here, RAW_PATH is the directory containing the tokenized files (../merged_stories_tokenized), JSON_PATH is the target directory to save the generated JSON files (../json_data/cnndm), and MAP_PATH is the directory containing the URL mapping files (../urls).
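For reference, the exact command I ran, with those paths substituted in (the directory names just reflect my local layout), was:

python preprocess.py -mode format_to_lines -raw_path ../merged_stories_tokenized -save_path ../json_data/cnndm -map_path ../urls -lower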

Thank you!

tschomacker commented 4 years ago

The error indicates that the target directory could not be found: open() with mode 'w' creates the file, but it does not create missing parent directories. Please check that the folder ../json_data exists before running Step 4.
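If you would rather have the script handle this itself, here is a minimal sketch (assuming the surrounding logic in format_to_lines stays unchanged) that creates the parent directory right before the open call around line 315 of data_builder.py:

import os

# pt_file is e.g. '../json_data/cnndm.train.0.json'; open(..., 'w') creates
# the file but not missing parent directories, so create them first.
os.makedirs(os.path.dirname(pt_file), exist_ok=True)
with open(pt_file, 'w') as save:
    # ... original write logic unchanged ...

Alternatively, simply create the ../json_data folder by hand once before running Step 4.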