nlpyang / BertSum

Code for paper Fine-tune BERT for Extractive Summarization
Apache License 2.0

Help needed when processing my own dataset for testing. #122

Open namln2k opened 2 years ago

namln2k commented 2 years ago

Hi guys,

I finished training, validation, and testing with the preprocessed dataset a few days ago. However, my mentor asked me to test this model on the DUC-2004 dataset. I'm not sure how to process the dataset just for testing, so I followed the guide in README.md and got stuck at this step:

Step 4. Format to Simpler Json Files

python preprocess.py -mode format_to_lines -raw_path RAW_PATH -save_path JSON_PATH -map_path MAP_PATH -lower

Below is part of the output of Step 3 and the command I used in Step 4:

[image]

As you can see, Step 4 printed no output. I suspect that's because of the /urls folder: I don't know how to process the files in it to match my dataset. Moreover, my merged_story_tokenized files look like this:

[image]

Can someone please help me, or show me how to process the data just for testing?
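One way to check the suspicion about the /urls folder is sketched below. It rests on an assumption that should be verified against the repository's preprocessing code: that Step 4 assigns each tokenized file to a split only when the file's base name equals the SHA-1 hex digest of a line in the matching mapping file under MAP_PATH. If that holds, files from another dataset would match nothing and no output would be written. The names `hashhex` and `assign_splits`, the `fallback` parameter, and the example file name are hypothetical and exist only for illustration.

```python
import hashlib


def hashhex(text):
    """Return the SHA-1 hex digest of a string."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def assign_splits(file_names, mapping, fallback=None):
    """Assign files to splits by comparing base names with hashed mapping lines.

    file_names: paths or names such as "<stem>.story.json"
    mapping:    dict of split name -> list of identifier lines (e.g. URLs)
    fallback:   split that receives unmatched files, or None to leave them out
                (hypothetical option, not something the repository is known to offer)
    """
    # Hash every mapping line once per split (ASSUMED matching rule).
    hashed = {
        split: {hashhex(line.strip()) for line in lines}
        for split, lines in mapping.items()
    }
    result = {split: [] for split in mapping}
    unmatched = []
    for name in file_names:
        # Base name without directories or extensions.
        stem = name.split("/")[-1].split(".")[0]
        for split, keys in hashed.items():
            if stem in keys:
                result[split].append(name)
                break
        else:
            unmatched.append(name)
    if fallback is not None:
        # Send everything that matched no mapping line to the chosen split.
        result.setdefault(fallback, []).extend(unmatched)
        unmatched = []
    return result, unmatched


if __name__ == "__main__":
    url = "http://example.com/a"  # made-up identifier
    files = [hashhex(url) + ".story.json", "d30001t.story.json"]  # second name is made up
    mapping = {"train": [], "valid": [], "test": [url]}
    print(assign_splits(files, mapping))
    print(assign_splits(files, mapping, fallback="test"))
```

If the unmatched list contains every file from the new dataset, the mapping files are the likely cause. For a test-only run, routing all unmatched files to the test split (the `fallback` idea above) is one option, but it would mean adapting the preprocessing script rather than using an existing flag.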