shashiongithub / sidenet

SideNet: Neural Extractive Summarization with Side Information
BSD 3-Clause "New" or "Revised" License

Training data query #3

Open riddhirdasani opened 6 years ago

riddhirdasani commented 6 years ago

Hello Shashi, I don't understand why these three directories are needed or what role each plays in training: (1) preprocessed_data_directory, (2) gold_summary_directory, (3) doc_sentence_directory. Could you give a little more insight into them? When I tried training, the first epoch finished and then an error appeared.

shashiongithub commented 6 years ago

For training you only need (1). (2) is used to estimate ROUGE scores. (1) and (3) are used during decoding. What error do you get after the first epoch?
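As a rough illustration of the reply above, the directories each stage needs could be checked before launching a run. This is only a sketch and not part of SideNet: the stage names, `REQUIRED_DIRS`, and `missing_dirs` are hypothetical, and only the three flag names come from this thread.

```python
import os

# Hypothetical mapping of stage -> directory flags it needs,
# following the reply above (not taken from the SideNet code).
REQUIRED_DIRS = {
    "train": ["preprocessed_data_directory"],
    "rouge": ["gold_summary_directory"],
    "decode": ["preprocessed_data_directory", "doc_sentence_directory"],
}


def missing_dirs(stage, paths):
    """Return the flag names for `stage` whose path is not an existing directory.

    `paths` maps flag name -> filesystem path; an absent flag counts as missing.
    """
    return [
        name
        for name in REQUIRED_DIRS[stage]
        if not os.path.isdir(paths.get(name, ""))
    ]
```

Calling `missing_dirs("decode", paths)` with a dictionary of configured paths returns the flags that still need to be set up, so a problem surfaces before the first epoch rather than after it.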

riddhirdasani commented 6 years ago

File "/home/beast/riddhi/main/data_utils.py", line 116, in process_predictions_rankedtopthree
    docsents = open(sent_filename).readlines()
FileNotFoundError: [Errno 2] No such file or directory: '/home/beast/riddhi/main/JP_herman/cnn/validation-sent/8f6b39e6c63b0ae3546cdfeb8209693f292b060e.summary.final.org_sents'

sent_filename, which is built on line 115 from FLAGS.doc_sentence_directory, cannot be opened on line 116. I simply guessed and created that directory myself. What exactly should it contain?
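To see how many sentence files are missing before a run reaches this point, a small check along the following lines may help. It is a sketch under assumptions: the helper name `find_missing_sentence_files` is hypothetical, the suffix is copied from the error message above, and the files are assumed to sit directly inside the directory passed in, which may not match the real layout.

```python
import os


def find_missing_sentence_files(sentence_dir, doc_ids,
                                suffix=".summary.final.org_sents"):
    """Return the expected sentence-file paths that do not exist.

    Assumption: each document id maps to `<sentence_dir>/<doc_id><suffix>`;
    the suffix is the one seen in the FileNotFoundError above.
    """
    missing = []
    for doc_id in doc_ids:
        path = os.path.join(sentence_dir, doc_id + suffix)
        if not os.path.isfile(path):
            missing.append(path)
    return missing
```

An empty result means every expected file is present. A non-empty result lists exactly which paths have to be supplied.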

shashiongithub commented 6 years ago

It should point to (3), doc_sentence_directory.

riddhirdasani commented 6 years ago

Yes, but what should be in it, and where can I get it?

shashiongithub commented 6 years ago

Please check: https://github.com/shashiongithub/sidenet/issues/2