
vishnu45 / NLP-Extractive-NEWS-summarization-using-MMR

A simple Python implementation of the Maximal Marginal Relevance (MMR) baseline system for text summarization.
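Some context on the technique this repository implements: MMR builds an extractive summary greedily. At each step it picks the candidate sentence s that maximizes λ·sim(s, Q) − (1 − λ)·max over already-selected t of sim(s, t), which trades relevance to a query Q against redundancy with sentences already chosen. The block below is a minimal sketch, not the repository's code. It assumes bag-of-words cosine similarity, uses the summed word counts of all sentences as the query, and sets λ = 0.5. The names `mmr_summarize`, `bag_of_words` and `cosine` are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter


def bag_of_words(sentence: str) -> Counter:
    # Lowercased word counts; no stemming or stop-word removal (a simplification).
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def mmr_summarize(sentences: list[str], k: int, lam: float = 0.5) -> list[str]:
    """Greedily pick up to k sentences by MMR; return them in original order."""
    vectors = [bag_of_words(s) for s in sentences]
    # Assumption: the query is the summed word counts of all sentences
    # (cosine ignores scale, so this behaves like the centroid).
    query = sum(vectors, Counter())
    selected: list[int] = []
    candidates = list(range(len(sentences)))

    def score(i: int) -> float:
        relevance = cosine(vectors[i], query)
        # Redundancy: similarity to the closest sentence already selected.
        redundancy = max((cosine(vectors[i], vectors[j]) for j in selected), default=0.0)
        return lam * relevance - (1 - lam) * redundancy

    while candidates and len(selected) < k:
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return [sentences[i] for i in sorted(selected)]
```

For example, given "The storm hit the coast on Monday.", "The storm hit the coast on Monday morning." and "Officials ordered evacuations along the coast.", asking for k=2 keeps the first and third sentences. The near-duplicate second sentence is skipped because of its redundancy penalty. Raising `lam` favours relevance; lowering it penalises redundancy more heavily.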


Could not find Document/cluster #1

Closed: JafferWilson closed this issue 7 years ago.

JafferWilson commented 7 years ago

Hi, I am trying to run the summarizer. I could not find the Document/cluster in your repository, and when I run the code I get this error:

Running MMR Summarizer for files in folder:  DUC2003
Traceback (most recent call last):
  File "mmr_summarizer.py", line 303, in <module>
    sentences = sentences + processFile(curr_folder + "/" + file)
  File "mmr_summarizer.py", line 25, in processFile
    text_1 = re.sub("<TEXT>\n","",text_1.group(0))
AttributeError: 'NoneType' object has no attribute 'group'

Let me know what I need to do to make the code run properly. Thank you.

vishnu45 commented 7 years ago

I didn't include the DUC2004 documents in my repo. They can be downloaded from the nlpir.nist.gov website once NIST grants you permission to reuse the document clusters they maintain. I mentioned this in the README, which also links to the DUC2004 dataset itself.
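The traceback in the first comment shows what happens when `processFile` reads a file with no `<TEXT>` block, for instance when the folder does not hold the NIST documents. The regex lookup that produces `text_1` returns `None`, because `re.search` and `re.match` both do this when nothing matches, and `.group(0)` then fails. Below is a minimal sketch of a guarded version. It assumes each document wraps its body in `<TEXT> ... </TEXT>`, which the `re.sub("<TEXT>\n", ...)` call suggests. `TEXT_BLOCK`, `extract_text` and `find_unparseable` are hypothetical names, not code from mmr_summarizer.py.

```python
from __future__ import annotations

import re
from pathlib import Path

# Assumption: documents wrap their body in <TEXT> ... </TEXT>, as implied by the
# re.sub("<TEXT>\n", ...) call in the traceback. Not the repository's actual regex.
TEXT_BLOCK = re.compile(r"<TEXT>\n(.*?)</TEXT>", re.DOTALL)


def extract_text(raw: str, source: str = "<string>") -> str:
    """Return the body of the <TEXT> block, or raise a readable error."""
    match = TEXT_BLOCK.search(raw)
    if match is None:
        # re.search returns None when nothing matches; calling .group(0) on that
        # None is what raises "'NoneType' object has no attribute 'group'".
        raise ValueError(f"{source}: no <TEXT> ... </TEXT> block found")
    return match.group(1).strip()


def find_unparseable(folder: str) -> list[str]:
    """List files in `folder` that have no <TEXT> block (hypothetical pre-flight check)."""
    bad = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and TEXT_BLOCK.search(path.read_text(errors="replace")) is None:
            bad.append(path.name)
    return bad
```

Running something like `find_unparseable("DUC2003")` before summarizing would list the files that lack a `<TEXT>` block instead of failing partway through the loop.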

JafferWilson commented 7 years ago

Yes @vishnu45, I have seen the links and I have applied, but there has been no reply yet. So I thought I could get the documents from you in order to try the repository. Do let me know if you have them.

vishnu45 commented 7 years ago

Since the documents required permission from NIST, I didn't upload them at the time, and I no longer have them. However, you can find them in this repository: https://github.com/syedhope/Text_Summarization-MMR_and_LexRank

JafferWilson commented 7 years ago

Thank you @vishnu45 for your help. :)