piskvorky / gensim

Topic Modelling for Humans
https://radimrehurek.com/gensim
GNU Lesser General Public License v2.1

Error when most_important_docs in summarizer.py is None #1597

Closed shengyang998 closed 7 years ago

shengyang998 commented 7 years ago
In [0]: gensim.__version__
Out [0]: '2.3.0'

Description: I was working on a set of Chinese sentences. When I called gensim.summarization.summarize(), the following error occurred:

  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/gensim/summarization/summarizer.py", line 215, in summarize
    extracted_sentences = _extract_important_sentences(sentences, corpus, most_important_docs, word_count)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/gensim/summarization/summarizer.py", line 114, in _extract_important_sentences
    important_sentences = _get_important_sentences(sentences, corpus, important_docs)
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/gensim/summarization/summarizer.py", line 89, in _get_important_sentences
    return [sentences_by_corpus[tuple(important_doc)] for important_doc in important_docs]
TypeError: 'NoneType' object is not iterable

It seems that important_docs is None, and None cannot be iterated. I have not studied the TextRank algorithm in depth, and I need to keep my work moving. Can someone tell me what is happening?

PS: Sorry, I cannot provide the test case I was using, because it is full of Chinese names. (If someone asks me privately, I may be able to share it.) In any case, there is a bug: for some reason most_important_docs at line 212 of summarizer.py is None, and this case should be handled properly. I suppose summarize() should return None, or raise a more descriptive error for debugging, when most_important_docs is None. Better still would be to improve the TextRank implementation itself, which is beyond my ability for now.
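
A guard of the kind suggested here might look roughly like the following. This is a minimal, self-contained sketch and not gensim's actual code: `get_important_sentences` is a hypothetical stand-in modelled on the list comprehension in the traceback, the toy data is invented for illustration, and returning an empty list when `important_docs` is None is an assumed design choice (raising a descriptive error would be an equally reasonable one).

```python
def get_important_sentences(sentences_by_corpus, important_docs):
    """Hypothetical stand-in for the helper shown in the traceback.

    sentences_by_corpus: dict mapping a tuple-ised bag-of-words document
        to the sentence it came from.
    important_docs: ranked documents, or None if ranking produced nothing.
    """
    # Assumed guard: treat a missing ranking as "no important sentences"
    # instead of letting the comprehension fail on None.
    if important_docs is None:
        return []
    return [sentences_by_corpus[tuple(doc)] for doc in important_docs]


# Toy data (invented): documents are lists of (token_id, count) pairs.
sentences_by_corpus = {
    ((0, 1), (1, 1)): "first sentence",
    ((2, 1),): "second sentence",
}
ranked = [[(2, 1)], [(0, 1), (1, 1)]]

print(get_important_sentences(sentences_by_corpus, ranked))
print(get_important_sentences(sentences_by_corpus, None))
```

With the guard in place, the None case no longer raises TypeError; what the caller does with an empty result is left open in this sketch.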

>>> s = '`a string full of different Chinese name, with the number of more than 1 thousand.`'
>>> import gensim.summarization as gsum
>>> gsum.summarize(s)

zsef123 commented 7 years ago

check https://github.com/RaRe-Technologies/gensim/issues/1531

shengyang998 commented 7 years ago

OK, thank you! @zsef123