pedro-walter closed this issue 7 years ago
I am also affected by this bug...
@madewild could you post a code snippet to reproduce?
Unfortunately the text fed to the summarizer is confidential, but my guess is that the error was triggered by an unusually high repetition of some sentences... I also notice now that the error raised was not exactly the same: ValueError: k must be less than rank(A)-1, k=1
@tmylk The failure appears to come from the number of nodes in the graph used to calculate the pagerank of the corpus. After removing unreachable nodes, the graph is left with only 2 nodes, so it builds a 2 x 2 matrix (for which scipy.sparse.linalg.eigs() will fail for k=1). We should probably raise an error if the number of nodes (after removing unreachable nodes) goes below 3.
@MridulS could you submit a pr for this?
@tmylk What kind of error should I raise?
The error should say "Please add more sentences to the text. The number of reachable nodes is below 3"
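A minimal sketch of the guard discussed above, assuming the reachable nodes are available as a list. The function name check_reachable_nodes and the min_nodes parameter are hypothetical and not part of gensim; only the error message comes from this thread.

```python
def check_reachable_nodes(nodes, min_nodes=3):
    """Hypothetical guard: refuse to rank a graph that is too small.

    `nodes` is assumed to be the list of nodes left after removing
    unreachable ones. This is an illustration, not gensim's actual code.
    """
    if len(nodes) < min_nodes:
        raise ValueError(
            "Please add more sentences to the text. "
            "The number of reachable nodes is below %d" % min_nodes
        )
    return nodes


if __name__ == "__main__":
    try:
        check_reachable_nodes(["effect", "effici"])
    except ValueError as err:
        print(err)
```

Raising before the eigenvalue solver is called would replace the low-level ARPACK message with one the user can act on.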
Hi, I worked on this issue and have sent a pull request for it. Please review.
Hello, I think the problem is still open. I replicated this error with document 1403 from the Hulth2003 dataset:
from gensim.summarization import keywords
print(keywords('IT: Utilities A look at five utilities to make your PCs more, efficient, effective, and efficacious'))
Traceback (most recent call last):
File "<stdin>", line 1, in
Looking at the document, it makes sense to say that the likely problem is the term frequencies: every term has a frequency of 1.
Hmm, that's not good, looks like a bug.
Can you suggest a fix @vitordouzi ?
@piskvorky, no, I don't! sorry! Maybe this TODO in the pagerank_weighted.py file can help.
File "/gensim/summarization/pagerank_weighted.py", line 24, in pagerank_weighted
    vals, vecs = eigs(pagerank_matrix.T, k=1) # TODO raise an error if matrix has complex eigenvectors?
What exactly are the complex eigenvectors?
Hello everyone. I started investigating this issue, and it is basically the same one that @MridulS described, but in a different function:
The failure appears to come from the number of nodes in the graph used to calculate the pagerank of the corpus. After removing unreachable nodes, the graph is left with only 2 nodes, so it builds a 2 x 2 matrix (for which scipy.sparse.linalg.eigs() will fail for k=1). We should probably raise an error if the number of nodes (after removing unreachable nodes) goes below 3.
On the text given by @vitordouzi, we end up with this graph:
('effect', 'effici'), ('effici', 'effect'), ('effici', 'effici'), ('effect', 'effect')
which results in a 2x2 matrix, and pagerank fails.
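To make the size problem concrete, here is a small sketch that turns the edge list above into an adjacency matrix. Every edge weight is assumed to be 1 purely for illustration (the real graph carries computed weights), and the variable names are hypothetical.

```python
# Edge list reported above for the failing text.
edges = [
    ("effect", "effici"),
    ("effici", "effect"),
    ("effici", "effici"),
    ("effect", "effect"),
]

# Collect the distinct nodes and give each one a row/column index.
nodes = sorted({node for edge in edges for node in edge})
index = {node: i for i, node in enumerate(nodes)}

# Assumption: unit weights, just to show the shape of the matrix.
matrix = [[0] * len(nodes) for _ in nodes]
for source, target in edges:
    matrix[index[source]][index[target]] = 1

k = 1  # number of eigenvectors requested in pagerank_weighted
n = len(nodes)

# The error message says k must be less than ndim(A)-1.
too_small = k >= n - 1

if __name__ == "__main__":
    print(nodes, matrix, too_small)
```

With only two distinct nodes, n - 1 equals 1, so a request for k=1 does not satisfy the "k must be less than ndim(A)-1" condition quoted in the traceback.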
But I'm not sure how to fix this. @vitordouzi, @menshikh-iv any ideas on the desired outcome? An exception doesn't feel right this time. Maybe set some predefined scores instead of running pagerank? Or maybe add a special case to pagerank?
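One possible shape for such a special case, sketched under assumptions: for very small graphs, compute the scores by plain power iteration instead of calling eigs(). The function name, the dense list-of-lists input, and the damping value of 0.85 (the conventional PageRank choice) are assumptions for illustration, not gensim's implementation.

```python
def pagerank_power_iteration(weights, damping=0.85, iterations=100):
    """Hypothetical fallback: PageRank by power iteration on a small graph.

    weights[i][j] is the weight of the edge from node i to node j.
    Returns a list of scores that sums to 1.
    """
    n = len(weights)

    # Normalise each row into transition probabilities; a node with no
    # outgoing weight is treated as linking to every node equally.
    transition = []
    for row in weights:
        total = sum(row)
        if total == 0:
            transition.append([1.0 / n] * n)
        else:
            transition.append([w / total for w in row])

    # Start from a uniform distribution and apply the damped update
    # a fixed number of times.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
    return scores


if __name__ == "__main__":
    # Two fully connected nodes with unit weights (an assumed example).
    print(pagerank_power_iteration([[1, 1], [1, 1]]))
```

For a symmetric two-node graph like this one, both nodes end up with the same score, which is close to the "predefined scores" idea while still working for any small matrix.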
Anyway, some notes about pagerank_weighted:
pagerank_matrix is positive (Perron–Frobenius theorem).
eigs() is used instead of eig() on a dense matrix. It was discussed in #441, #438. I'd like to add a comment about this.
About (1), (2) @xelez - need to handle the special case; the comment from (3) should be useful too.
Hi,
I've received the following error when trying to summarize the body of this news article:
https://www.theguardian.com/media/2016/jun/19/sun-times-brexit-in-out-shake-it-all-about
The error follows:
File "/home/apps/comment_parser/venv/local/lib/python2.7/site-packages/gensim/summarization/summarizer.py", line 202, in summarize
    most_important_docs = summarize_corpus(corpus, ratio=ratio if word_count is None else 1)
File "/home/apps/comment_parser/venv/local/lib/python2.7/site-packages/gensim/summarization/summarizer.py", line 161, in summarize_corpus
    pagerank_scores = _pagerank(graph)
File "/home/apps/comment_parser/venv/local/lib/python2.7/site-packages/gensim/summarization/pagerank_weighted.py", line 24, in pagerank_weighted
    vals, vecs = eigs(pagerank_matrix.T, k=1) # TODO raise an error if matrix has complex eigenvectors?
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1271, in eigs
    ncv, v0, maxiter, which, tol)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 685, in __init__
    raise ValueError("k must be less than ndim(A)-1, k=%d" % k)
ValueError: k must be less than ndim(A)-1, k=1
Regards,