Open simonm3 opened 4 years ago
Hi, I just tried it out and hit the error mentioned above.
>>> termdoc = docterm.T.tocsc()
>>> type(termdoc)
<class 'scipy.sparse.csc.csc_matrix'>
I'm working on fixing the documentation. Should I replace `scipy.sparse.csc` with `gensim.matutils.Sparse2Corpus` in the `gensim.models.ldamodel` docstrings?
The result of `Sparse2Corpus` is just a standard gensim corpus; it's not a special type.
IMO we should get rid of the documentation that claims LDA accepts CSC matrices as input. Gensim accepts only standard streamed corpora, i.e. an iterable of sparse vectors, where each sparse vector is a list of `(feature_id, feature_weight)` 2-tuples.
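To make the expected input format concrete, here is a minimal pure-Python sketch of such a streamed corpus. The helper name `stream_corpus` and the toy counts are made up for illustration; they are not part of gensim.

```python
def stream_corpus(dense_rows):
    """Yield one sparse vector per document, dropping zero weights.

    Hypothetical helper: each yielded item is a list of
    (feature_id, feature_weight) 2-tuples.
    """
    for row in dense_rows:
        yield [(feature_id, weight)
               for feature_id, weight in enumerate(row)
               if weight != 0]


# Two toy documents over a three-term vocabulary (illustrative data).
dense_rows = [[1, 0, 2], [0, 3, 0]]
corpus = list(stream_corpus(dense_rows))
```

Each document becomes a short list of 2-tuples, and zero entries are simply absent.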
We can mention that if you have a CSC in-memory matrix, you may convert it to a streamed corpus with the help of `gensim.matutils.Sparse2Corpus`.
@FyzHsn would you be able to fix this? Thanks.
@piskvorky I'll fix it. Thanks for the feedback.
The LdaMulticore docs have the same issue.
The docs say a scipy sparse CSC matrix can be used, but it can't. It works with Sparse2Corpus. Here is an example:
I think there are two different error messages, depending on the length of the texts. Here are both:
TypeError Traceback (most recent call last)