nateraw closed this issue 5 years ago.
We should move the instantiation of `doc_lengths` in `nlppipe.py` into the `get_skipgrams` function, appending a document's length only if we actually get skipgrams from that document. That way, the sizes can never be mismatched.
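A minimal sketch of the proposed fix, assuming documents arrive as lists of token ids and skipgrams are `(doc_id, pivot, context)` triples within a fixed window. The signature, the `window` parameter, and the triple layout are hypothetical and do not reflect the real `nlppipe.py` code. The point is that a length is recorded only for documents that produce at least one skipgram, so `len(doc_lengths)` always equals the number of documents kept.

```python
from typing import List, Tuple

def get_skipgrams(
    docs: List[List[int]], window: int = 2
) -> Tuple[List[Tuple[int, int, int]], List[int]]:
    """Hypothetical sketch: build skipgrams and doc_lengths in the same pass."""
    skipgrams: List[Tuple[int, int, int]] = []
    doc_lengths: List[int] = []
    doc_id = 0  # index among *kept* documents only
    for tokens in docs:
        pairs = []
        for i, pivot in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((doc_id, pivot, tokens[j]))
        if not pairs:
            # Too short to form any skipgram: purge it and record no length.
            continue
        skipgrams.extend(pairs)
        doc_lengths.append(len(tokens))  # appended only for kept documents
        doc_id += 1
    return skipgrams, doc_lengths

skipgrams, doc_lengths = get_skipgrams([[1, 2, 3], [4], [5, 6]])
print(doc_lengths)  # [3, 2]: the single-token document was purged
```

Because the length is appended in the same branch that keeps the document, the two lists cannot drift apart, whatever purge rule is used.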
`doc_lengths` is used when visualizing topics with ldavis. Currently, trying to visualize the topics raises an error saying that its length does not equal `num_docs`. This stems from the fact that `nlppipe.py` purges documents that are too short to create skipgrams. The lengths of purged documents are never removed, so `doc_lengths` keeps one entry per original input text instead of one per document we actually processed.
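The mismatch can be reproduced in isolation with the sketch below. `MIN_LEN`, `purge_with_stale_lengths`, and `check_doc_lengths` are hypothetical stand-ins: the first mimics computing lengths before purging, and the second stands in for the length check the visualizer performs (its real error message may differ).

```python
from typing import List, Sequence, Tuple

MIN_LEN = 2  # hypothetical: documents shorter than this yield no skipgrams

def purge_with_stale_lengths(
    docs: Sequence[Sequence[int]],
) -> Tuple[List[Sequence[int]], List[int]]:
    # Lengths are recorded for every input text *before* purging (the bug).
    doc_lengths = [len(d) for d in docs]
    kept = [d for d in docs if len(d) >= MIN_LEN]
    return kept, doc_lengths

def check_doc_lengths(doc_lengths: Sequence[int], num_docs: int) -> None:
    # Hypothetical stand-in for the visualizer's consistency check.
    if len(doc_lengths) != num_docs:
        raise ValueError(
            f"len(doc_lengths)={len(doc_lengths)} does not equal num_docs={num_docs}"
        )

kept, lengths = purge_with_stale_lengths([[1, 2, 3], [4], [5, 6]])
try:
    check_doc_lengths(lengths, num_docs=len(kept))
except ValueError as err:
    print(err)  # len(doc_lengths)=3 does not equal num_docs=2
```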