credo99 opened 5 years ago
Apparently this is a well-known memory problem with the TfidfVectorizer fit_transform workflow: when the sparse matrix is converted to a dense array, a huge text corpus can end up filling hundreds of GB of RAM. Most people recommend removing the toarray() conversion, since most classifiers (except the Gaussian one) also accept sparse matrices for fitting. When I removed toarray() the memory error went away, but the features.shape output jumped to (364203, 698312), compared to the (4569, 12633) the original notebook describes. I can't figure out why. If removing toarray() is the wrong fix, then the code needs to be modified, because the memory error itself is strange: on my 64 GB laptop the task manager shows almost 30 GB of free RAM, yet the Jupyter notebook complains about a memory error every time.
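For reference, TfidfVectorizer.fit_transform already returns a scipy sparse matrix of shape (n_documents, n_vocabulary_terms); toarray() only changes the representation, not the shape, so removing it should not by itself change features.shape. Below is a minimal sketch (not the original notebook's code; texts and labels are placeholder variables standing in for the real corpus) of the sparse workflow, showing that a classifier such as LogisticRegression can be fit on the sparse matrix directly:

```python
# Minimal sketch: keep the TF-IDF matrix sparse and fit a classifier
# that accepts sparse input, avoiding the dense toarray() blow-up.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Placeholder data; substitute the actual corpus and labels here.
texts = ["first document", "second document text", "third one"]
labels = [0, 1, 0]

vectorizer = TfidfVectorizer()
features = vectorizer.fit_transform(texts)   # scipy.sparse CSR matrix, no dense copy
print(features.shape)                        # (n_documents, n_vocabulary_terms)

clf = LogisticRegression()
clf.fit(features, labels)                    # accepts the sparse matrix directly
```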
Sorry, I closed the issue by mistake.
I'm also facing the issue. What's the solution?
MemoryError Traceback (most recent call last)