mimno / Mallet

MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
https://mimno.github.io/Mallet/

Iterate over input data, don't load into memory #170

Open jfelectron opened 5 years ago

jfelectron commented 5 years ago

The current implementation loads the entire input file into memory, so memory use grows with the size of the input and runs out on large data sets. This is a proof of concept (POC) for out-of-core data sets.
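A minimal sketch of the streaming idea follows. It assumes line-oriented input with one document per line. It is not the code from this PR and uses no MALLET classes, and the names `StreamingLines`, `lines`, and `countTokens` are hypothetical. The approach wraps a `BufferedReader` in a `java.util.Iterator` so that only the current line is held in memory, rather than reading the whole file into a list first.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: streams input line by line instead of loading it all into memory. */
public class StreamingLines {

    /** Wraps a BufferedReader so each call to next() reads exactly one more line. */
    static Iterator<String> lines(BufferedReader reader) {
        return new Iterator<String>() {
            // One line of look-ahead; null means the reader is exhausted.
            private String nextLine = readLine();

            private String readLine() {
                try {
                    return reader.readLine();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }

            @Override
            public boolean hasNext() {
                return nextLine != null;
            }

            @Override
            public String next() {
                if (nextLine == null) {
                    throw new NoSuchElementException();
                }
                String current = nextLine;
                nextLine = readLine();
                return current;
            }
        };
    }

    /** Counts whitespace-separated tokens while holding only one line in memory at a time. */
    static long countTokens(BufferedReader reader) {
        long total = 0;
        Iterator<String> it = lines(reader);
        while (it.hasNext()) {
            String line = it.next().trim();
            if (!line.isEmpty()) {
                total += line.split("\\s+").length;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Stand-in for a large file: two documents, four tokens each.
        BufferedReader in = new BufferedReader(
                new StringReader("doc1 en hello world\ndoc2 en topic models\n"));
        System.out.println(countTokens(in)); // prints 8
    }
}
```

Java 8 and later also provide `BufferedReader.lines()`, a lazily populated `Stream<String>` with the same one-line-at-a-time memory behavior. The explicit `Iterator` is shown here because it matches the "iterate over input data" approach in the PR title.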

Notes:

Further improvements: