MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
I hit this error when importing my data into the format required for LDA. I tried enlarging MALLET_MEMORY to 128G (my server also has 128G of RAM), but it still fails.
My data contains 6,712,484 documents in a single .txt file, 3.07 GB in size.
I sampled 100 documents to test the import script, and it works fine, but importing the entire dataset keeps producing this error message.
Could you please help me figure out the problem? I really appreciate your help!
The "bulk-load" command may be more efficient, but a collection of that size should definitely fit in 128G. I would suspect that the variable isn't being set in a way that the shell script can see it.
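One common pitfall is assigning MALLET_MEMORY without `export`ing it, so the `bin/mallet` launcher script (a child process) never sees it. A minimal sketch of how to check this, assuming a Unix-like shell; the file names in the commented-out import command are placeholders for your own data:

```shell
# Export the variable so child processes (such as the bin/mallet
# script) inherit it; a plain assignment stays local to this shell.
export MALLET_MEMORY=120g   # leave some headroom below the 128G of physical RAM

# Verify the variable is visible to a child process, which is
# exactly how bin/mallet would see it:
sh -c 'echo "MALLET_MEMORY=$MALLET_MEMORY"'
# prints: MALLET_MEMORY=120g

# Then run the import in the same shell session, e.g. (hypothetical paths):
#   bin/mallet import-file --input data.txt --output data.mallet --keep-sequence
```

If your version of the `bin/mallet` script does not read MALLET_MEMORY at all, you can instead edit the `MEMORY=` line inside the script itself, which sets the Java heap size (`-Xmx`) directly.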