lintool / twitter-tools

Twitter Tools
twittertools.cc

Memory usage in IndexStatuses #29

Open isoboroff opened 11 years ago

isoboroff commented 11 years ago

IndexStatuses can run out of memory (OOM) in the last stage, when it calls writer.forceMerge(1). An OOM at that point destroys the index. Perhaps this is due to the actions in the finally{} clause?

This should be more robust. stewdhcs suggested a custom merge policy in issue https://github.com/lintool/twitter-tools/issues/17.

lintool commented 11 years ago

The final forceMerge merges all index segments into a single segment for better retrieval performance (this used to be the "optimize" method in earlier versions of Lucene). I think the simplest solution is to make it a command-line option (e.g., -optimize) that's not set by default.
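
A minimal sketch of what the opt-in merge could look like, not the actual IndexStatuses code. To stay runnable without Lucene on the classpath, it uses a hypothetical `SegmentWriter` interface as a stand-in for the few `IndexWriter` methods involved (`commit`, `forceMerge`, `close`); the class name, the `finish` helper, and the plain string check for `-optimize` are all assumptions made for illustration. Committing before the merge is also an assumption about how to keep the indexed documents if the merge fails, and would need to be verified against the Lucene version in use.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OptionalMergeSketch {
    /** Hypothetical stand-in for the subset of Lucene's IndexWriter used here. */
    interface SegmentWriter {
        void commit();
        void forceMerge(int maxSegments);
        void close();
    }

    /** Records calls so the control flow can be inspected without Lucene. */
    static class RecordingWriter implements SegmentWriter {
        final List<String> calls = new ArrayList<>();
        public void commit() { calls.add("commit"); }
        public void forceMerge(int maxSegments) { calls.add("forceMerge(" + maxSegments + ")"); }
        public void close() { calls.add("close"); }
    }

    /** True only if the (assumed) -optimize flag was passed; off by default. */
    static boolean optimizeRequested(String[] args) {
        return Arrays.asList(args).contains("-optimize");
    }

    /** Final indexing stage: commit first, merge only on request, always close. */
    static void finish(SegmentWriter writer, boolean optimize) {
        try {
            // Assumption: committing here persists the documents before the
            // memory-hungry merge starts.
            writer.commit();
            if (optimize) {
                writer.forceMerge(1); // merge down to a single segment
            }
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) {
        RecordingWriter writer = new RecordingWriter();
        finish(writer, optimizeRequested(args));
        System.out.println(writer.calls);
    }
}
```

Run without arguments, this prints `[commit, close]`; run with `-optimize`, it prints `[commit, forceMerge(1), close]`. In the real tool the same branch would wrap the existing forceMerge call, with the flag registered alongside the other command-line options.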