renepickhardt closed this issue 9 years ago.
From #37
For big data sets the aggregator does not run through. Currently the program does not stop, but it kind of hangs. I guess the JVM can't allocate memory for the bigger hashmaps while Java tries to fill up the hashmap and resolve collisions.
There are several possible fixes: if memory in the aggregator is critical, decrease the number of parallel tasks; or change the aggregation workflow to write several smaller sorted files and build the final version with an n-way merge sort.
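For illustration, here is a minimal sketch of such an external n-way merge, assuming each intermediate chunk file is already sorted line by line. The class and method names are made up for this sketch, and summing the counts of duplicate keys during the merge is omitted for brevity:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external n-way merge over already sorted chunk files.
// Only one line per chunk is kept in memory at any time.
public class NWayMerge {

    // Pairs a reader with the line it is currently positioned on.
    private static class Chunk {
        final BufferedReader reader;
        String current;

        Chunk(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.current = reader.readLine();
        }

        boolean advance() throws IOException {
            current = reader.readLine();
            return current != null;
        }
    }

    public static void merge(List<Path> sortedChunks, Path output) throws IOException {
        // Min-heap ordered by each chunk's current line, so the globally
        // smallest line is always written next.
        PriorityQueue<Chunk> heap =
                new PriorityQueue<>(Comparator.comparing((Chunk c) -> c.current));

        for (Path path : sortedChunks) {
            Chunk chunk = new Chunk(Files.newBufferedReader(path, StandardCharsets.UTF_8));
            if (chunk.current != null) {
                heap.add(chunk);
            }
        }

        try (BufferedWriter writer =
                     Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            while (!heap.isEmpty()) {
                Chunk smallest = heap.poll();
                writer.write(smallest.current);
                writer.newLine();
                if (smallest.advance()) {
                    heap.add(smallest); // re-insert with its next line
                } else {
                    smallest.reader.close();
                }
            }
        }
    }
}
```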
I believe this to be fixed. If you find cases of OutOfMemoryError again, please open a new bug.
This is most certainly because the hashmaps are running out of memory, so here are some possible fixes:
1.) build a dedicated index for POS tags and use it instead of the WordIndex
2.) write small sorted file chunks and do an external n-way merge (see the sketch below)
3.) increase maxCountDivider (painful: requires recalculating the sequencer output)
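As a complement to the merge sketch above, here is a hypothetical sketch of the chunking side of option 2: count into a bounded hashmap and flush it to a sorted chunk file whenever it gets too large. The class and parameter names are invented for this sketch:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: counts sequences in a bounded hashmap and flushes it to a
// sorted chunk file whenever it grows too large, so memory use stays bounded and
// the chunks can later be combined by an external n-way merge.
public class ChunkedCounter {

    private final Map<String, Long> counts = new HashMap<>();
    private final Path chunkDir;
    private final int maxEntries;
    private int chunkNumber = 0;

    public ChunkedCounter(Path chunkDir, int maxEntries) {
        this.chunkDir = chunkDir;
        this.maxEntries = maxEntries;
    }

    public void add(String sequence) throws IOException {
        counts.merge(sequence, 1L, Long::sum);
        if (counts.size() >= maxEntries) {
            flush();
        }
    }

    // Writes the current counts to a new chunk file in sorted key order and clears the map.
    public void flush() throws IOException {
        if (counts.isEmpty()) {
            return;
        }
        Path chunk = chunkDir.resolve("chunk-" + (chunkNumber++) + ".txt");
        try (BufferedWriter writer =
                     Files.newBufferedWriter(chunk, StandardCharsets.UTF_8)) {
            for (Map.Entry<String, Long> entry : new TreeMap<>(counts).entrySet()) {
                writer.write(entry.getKey() + "\t" + entry.getValue());
                writer.newLine();
            }
        }
        counts.clear();
    }
}
```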
Regarding option 1: another option here would be to switch from strings to int tokens, since WordIndex.rank() would then also be much faster.
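To illustrate the int-token idea, here is a hypothetical sketch (not the actual WordIndex API) that assigns each token a dense int rank and encodes whole sequences as int arrays:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only -- not the real WordIndex. It shows the idea of mapping
// each word (or POS tag) to a dense int rank once, so that later lookups and
// comparisons work on ints instead of strings.
public class IntTokenIndex {

    private final Map<String, Integer> ranks = new HashMap<>();

    // Assigns the next free rank to unseen tokens; returns the existing rank otherwise.
    public int rank(String token) {
        return ranks.computeIfAbsent(token, t -> ranks.size());
    }

    // Converts a whole sequence of string tokens into its int representation.
    public int[] encode(String[] tokens) {
        int[] encoded = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            encoded[i] = rank(tokens[i]);
        }
        return encoded;
    }
}
```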