Processor causes high memory usage (inefficient construction of ProcessorOutput)

The following code is very inefficient:

    // Line 356-362 in Processor.h, in Processor::ThreadFunction()
    for (int token_index = 0; token_index < topic_model->token_size(); token_index++) {
      model_increment->add_token(topic_model->token(token_index));
      FloatArray* counters = model_increment->add_token_increment();
      for (int topic_index = 0; topic_index < topic_size; ++topic_index) {
        counters->add_value(0.0f);
      }
    }

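This loop copies every token known to the topic model into model_increment and allocates topic_size zero counters for each of them, regardless of whether the token occurs in the current batch. Ideally, model_increment should store only the tokens used by the Batch that the processor is currently handling.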

To implement the fix it might be sufficient to only change the code of Processor.h/Processor.cc.
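
A rough sketch of the idea, not the actual fix. It uses plain structs as hypothetical stand-ins for the protobuf messages, and the names TokenIncrement, ModelIncrement and BuildSparseIncrement are made up for illustration. It assumes the set of tokens in the batch can be collected before the increment is built:

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for one (token, counters) pair of the increment.
struct TokenIncrement {
  std::string token;
  std::vector<float> counters;  // one zero-initialized counter per topic
};

// Hypothetical stand-in for the model_increment message.
struct ModelIncrement {
  std::vector<TokenIncrement> entries;
};

// Builds an increment that holds only the model tokens occurring in the batch.
// Tokens are emitted in model order; batch tokens unknown to the model are skipped.
ModelIncrement BuildSparseIncrement(const std::vector<std::string>& model_tokens,
                                    const std::vector<std::string>& batch_tokens,
                                    std::size_t topic_size) {
  // Deduplicate the batch tokens so each one is looked up in O(1) on average.
  const std::unordered_set<std::string> used(batch_tokens.begin(),
                                             batch_tokens.end());

  ModelIncrement increment;
  increment.entries.reserve(used.size());
  for (const std::string& token : model_tokens) {
    if (used.count(token) == 0) {
      continue;  // token is not used by this batch, so no counters are allocated
    }
    increment.entries.push_back({token, std::vector<float>(topic_size, 0.0f)});
  }
  return increment;
}
```

With this layout the increment holds (distinct batch tokens) x topic_size counters instead of token_size() x topic_size. The consumer of the increment would then have to match entries by token rather than by position in the model, which the current code may or may not already do.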

sashafrey/topicmod, issue #52