sashafrey / topicmod

This project has been moved to https://github.com/bigartm/bigartm

Processor causes high memory usage (inefficient construction of ProcessorOutput) #52

Closed sashafrey closed 10 years ago

sashafrey commented 10 years ago

The following code is very inefficient:

// Lines 356-362 in Processor.h, in Processor::ThreadFunction()
for (int token_index = 0; token_index < topic_model->token_size(); token_index++) {
  model_increment->add_token(topic_model->token(token_index));
  FloatArray* counters = model_increment->add_token_increment();
  for (int topic_index = 0; topic_index < topic_size; ++topic_index) {
    counters->add_value(0.0f);
  }
}

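Here we place every token known to the topic model into model_increment, and allocate topic_size zero counters for each one, regardless of whether the token occurs in the current batch. Ideally, model_increment should store only the tokens used by the Batch that the processor is currently handling.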
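A minimal sketch of that idea, not the actual fix from the commit. The `Item`, `Batch` and `ModelIncrement` structs below are hypothetical stand-ins for the real protobuf messages, and `BuildIncrement` is a made-up helper name; the point is only that tokens are registered while walking the batch rather than the whole topic model:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for the protobuf messages used in Processor.h.
struct Item { std::vector<std::string> tokens; };
struct Batch { std::vector<Item> items; };
struct ModelIncrement {
  std::vector<std::string> tokens;                   // one entry per distinct token
  std::vector<std::vector<float>> token_increments;  // topic_size counters per token
};

// Registers only the tokens that occur in the batch, each with
// topic_size zero-initialized counters (assumed helper, not the real API).
ModelIncrement BuildIncrement(const Batch& batch, int topic_size) {
  assert(topic_size >= 0);
  ModelIncrement increment;
  std::unordered_set<std::string> seen;
  for (const Item& item : batch.items) {
    for (const std::string& token : item.tokens) {
      if (!seen.insert(token).second) continue;  // token already registered
      increment.tokens.push_back(token);
      increment.token_increments.push_back(
          std::vector<float>(static_cast<std::size_t>(topic_size), 0.0f));
    }
  }
  return increment;
}
```

With this shape the size of the increment scales with the number of distinct tokens in the batch times topic_size, instead of the full vocabulary times topic_size.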

To implement the fix, it might be sufficient to change only the code in Processor.h/Processor.cc.

sashafrey commented 10 years ago

Fixed in https://github.com/sashafrey/topicmod/commit/191b402a26f0e33af1102744341b726dcb555ba0