We don't have enough data yet to say for sure, but it is likely that Merger can be a bottleneck in some important scenarios. For example:
inner_iterations_count is set to 1, or
there are 16 concurrent processors, or
increments are merged on master_component in network modus operandi.
We should design and implement an option to run multiple merger threads per instance.
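As a starting point for that design, here is a rough sketch of several merger threads draining one queue of increments into a shared token-topic matrix. It is not the existing Merger code: run_mergers, the (token_id, topic_counts) increment shape, and the per-shard locking scheme are all assumptions made for illustration. Python threads also will not give real parallel speedup for this kind of work, so the sketch shows only the structure.

```python
import queue
import threading


def run_mergers(increments, num_tokens, num_topics, num_threads=4):
    """Apply (token_id, topic_counts) increments using several merger threads.

    Hypothetical sketch: names and data layout are assumptions, not the real API.
    """
    # Shared token-topic counters, one row per token.
    matrix = [[0.0] * num_topics for _ in range(num_tokens)]
    # One lock per shard of rows, so threads that touch different shards
    # do not contend with each other.
    locks = [threading.Lock() for _ in range(num_threads)]

    # All work is enqueued before the threads start, so an empty queue
    # means there is nothing left to merge.
    work = queue.Queue()
    for inc in increments:
        work.put(inc)

    def merger():
        while True:
            try:
                token_id, counts = work.get_nowait()
            except queue.Empty:
                return
            with locks[token_id % len(locks)]:
                row = matrix[token_id]
                for k, value in enumerate(counts):
                    row[k] += value

    threads = [threading.Thread(target=merger) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return matrix
```

Sharding the locks by token id is only one possible choice; giving each merger thread its own disjoint range of tokens would avoid locking altogether, at the cost of routing increments to the right thread.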
In parallel, we should investigate what the current bottlenecks in the merger are.
Is it expensive to always look up each token in the token_to_tokenid map?
Is it expensive to always send the entire token-topic matrix from nodes to the master? Most likely yes; we should carefully produce a modelincrement and send that from nodes to the master instead.
Is it now time to look at the sparsity of the Phi matrix? Exploiting it should improve the throughput of the Merger component.
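To make the last two questions concrete, a node could send only the entries that changed instead of the full matrix. The sketch below is an assumption about what such an increment might look like, not the actual modelincrement format: make_sparse_increment, apply_increment, the eps threshold, and the dictionary keyed by (token_id, topic_id) are all hypothetical.

```python
def make_sparse_increment(old, new, eps=0.0):
    """Return {(token_id, topic_id): delta} for entries that changed by more than eps.

    Hypothetical helper: 'old' and 'new' are dense token-topic matrices
    given as lists of rows with identical shapes.
    """
    inc = {}
    for t, (old_row, new_row) in enumerate(zip(old, new)):
        for k, (a, b) in enumerate(zip(old_row, new_row)):
            delta = b - a
            if abs(delta) > eps:
                inc[(t, k)] = delta
    return inc


def apply_increment(master, inc):
    """Add a sparse increment to the master's dense matrix in place."""
    for (t, k), delta in inc.items():
        master[t][k] += delta
```

If most entries are unchanged between synchronizations, the increment holds far fewer values than the full matrix, which would reduce both the network traffic and the amount of work the Merger does per update. Whether that holds in practice is exactly what the measurements above should tell us.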