sashafrey closed this issue 9 years ago.
I think the issue is well understood here.
The throughput of BigARTM scales very well as the number of cores increases. However, starting and finishing an iteration adds quite a bit of overhead:

1. It takes time for the data_loader to populate the processors' input queue with data.
2. Batches are usually quite large, so there is a noticeable gap between the moment the first processor finishes and the moment the last one does; during that window only about half of the CPU resources are utilized on average.
3. It takes some time for the Merger to finish its tasks.
All of this is acceptable. It would be reasonable to measure BigARTM's peak throughput instead, which does scale linearly with the number of processors.
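As a rough illustration, peak throughput can be estimated by amortizing the per-iteration ramp-up and wind-down overhead over several collection passes. This is a minimal sketch, assuming the current `artm` Python API (`artm.ARTM`, `artm.BatchVectorizer`, `fit_offline`), which postdates this issue; the batch directory name, topic count, and thread count are placeholders.

```python
import time
import artm  # BigARTM Python bindings (assumed installed)

# Hypothetical directory with pre-generated batches (e.g. NYTimes).
batch_vectorizer = artm.BatchVectorizer(data_path='nytimes_batches',
                                        data_format='batches')
dictionary = artm.Dictionary()
dictionary.gather(data_path='nytimes_batches')

model = artm.ARTM(num_topics=100,
                  num_processors=8,  # number of processor threads
                  dictionary=dictionary)

def batches_per_second(num_passes):
    """Wall-clock throughput, in batches/sec, over num_passes passes."""
    start = time.time()
    model.fit_offline(batch_vectorizer=batch_vectorizer,
                      num_collection_passes=num_passes)
    return num_passes * batch_vectorizer.num_batches / (time.time() - start)

# A single pass includes the full start/finish overhead; many passes
# amortize it, so the second number approaches the peak throughput.
print('1 pass   :', batches_per_second(1), 'batches/sec')
print('10 passes:', batches_per_second(10), 'batches/sec')
```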
Experiments on the NYTimes collection show that BigARTM performance scales sublinearly once the number of cores reaches approximately 8 or more. We should understand where the bottleneck is and fix it if possible.
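To localize the sublinear region, a simple sweep over the processor-thread count can show where throughput stops growing with the number of cores. The sketch below makes the same assumptions as the one above (modern `artm` Python API, placeholder paths and parameters); a curve that flattens above roughly 8 cores would reproduce the reported behaviour.

```python
import time
import artm  # BigARTM Python bindings (assumed installed)

batch_vectorizer = artm.BatchVectorizer(data_path='nytimes_batches',
                                        data_format='batches')
dictionary = artm.Dictionary()
dictionary.gather(data_path='nytimes_batches')

# Sweep the thread count and compare measured throughput against
# the ideal linear scaling from the single-core baseline.
for cores in (1, 2, 4, 8, 16):
    model = artm.ARTM(num_topics=100, num_processors=cores,
                      dictionary=dictionary)
    start = time.time()
    model.fit_offline(batch_vectorizer=batch_vectorizer,
                      num_collection_passes=5)
    throughput = 5 * batch_vectorizer.num_batches / (time.time() - start)
    print('%2d cores: %.1f batches/sec' % (cores, throughput))
```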