sashafrey closed this issue 9 years ago.
I think the issue is well understood here.
The throughput of BigARTM scales very well as the number of cores increases. However, starting and finishing an iteration adds quite a bit of overhead:

1. It takes time for the data_loader to populate the processors' input queue with data.
2. Batches are usually quite large, so there is a noticeable gap between the moment the first processor finishes and the moment the last one does; during that window only about half of the CPU resources are utilized on average.
3. It takes some time for the Merger to finish its tasks.
All of this is acceptable. It would be reasonable to measure BigARTM's peak throughput instead, which does scale linearly with the number of processors.
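As a rough illustration, peak throughput can be estimated by amortizing the per-iteration ramp-up and wind-down overhead over several collection passes. This is a minimal sketch, assuming the current `artm` Python API (`artm.ARTM`, `artm.BatchVectorizer`, `fit_offline`), which postdates this issue; the batch directory name, topic count, and thread count are placeholders.

```python
import time
import artm  # BigARTM Python bindings (assumed installed)

# Hypothetical directory with pre-generated batches (e.g. NYTimes).
batch_vectorizer = artm.BatchVectorizer(data_path='nytimes_batches',
                                        data_format='batches')
dictionary = artm.Dictionary()
dictionary.gather(data_path='nytimes_batches')

model = artm.ARTM(num_topics=100,
                  num_processors=8,  # number of processor threads
                  dictionary=dictionary)

def batches_per_second(num_passes):
    """Wall-clock throughput, in batches/sec, over num_passes passes."""
    start = time.time()
    model.fit_offline(batch_vectorizer=batch_vectorizer,
                      num_collection_passes=num_passes)
    return num_passes * batch_vectorizer.num_batches / (time.time() - start)

# A single pass includes the full start/finish overhead; many passes
# amortize it, so the second number approaches the peak throughput.
print('1 pass   :', batches_per_second(1), 'batches/sec')
print('10 passes:', batches_per_second(10), 'batches/sec')
```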
Experiments on the NYTimes collection show that BigARTM performance scales sublinearly once the number of cores reaches approximately 8 or more. We should understand where the bottleneck is and fix it if possible.
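To localize the sublinear region, a simple sweep over the processor-thread count can show where throughput stops growing with the number of cores. The sketch below makes the same assumptions as the one above (modern `artm` Python API, placeholder paths and parameters); a curve that flattens above roughly 8 cores would reproduce the reported behaviour.

```python
import time
import artm  # BigARTM Python bindings (assumed installed)

batch_vectorizer = artm.BatchVectorizer(data_path='nytimes_batches',
                                        data_format='batches')
dictionary = artm.Dictionary()
dictionary.gather(data_path='nytimes_batches')

# Sweep the thread count and compare measured throughput against
# the ideal linear scaling from the single-core baseline.
for cores in (1, 2, 4, 8, 16):
    model = artm.ARTM(num_topics=100, num_processors=cores,
                      dictionary=dictionary)
    start = time.time()
    model.fit_offline(batch_vectorizer=batch_vectorizer,
                      num_collection_passes=5)
    throughput = 5 * batch_vectorizer.num_batches / (time.time() - start)
    print('%2d cores: %.1f batches/sec' % (cores, throughput))
```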