We have used range-based table for workers to minimize table data loading overhead.
But it might incur workload imbalance between workers, because its partition depends on hadoop library. For example LDA with pubmed dataset, imbalance is almost 50% (40K vs. 60K).
So this PR changes dolphin to use hash-based table for workers.
We have used range-based table for workers to minimize table data loading overhead. But it might incur workload imbalance between workers, because its partition depends on hadoop library. For example LDA with pubmed dataset, imbalance is almost 50% (40K vs. 60K).
So this PR changes dolphin to use hash-based table for workers.