snuspl / cruise

Cruise: A Distributed Machine Learning Framework with Automatic System Configuration
Apache License 2.0

[MINOR] Use hash-based table for workers #1242

Closed: wynot12 closed this 7 years ago

wynot12 commented 7 years ago

We have been using a range-based table for workers to minimize the overhead of loading table data. However, this can cause workload imbalance between workers, because the partitioning is determined by the Hadoop library. For example, when running LDA on the PubMed dataset, the imbalance is almost 50% (40K vs. 60K).

So this PR changes dolphin to use a hash-based table for workers.
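
To illustrate the trade-off, here is a minimal sketch in Python (not the actual dolphin code). It assumes the imbalance comes from uneven input splits: under the range-based scheme each worker keeps whatever its split contains, while under the hash-based scheme every record is routed by a hash of its key. The function names, the two-worker setup, and the use of SHA-256 as the hash are all hypothetical choices made for this example.

```python
import hashlib


def range_based_counts(split_sizes):
    """Each worker keeps the records of its own input split,
    so the per-worker load simply mirrors the split sizes."""
    return list(split_sizes)


def hash_based_counts(num_records, num_workers):
    """Each record is routed to a worker by hashing its key,
    independent of which split it was read from."""
    counts = [0] * num_workers
    for key in range(num_records):
        digest = hashlib.sha256(str(key).encode()).digest()
        worker = int.from_bytes(digest[:8], "big") % num_workers
        counts[worker] += 1
    return counts


def imbalance(counts):
    """Relative gap between the most and least loaded worker."""
    return max(counts) / min(counts) - 1.0


# Hypothetical split sizes mirroring the 40K vs. 60K figure above.
split_sizes = [40_000, 60_000]

range_counts = range_based_counts(split_sizes)
hash_counts = hash_based_counts(sum(split_sizes), len(split_sizes))

print("range-based:", range_counts, f"imbalance={imbalance(range_counts):.1%}")
print("hash-based: ", hash_counts, f"imbalance={imbalance(hash_counts):.1%}")
```

The hash-based layout evens out the record counts, but it also means records no longer stay on the worker that loaded them, which is the loading overhead the range-based table was meant to avoid.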

wynot12 commented 7 years ago

It looks like this approach incurs too much memory pressure. LDA with the PubMed dataset fails on the AWS cluster due to OOM.

Give me some time to check the issue.

yunseong commented 7 years ago

Let's close this PR until the OOM issue is resolved.