zhangyuc / splash

Splash Project for parallel stochastic learning

Some questions about splash #5

Open codlife opened 7 years ago

codlife commented 7 years ago

Hi @zhangyuc: I have read your paper about Splash. I have a question: is the experiment in "Local solutions with unit-weight data" the same as Spark MLlib's current implementation? BTW, according to my experiments, the memory usage is higher than Spark MLlib's SGD. Thank you!

zhangyuc commented 7 years ago

It is faster because it schedules the SGD differently from MLlib. Given a (distributed) mini-batch, MLlib computes the gradient within each machine, then reduces the gradient before taking one descent step. Splash runs SGD within each machine independently and reduces the update after the mini-batch is processed. Thus it could perform hundreds of updates on this mini-batch while MLlib performs only one.
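The scheduling difference described above can be sketched with a toy simulation. This is not Splash's actual code (Splash is written in Scala on Spark); it is a hypothetical single-process illustration, minimizing a simple quadratic loss over data split across four "machines". The MLlib-style round takes one averaged gradient step per distributed mini-batch, while the Splash-style round runs SGD locally on each machine and then averages the local models, so it performs many updates per round:

```python
import random

# Toy setup (illustrative only): minimize f(w) = mean_i (w - x_i)^2,
# with the data partitioned across 4 simulated machines.
random.seed(0)
machines = [[random.gauss(5.0, 1.0) for _ in range(100)] for _ in range(4)]
n_total = sum(len(m) for m in machines)

def grad(w, x):
    return 2.0 * (w - x)

def mllib_round(w, lr=0.1):
    # MLlib-style: average the gradient over the whole mini-batch,
    # then take ONE descent step.
    g = sum(grad(w, x) for m in machines for x in m) / n_total
    return w - lr * g

def splash_round(w, lr=0.1):
    # Splash-style: each machine runs SGD independently over its
    # local data (many updates), then the updates are merged by
    # averaging the local models.
    local_models = []
    for m in machines:
        v = w
        for x in m:
            v -= lr * grad(v, x)
        local_models.append(v)
    return sum(local_models) / len(local_models)

target = sum(x for m in machines for x in m) / n_total  # the minimizer

w_mllib = mllib_round(0.0)    # 1 update on the mini-batch
w_splash = splash_round(0.0)  # 100 updates per machine on the same data
```

After one round over the same mini-batch, the Splash-style iterate lands much closer to the minimizer than the single MLlib-style step, which is the speedup mechanism described in the comment above.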

codlife commented 7 years ago

Yes. Splash runs SGD within each machine independently? Does that mean Splash runs many SGD instances and then merges them into one? If there are no other issues (such as memory), maybe we could use this as the MLlib standard implementation.