codlife opened this issue 7 years ago
It is faster because it schedules the SGD differently from MLlib. Given a (distributed) mini-batch, MLlib computes the gradient within each machine, then reduces the gradient before taking one descent step. Splash runs SGD within each machine independently and reduces the update after the mini-batch is processed. Thus it could perform hundreds of updates on this mini-batch while MLlib performs only one.
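The contrast above can be sketched on a toy least-squares problem. This is a hypothetical simulation, not MLlib or Splash code: function names (`minibatch_step`, `local_sgd_round`), the learning rates, and the partitioning are all illustrative assumptions; the point is only that the "local SGD then average" schedule performs many updates per mini-batch while the "average gradients then step" schedule performs one.

```python
import numpy as np

# Toy least-squares problem shared by both schedules.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=400)

def grad(w, Xb, yb):
    # Gradient of mean squared error on a batch.
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

def minibatch_step(w, parts, lr):
    # MLlib-style schedule: each "machine" computes its gradient,
    # the gradients are averaged, then ONE descent step is taken.
    g = np.mean([grad(w, Xp, yp) for Xp, yp in parts], axis=0)
    return w - lr * g

def local_sgd_round(w, parts, lr, inner_steps):
    # Splash-style schedule: each "machine" runs SGD independently on
    # its own shard (many single-sample updates), and the resulting
    # weight vectors are averaged at the end of the round.
    local_ws = []
    for Xp, yp in parts:
        wl = w.copy()
        for i in rng.permutation(len(yp))[:inner_steps]:
            wl -= lr * grad(wl, Xp[i:i + 1], yp[i:i + 1])
        local_ws.append(wl)
    return np.mean(local_ws, axis=0)

# Split the data across 4 simulated machines.
parts = [(X[i::4], y[i::4]) for i in range(4)]

w_mb = np.zeros(3)
w_ls = np.zeros(3)
for _ in range(20):
    w_mb = minibatch_step(w_mb, parts, lr=0.1)
    w_ls = local_sgd_round(w_ls, parts, lr=0.02, inner_steps=50)

print("mini-batch loss:", np.mean((X @ w_mb - y) ** 2))
print("local-SGD loss: ", np.mean((X @ w_ls - y) ** 2))
```

Per round, the mini-batch schedule makes 1 update while the local-SGD schedule makes 50 per machine, which is the source of the speedup described above; the trade-off is that each machine's iterate drifts from the others between merges.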
Yes, so Splash runs SGD within each machine independently? Does that mean Splash runs many SGD instances and then merges them into one? If there are no other issues (such as memory), maybe we could use this as the standard MLlib implementation.
Hi @zhangyuc: I have read your paper about Splash. I have a question: is the experiment "Local solutions with unit-weight data" the same as Spark MLlib's current implementation? BTW, according to my experiment, the memory usage is higher than Spark MLlib's SGD. Thank you!