This is a Spark Summit EU 2016 talk by Nick Pentreath from IBM.

Slides and project
Notes
Distributed machine learning algorithms are usually implemented in Spark MLlib this way:

Driver broadcasts the current weights to tasks distributed across machines
Tasks update their local copies of the weights on a batch of data
Driver gathers and aggregates (treeAggregate) all the weights from the tasks
Loop (see the sketch below)

The driver can easily become the bottleneck and be taken down by a big model. That's where parameter servers come in. Despite early investigations and prototypes from the community, Spark does not yet support parameter servers, or even long-running services.
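A minimal sketch of one iteration of this loop, assuming a plain linear model with squared error (the gradient, learning rate, and function name are illustrative, not the talk's code):

```scala
import org.apache.spark.rdd.RDD

// One iteration of the typical MLlib-style loop: broadcast the full model,
// compute gradients on each task's data, treeAggregate back to the driver.
def step(data: RDD[(Double, Array[Double])], weights: Array[Double]): Array[Double] = {
  val sc = data.sparkContext
  val bcWeights = sc.broadcast(weights) // 1. driver -> all tasks
  val dim = weights.length

  // 2 + 3. per-partition gradient sums, merged tree-wise so the driver
  // combines a few partial results instead of one per partition
  val (gradSum, count) = data.treeAggregate((new Array[Double](dim), 0L))(
    seqOp = { case ((grad, n), (label, features)) =>
      val w = bcWeights.value
      var pred = 0.0
      var i = 0
      while (i < dim) { pred += w(i) * features(i); i += 1 }
      val err = pred - label // illustrative squared-error gradient
      i = 0
      while (i < dim) { grad(i) += err * features(i); i += 1 }
      (grad, n + 1)
    },
    combOp = { case ((g1, n1), (g2, n2)) =>
      var i = 0
      while (i < dim) { g1(i) += g2(i); i += 1 }
      (g1, n1 + n2)
    }
  )

  bcWeights.destroy()
  val lr = 0.1 // illustrative learning rate
  Array.tabulate(dim)(i => weights(i) - lr * gradSum(i) / count) // 4. loop
}
```

Note that the whole model travels driver -> executors (broadcast) and executors -> driver (treeAggregate) on every iteration, which is exactly what stops scaling once the model gets large.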
Unlike MLlib, this FM implementation is built with Glint, a high-performance parameter server built on Akka.
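With a parameter server, the model lives on dedicated server nodes and workers pull/push only the keys their current batch touches. A minimal sketch of that interaction, based on Glint's client API as its README presents it (Client(), client.vector[Double], asynchronous pull/push); the exact signatures are assumptions here:

```scala
import scala.concurrent.ExecutionContext.Implicits.global
import glint.Client

// Connect to a running Glint master (default configuration assumed).
val client = Client()

// A large model vector partitioned across the parameter servers;
// it no longer has to fit on the driver.
val weights = client.vector[Double](100000000L)

// Each worker touches only the keys its mini-batch needs. Both calls
// are asynchronous and return Futures.
val keys = Array(0L, 42L, 99L)
weights.pull(keys).foreach { values =>
  val deltas = values.map(v => -0.1 * v) // illustrative update, not FM's gradient
  weights.push(keys, deltas)             // add deltas to the server-side values
}
```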
Although it has shown better performance and scalability than the MLlib-style implementation, there are remaining challenges (Akka frame size, backpressure), and future improvements can be drawn from DMLC's ps-lite and difacto.
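On the Akka frame size point: Akka remoting rejects messages larger than its configured maximum frame size, so big pulls/pushes must be chunked or the cap raised. A hypothetical mitigation (the setting name is from Akka 2.x netty.tcp remoting; the value and the wiring into Glint are assumptions):

```scala
import com.typesafe.config.ConfigFactory

// Raise the remoting frame-size cap so larger pull/push responses fit in
// a single message; 10 MiB is illustrative. Chunking requests with
// backpressure is the more robust long-term fix.
val config = ConfigFactory.parseString(
  "akka.remote.netty.tcp.maximum-frame-size = 10 MiB")
// Pass this config to the ActorSystem/Glint client at startup (hypothetical
// wiring; the exact entry point depends on Glint's setup).
```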