rjagerman / glint

Glint: High-performance Scala parameter server
MIT License

A question about glint #54

Open codlife opened 8 years ago

codlife commented 8 years ago

Hello Rolf! I have had a look at your code. What confuses me is how Glint interfaces with Spark; I don't see a single line of code related to Spark. Best wishes! Codelife

rjagerman commented 8 years ago

You're right that Glint is stand-alone and not necessarily interfaced with Spark. You could use it entirely without Spark. The documentation has a section that shows how Glint can easily be used within Spark: http://rjagerman.github.io/glint/gettingstarted/spark/

The main idea is that "BigVector" and "BigMatrix" objects are serializable and safe to use within Spark closures. You can iterate over a dataset as you normally would in Spark while simultaneously using Glint to "pull" and "push" parts of a distributed model. The entire documentation is in need of an overhaul to make all this clearer.
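To make the pull/push pattern more concrete, here is a minimal local sketch in Scala. `ParamVector`, `LocalParamVector`, `Example` and `trainPartition` are hypothetical names and not Glint's API: Glint's `BigVector` is backed by remote parameter servers and returns futures, while this stand-in keeps its values in a local array so the sketch runs without Spark or a cluster. Each simulated partition pulls only the weights its examples touch, takes a squared-loss gradient step, and pushes the deltas back.

```scala
// Hypothetical stand-in for a distributed vector handle; this is NOT Glint's API.
// Glint's BigVector is backed by remote servers and returns futures.
trait ParamVector extends Serializable {
  def pull(keys: Array[Long]): Array[Double]
  def push(keys: Array[Long], values: Array[Double]): Unit
}

// Local in-memory implementation so the sketch runs without a cluster.
class LocalParamVector(size: Int) extends ParamVector {
  private val data = Array.fill(size)(0.0)

  // Return the current values for the requested keys only.
  def pull(keys: Array[Long]): Array[Double] = keys.map(k => data(k.toInt))

  // Push adds deltas to the stored values (additive updates).
  def push(keys: Array[Long], values: Array[Double]): Unit =
    keys.zip(values).foreach { case (k, v) => data(k.toInt) += v }
}

// A training example with binary features given as indices into the model.
case class Example(features: Array[Long], label: Double)

// One SGD pass over a partition for linear regression with squared loss.
def trainPartition(model: ParamVector, examples: Seq[Example], lr: Double): Unit =
  examples.foreach { ex =>
    val w = model.pull(ex.features)  // fetch only the weights this example uses
    val error = w.sum - ex.label     // prediction is the sum of the active weights
    val delta = -lr * error          // d(0.5 * error^2)/dw_i = error for each active weight
    model.push(ex.features, Array.fill(ex.features.length)(delta))
  }

// Simulate two data partitions sharing one model handle.
val model = new LocalParamVector(4)
val partitions = Seq(
  Seq(Example(Array(0L, 1L), 1.0)),
  Seq(Example(Array(2L), 2.0))
)
partitions.foreach(p => trainPartition(model, p, lr = 0.5))
```

In an actual Spark job the loop over `partitions` would be an RDD operation such as `foreachPartition`, with the serializable handle captured in the closure. Because this stand-in holds its data directly, serializing it would copy the array instead of referencing shared servers, which is exactly what a real parameter-server handle avoids.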

I am still debating whether to integrate Glint more closely with Spark. One of the advantages is that we can run Glint within the Spark runtime (I have a proof of concept of this ready). This means we don't have to run the parameter servers as separate Java processes. Anyone can just include Glint as a dependency, and it will run automatically in their Spark cluster together with their code.

An example of Glint working together with Spark is GlintLDA, a state-of-the-art LDA algorithm that achieves Web-scale topic modeling beyond what was possible with MLlib.

codlife commented 8 years ago

Thank you! I will have a look at your docs. Thanks again!

codlife commented 8 years ago

So your current implementation doesn't support a cluster? How do you store the BigMatrix if there are many servers? I think Glint could be a component of Spark, built on top of Spark Core.
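One common way a parameter server can spread a large matrix over many servers is to split its rows into contiguous ranges, one range per server, and to route each pull or push to the server that owns the requested rows. The Scala sketch below illustrates that idea under this assumption; it is not necessarily how Glint partitions a `BigMatrix`, and `RangePartition`, `rangePartitions`, `serverForRow` and `groupByServer` are hypothetical names.

```scala
// Hypothetical row-range partitioning of a matrix over several servers.
// Each partition owns the half-open row range [startRow, endRow).
final case class RangePartition(server: Int, startRow: Long, endRow: Long)

// Split `rows` rows over `servers` servers as evenly as possible;
// the first (rows % servers) servers get one extra row.
def rangePartitions(rows: Long, servers: Int): Seq[RangePartition] = {
  val base = rows / servers
  val extra = rows % servers
  (0 until servers).map { s =>
    val start = s * base + math.min(s.toLong, extra)
    val size = base + (if (s < extra) 1L else 0L)
    RangePartition(s, start, start + size)
  }
}

// Find which server owns a given row.
def serverForRow(row: Long, parts: Seq[RangePartition]): Int =
  parts
    .find(p => row >= p.startRow && row < p.endRow)
    .map(_.server)
    .getOrElse(throw new IndexOutOfBoundsException(s"row $row is out of range"))

// Group the rows of a pull request by owning server, so each server gets one message.
def groupByServer(rows: Seq[Long], parts: Seq[RangePartition]): Map[Int, Seq[Long]] =
  rows.groupBy(r => serverForRow(r, parts))

// Example: 10 rows over 3 servers, then route a request for rows 0, 5, 9 and 3.
val requestsByServer = groupByServer(Seq(0L, 5L, 9L, 3L), rangePartitions(10L, 3))
```

Range partitioning keeps neighbouring rows on the same server, which suits requests for contiguous rows; a cyclic scheme (row i on server i mod n) would instead spread skewed row access more evenly.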

cstur4 commented 8 years ago

We can set up parameter servers in a Spark application and use Glint as a component.