codlife opened this issue 8 years ago (status: Open)
You're right that Glint is stand-alone and not necessarily interfaced with Spark. You could use it entirely without Spark. The documentation has a section that shows how Glint can easily be used within Spark: http://rjagerman.github.io/glint/gettingstarted/spark/
The main idea is that "BigVector" and "BigMatrix" objects are serializable and safe to use within Spark closures. You can iterate over a dataset as you would in Spark while simultaneously using Glint to "pull" and "push" parts of a distributed model. The entire documentation is in need of an overhaul to make all of this clearer.
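To make the pull/push pattern concrete, here is a rough Scala sketch of iterating over a Spark RDD while pulling and pushing parts of a Glint BigVector. The names follow the shape of the getting-started guide linked above, but the exact signatures (constructor arguments, implicit execution contexts, timeouts) may differ from the real API, so treat it as an illustration rather than copy-paste code:

```scala
import scala.concurrent.{Await, ExecutionContext}
import scala.concurrent.duration._

import glint.Client
import org.apache.spark.rdd.RDD

// Rough sketch only: method names follow the getting-started guide, but the
// exact Glint signatures (implicits, timeouts) may differ in detail.
def train(data: RDD[(Array[Long], Array[Double])]): Unit = {
  val client = Client()                         // connect to the Glint master (driver side)
  val weights = client.vector[Double](1000000)  // a BigVector spread over the parameter servers

  // The BigVector handle is serializable, so it can be captured in the closure
  data.foreachPartition { partition =>
    implicit val ec: ExecutionContext = ExecutionContext.Implicits.global

    partition.foreach { case (indices, values) =>
      // Pull only the model entries this example touches
      val w = Await.result(weights.pull(indices), 30.seconds)

      // Placeholder gradient computation; replace with a real update rule
      val update = values.zip(w).map { case (x, wi) => -0.01 * x * wi }

      // Push the sparse update back to the parameter servers
      Await.result(weights.push(indices, update), 30.seconds)
    }
  }
}
```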
I am still debating whether to integrate Glint more closely with Spark. One of the advantages is that we can run Glint within the Spark runtime (I have a proof of concept of this ready). This means we don't have to run the parameter servers as separate Java processes: anyone can just include Glint as a dependency and it will run automatically in their Spark cluster together with their code.
An example of Glint working together with Spark is GlintLDA, a state-of-the-art LDA algorithm that achieves web-scale topic modeling beyond what was possible with MLlib.
Thank you! I will have a look at your docs. Thanks again!
Does your current implementation not support cluster mode? How do you store a BigMatrix if there are many servers? I think Glint could be a component of Spark, built on top of Spark Core.
We could set up the parameter servers inside a Spark application and use Glint as a component.
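To make the question about many servers concrete: a parameter server typically shards a large matrix over its servers by row key, so no single machine has to hold the whole model. The following is a generic, self-contained illustration of that idea only; it is not Glint's actual partitioning scheme or implementation:

```scala
import scala.collection.mutable

// Generic illustration of key-based sharding, NOT Glint's implementation:
// each row of the matrix lives on exactly one "server", chosen from the row
// index. A real parameter server keeps the shards on separate machines and
// reaches them over the network.
class ShardedMatrix(numServers: Int, cols: Int) {
  private val shards =
    Array.fill(numServers)(mutable.Map.empty[Long, Array[Double]])

  private def serverOf(row: Long): Int =
    (math.abs(row) % numServers).toInt

  // "pull" a row from whichever shard owns it
  def pull(row: Long): Array[Double] =
    shards(serverOf(row)).getOrElseUpdate(row, Array.fill(cols)(0.0))

  // "push" an additive update to the owning shard
  def push(row: Long, delta: Array[Double]): Unit = {
    val current = pull(row)
    var i = 0
    while (i < cols) { current(i) += delta(i); i += 1 }
  }
}
```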
Hello Rolf! I have had a look at your code. What troubles me is how Glint interfaces with Spark; I don't see a single line of code related to Spark. Best wishes! Codelife