intel-machine-learning / DistML

DistML provides a supplement to MLlib to support model parallelism on Spark

DistML (Distributed Machine Learning platform)

DistML is a machine learning tool that allows training very large models on Spark. It is fully compatible with Spark (tested on 1.2 or above).

Reference paper: Large Scale Distributed Deep Networks

Runtime view:

DistML provides several algorithms (LR, LDA, Word2Vec, ALS) to demonstrate its scalability. However, you may need to write your own algorithms based on the DistML APIs (Model, Session, Matrix, DataStore, ...). Generally, it is simple to port existing algorithms to DistML. Here we take LR as an example: How to implement logistic regression on DistML.
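To illustrate the general idea behind training LR with a partitioned model, here is a minimal single-process sketch in Python. It does not use the DistML API: `ShardedParams`, `pull`, `push`, and `train_partition` are hypothetical names chosen for this example, and the sharding rule (index modulo shard count) is an assumption. The sketch only shows the pattern in which each worker fetches the weights its samples touch, computes a logistic regression gradient, and sends the update back to the shard that owns each weight.

```python
import math


class ShardedParams:
    """Toy stand-in for a partitioned model (hypothetical, not a DistML class)."""

    def __init__(self, dim, num_shards):
        self.num_shards = num_shards
        # Each shard owns the weights whose index maps to it (index % num_shards).
        self.shards = [dict() for _ in range(num_shards)]
        for i in range(dim):
            self.shards[i % num_shards][i] = 0.0

    def pull(self, keys):
        """Fetch only the weights needed for the current sample."""
        return {k: self.shards[k % self.num_shards][k] for k in keys}

    def push(self, deltas):
        """Apply additive updates to the shards that own each key."""
        for k, d in deltas.items():
            self.shards[k % self.num_shards][k] += d


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_partition(params, samples, lr):
    """One SGD pass over a data partition of (sparse_features, label) pairs."""
    for features, label in samples:
        w = params.pull(features.keys())
        z = sum(w[k] * v for k, v in features.items())
        err = sigmoid(z) - label
        params.push({k: -lr * err * v for k, v in features.items()})


def predict(params, features):
    w = params.pull(features.keys())
    return sigmoid(sum(w[k] * v for k, v in features.items()))


# Usage: two data partitions, processed in turn here; on a cluster each
# partition would be handled by a separate worker.
params = ShardedParams(dim=3, num_shards=2)
partition_a = [({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 2: 1.0}, 0)]
partition_b = [({0: 1.0, 2: 1.0}, 1), ({1: 1.0, 2: 1.0}, 0)]
for _ in range(200):
    train_partition(params, partition_a, lr=0.5)
    train_partition(params, partition_b, lr=0.5)
```

Because each sample is sparse, a worker only exchanges the few weights it touches, which is what makes this pattern attractive when the full model is too large for one machine.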

User Guide

  1. Download and build DistML.
  2. Typical options.
  3. Run Sample - LR.
  4. Run Sample - MLR.
  5. Run Sample - LDA.
  6. Run Sample - Word2Vec.
  7. Run Sample - ALS.
  8. Benchmarks.
  9. FAQ.

API Document

  1. Source Tree.
  2. DistML API.

Contributors

He Yunlong (Intel)
Sun Yongjie (Intel)
Liu Lantao (Intern, Graduated)
Hao Ruixiang (Intern, Graduated)