DistML (Distributed Machine Learning platform)

DistML is a machine learning tool that allows training very large models on Spark. It is fully compatible with Spark (tested on 1.2 and above).

Reference paper: Large Scale Distributed Deep Networks

Runtime view:

DistML provides several algorithms (LR, LDA, Word2Vec, ALS) to demonstrate its scalability. However, you may need to write your own algorithms based on the DistML APIs (Model, Session, Matrix, DataStore...). Porting an existing algorithm to DistML is generally simple; here we take LR as an example: How to implement logistic regression on DistML.
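To give a rough picture of what porting an algorithm like LR onto a parameter-server layout involves, here is a standalone Scala sketch in the spirit of the Downpour SGD scheme from the reference paper. Each worker pulls the current weights, computes a log-loss gradient on its own data partition, and pushes that gradient to a shared store, which applies the update. `ParamServer`, `Sample` and `DownpourLRSketch` are hypothetical names used only for illustration. They are not DistML's `Model`/`Session`/`DataStore` API. The workers also run one after another here, not as concurrent Spark tasks.

```scala
// Hypothetical in-memory "parameter server" holding the shared weight vector.
// NOT the DistML DataStore API; a stand-in for illustration only.
final class ParamServer(dim: Int) {
  private val weights = Array.fill(dim)(0.0)

  // A worker fetches a snapshot of the current weights.
  def pull(): Array[Double] = synchronized { weights.clone() }

  // A worker pushes a gradient; the server applies w := w - lr * g.
  def push(grad: Array[Double], lr: Double): Unit = synchronized {
    var i = 0
    while (i < weights.length) { weights(i) -= lr * grad(i); i += 1 }
  }
}

// One training example; y is the label in {0, 1}. Hypothetical type.
final case class Sample(x: Array[Double], y: Double)

object DownpourLRSketch {
  private def sigmoid(z: Double): Double = 1.0 / (1.0 + math.exp(-z))

  private def dot(w: Array[Double], x: Array[Double]): Double =
    x.indices.map(i => w(i) * x(i)).sum

  // Gradient of the mean log-loss over one worker's partition.
  def gradient(w: Array[Double], part: Seq[Sample]): Array[Double] = {
    val g = Array.fill(w.length)(0.0)
    if (part.isEmpty) return g // an empty partition contributes nothing
    for (s <- part) {
      val err = sigmoid(dot(w, s.x)) - s.y
      for (i <- g.indices) g(i) += err * s.x(i)
    }
    g.map(_ / part.size)
  }

  // Each "worker" (here: each partition, visited in turn) pulls, computes, pushes.
  def train(parts: Seq[Seq[Sample]], dim: Int, epochs: Int, lr: Double): Array[Double] = {
    val ps = new ParamServer(dim)
    for (_ <- 1 to epochs; part <- parts) {
      val w = ps.pull()
      ps.push(gradient(w, part), lr)
    }
    ps.pull()
  }

  // Predicted probability that the label is 1.
  def predict(w: Array[Double], x: Array[Double]): Double = sigmoid(dot(w, x))
}
```

In a real Spark job, each partition's pull/compute/push loop would run inside a task (for example via `mapPartitions`), and pushes would arrive asynchronously. This is what lets the model live outside any single executor's memory.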

User Guide

Download and build DistML.
Typical options.
Run Sample - LR.
Run Sample - MLR.
Run Sample - LDA.
Run Sample - Word2Vec.
Run Sample - ALS.
Benchmarks.
FAQ.

API Document

Source Tree.
DistML API.

Contributors

He Yunlong (Intel)
Sun Yongjie (Intel)
Liu Lantao (Intern, Graduated)
Hao Ruixiang (Intern, Graduated)
