petuum / autodist

Simple Distributed Deep Learning on TensorFlow
https://petuum.github.io/autodist
Apache License 2.0


Documentation | Examples

AutoDist is a distributed deep learning training engine for TensorFlow. It provides a user-friendly interface for distributing the training of a wide variety of deep learning models across many GPUs, scalably and with minimal code change.

Introduction

Unlike specialized distributed ML systems, AutoDist is built to speed up a broad range of DL models with strong all-round performance.

Beyond its advanced distribution features, AutoDist is designed to isolate the sophistication of distributed systems from ML prototyping. It exposes a simple API that makes it easy to use, and to switch between, different distributed ML techniques for users of all levels.

For a closer look at the performance, please refer to our documentation.

Using AutoDist

Installation:

pip install autodist

Modifying existing TensorFlow code to use AutoDist is easy:

import tensorflow as tf
from autodist import AutoDist

ad = AutoDist(resource_spec_file="resource_spec.yml")

with tf.Graph().as_default(), ad.scope():
    ########################################################
    # Build your (single-device) model here,
    #   and AutoDist will train it in a distributed fashion.
    ########################################################
    sess = ad.create_distributed_session()
    sess.run(...)
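The `resource_spec.yml` passed to `AutoDist` above describes the machines and GPUs available for training. As a rough sketch only (the exact schema is defined in the AutoDist documentation, and the addresses and GPU indices here are hypothetical placeholders):

```yaml
# Hypothetical single-node resource spec: one machine with two GPUs.
# Field names and layout are illustrative; consult the AutoDist docs
# for the authoritative schema.
nodes:
  - address: 127.0.0.1
    gpus: [0, 1]
    chief: true   # the node that coordinates training
```

For multi-node training, additional entries would list the other machines' addresses and their GPUs, with exactly one node marked as chief.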

Ready to try? Please refer to the examples in our Getting Started page.

References & Acknowledgements

We learned from, and borrowed insights of, several open-source projects, including Horovod, Parallax, and tf.distribute.