AutoDist is a distributed deep learning training engine for TensorFlow. It provides a user-friendly interface to distribute the training of a wide variety of deep learning models across many GPUs with high scalability and minimal code change.
Unlike specialized distributed ML systems, AutoDist is designed to speed up a broad range of DL models with excellent all-round performance.
In addition, AutoDist isolates the sophistication of distributed systems from ML prototyping and exposes a simple API that makes it easy for users of all levels to adopt, and to switch between, different distributed ML techniques (sketched after the usage example below).
For a closer look at its performance, please refer to our documentation.
Installation:
pip install autodist
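To sanity-check the installation before touching any model code, a plain import is enough (this is a generic Python check, not an AutoDist-specific command):

python -c "import autodist"

If the import succeeds without errors, the package is ready to use.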
Modifying existing TensorFlow code to use AutoDist is easy:
import tensorflow as tf
from autodist import AutoDist
ad = AutoDist(resource_spec_file="resource_spec.yml")
with tf.Graph().as_default(), ad.scope():
    ########################################################
    # Build your (single-device) model here,
    # and train it distributedly.
    ########################################################
    sess = ad.create_distributed_session()
    sess.run(...)
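The resource_spec.yml passed to AutoDist describes the machines and GPUs to train on. The exact schema is covered in our documentation; the sketch below is only illustrative, and the field names (nodes, address, gpus, chief) as well as the addresses are assumptions that may differ from the current schema:

# resource_spec.yml (illustrative sketch; field names and addresses are assumed)
nodes:
  - address: 192.168.0.1   # machine that also acts as the chief
    gpus: [0, 1, 2, 3]
    chief: true
  - address: 192.168.0.2   # additional worker machine
    gpus: [0, 1]

As noted above, switching between distributed ML techniques should not require touching the model itself. The following sketch assumes a strategy_builder constructor argument and a PS builder under autodist.strategy; these names are assumptions here, so check the API reference for the builders shipped with your version:

from autodist import AutoDist
# Assumed import path and builder name, for illustration only.
from autodist.strategy import PS

# Same resource spec as before; only the distribution technique changes.
ad = AutoDist(resource_spec_file="resource_spec.yml",
              strategy_builder=PS())
# Then build and train your model inside ad.scope() exactly as shown above.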
Ready to try? Please refer to the examples on our Getting Started page.
We learned and borrowed insights from a few open source projects including Horovod, Parallax, and tf.distribute.