gorgonia / gorgonia

Gorgonia is a library that helps facilitate machine learning in Go.
https://gorgonia.org/

Distributed Computing #12

Open chewxy opened 7 years ago

chewxy commented 7 years ago

There are many ways to do distributed computing for something like Gorgonia. There are a few things that need to be cleared up when discussing distributed neural networks.

Firstly, which part is distributed? The currently dominant methods basically work by splitting the calculation of the gradients and the gradient updates across different parts of the network.

Other, more traditional systems train different batches in parallel across the network - but this usually relies on special algorithms that can handle delays and latencies.

Or the entire neural network, if large enough, could be split up across the network. That is Google-level engineering that I have no ability to emulate.

The more future-looking method involves synthetic/approximated gradients, functioning more like a database with locks and updates. I am personally in favour of this future-looking design. However, the problem is deceptively simple, and I have run into various hairy issues with it.

Of course, one can also combine the multiple notions of distributedness, but I think that may be a bit too ambitious.

Existing Implementations

These gradient descent methods lend themselves to being easily parallelized:

helinwang commented 7 years ago

"CapnProto looks good, but everyone else is using Protobuf to do their talking. Why?" - Why not plain simple golang's gob? (If only used internally and no cross language boundary) Edit: Hmm, after reading more on CapnProto, it's very interesting.

russellwmy commented 6 years ago

So I don't know if you know about celery (http://www.celeryproject.org) in Python. There is a Go implementation as well that you may want to take a look at: https://github.com/gocelery/gocelery

A while back I had an idea to just use this library to build a distributed machine learning framework, but I am too lazy to do that.

chewxy commented 6 years ago

I'm not too worried about the task queue. That's the easiest part of the equation - users should be free to choose whatever communication protocol they desire: goroutines, gRPC, gocelery, etc.

The difficulty in distributed computing for Gorgonia comes from the fact that the algorithms have to be distributed as well.

I've been hacking on my own data-parallel SGD for most of my projects. It's simple in concept: create a parent "model", and split the training data set up according to how many cores you have. For each core, clone the parent graph, run the clone on its share of the data, and accumulate the gradients. Finally, average the accumulated gradients from across the cores, and then use SGD to update the parent model.

Since the parent weights are tied to the cloned weights, the next iteration's weights are automatically updated.
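A minimal sketch of that loop in plain Go (computeGradients is a hypothetical stand-in for the clone-the-graph-and-run-it step; the real version would build and run Gorgonia graphs):

```go
package main

import (
	"fmt"
	"sync"
)

// computeGradients is a hypothetical stand-in for "clone the parent graph,
// run it on a shard of the training data, and return the gradients".
func computeGradients(weights []float64, shard [][]float64) []float64 {
	grads := make([]float64, len(weights))
	// ... forward + backward pass on the cloned graph would go here ...
	return grads
}

func main() {
	weights := []float64{0.1, -0.2, 0.3} // the parent "model"
	var shards [][][]float64             // training data, split per core

	var (
		mu    sync.Mutex
		wg    sync.WaitGroup
		accum = make([]float64, len(weights))
	)

	// One goroutine per core: compute gradients on its shard,
	// then accumulate them under a lock.
	for _, shard := range shards {
		wg.Add(1)
		go func(shard [][]float64) {
			defer wg.Done()
			grads := computeGradients(weights, shard)
			mu.Lock()
			for i, g := range grads {
				accum[i] += g
			}
			mu.Unlock()
		}(shard)
	}
	wg.Wait()

	// Average the accumulated gradients and apply a vanilla SGD step
	// to the parent model.
	const lr = 0.01
	n := float64(len(shards))
	for i := range weights {
		weights[i] -= lr * (accum[i] / n)
	}
	fmt.Println("updated parent weights:", weights)
}
```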

The "parent" model is in some sense the "parameter server" if you are familiar with Tensorflow terms - it's not necessarily async. There's an async approach but it's way too fragile in any designs I've attempted. I need some help from anyone who's good at async stuff