husky-team / husky

A more expressive and most importantly, more efficient system for distributed data analytics.
http://www.husky-project.com/
Other
98 stars 55 forks source link

Butterfly Mixing Aggregator #72

Open Kelvin-Ng opened 7 years ago

Kelvin-Ng commented 7 years ago

The aggregator currently included in Husky is an AllReduce style aggregator. I think we can also include a Butterfly Mixing aggregator. It is a general framework that can be used to approximately implement distributed machine learning algorithms efficiently, but not limited to a particular type of algorithm.

I have two ideas on the API, the main difference is on update() function. One needs the user to put the iteration counter in the parameter of update(), then the system will calculate which worker to pair with. Another one is to let the aggregator maintain the counter. I personally like the previous one because it is cleaner and more transperent to users.

kygx-legend commented 7 years ago

To inherit the current Aggregator?

Kelvin-Ng commented 7 years ago

No. The current Aggregator should inherit the Butterfly Mixing aggregator, because the current aggregator is only a special case of the Butterfly Mixing aggregator.

kygx-legend commented 7 years ago

OK. Could you make the implementation?

Kelvin-Ng commented 7 years ago

I am busy...

By the way, I have told @zzxx-husky that the implementation can be greatly simplified by using ObjList and CombinedPushChannel directly. I think it is a good opportunity to implement the Butterfly Mixing aggregator in this way. Then, we can throw away the current aggregator and make a new AllReduce aggregator by using the Butterfly Mixing aggregator.

zzxx-husky commented 7 years ago

@Kelvin-Ng Actually based on my understanding, we can reimplemente aggregator using ObjList and specific push (the one we can specify dest worker). But the implemetation may not be simplified easily, you still need to design a base class to store aggregators in a unified way, think about how to do the global aggregation in a decentralized way, etc. And because I cannot simply implement aggregator using existing logics in husky, I think it's better for me to make aggregator independent so that it's easier to write unittests. And also it's easier to adapt if the inner logic of husky changes in the future.

P.S. Maybe I've done too much work on wrapping aggregator to make its APIs look nicer.

Kelvin-Ng commented 7 years ago

@zzxx-husky Maybe I can only get the feeling about the complexity after really working on it...

zzxx-husky commented 7 years ago

Just check the APIs of mailbox, I think it's possible to implement butterfly mixing on Github husky. But it's still synchronous. Is it the same as what you think?

Kelvin-Ng commented 7 years ago

Yes. Butterfly Mixing is synchronous.

Kelvin-Ng commented 7 years ago

No, I want to understand what you mean precisely by 'synchronous'. In Butterfly Mixing, the paired workers will synchronize synchronously, but the workers will not wait for the workers that are not paired.

zzxx-husky commented 7 years ago

Maybe we share the same understanding. BM needs log(W) iterations (W is the no. of workers) and for each iterations, each worker just needs to wait for one another worker. To perform a global aggregation, one worker needs to wait for log(M) workers totally. Lunch time ~

P.S. maybe better to think in terms of machine instead of worker?

Kelvin-Ng commented 7 years ago

Yes.

Actually I am also thinking about the problem about threads and machines. The paper of BM does not talk in terms of threads, but nodes. I think in Husky, we can do a full aggregation within a machine and then aggregate with one another machine, although I am not sure if this is precisely what being mentioned in the paper. But anyway, I am nearly sure that both can converge.

Kelvin-Ng commented 7 years ago

It seems that Butterfly Mixing cannot be done with ObjList + PushCombinedChannel.

Also, it would be good if the Butterfly Mixing aggregator can let the users specify how many machines to aggregate with in each iteration, instead of a fixed number, 2.

ddmbr commented 7 years ago

@Kelvin-Ng What you want will be more feasible after issue #76 is resolved. You may check my PR.

zzxx-husky commented 7 years ago

My understanding is the whole Butterfly Mixing (which contains log(W) iterations) can actually be done in one round of network communication in husky, based on the existing APIs of mailbox

Kelvin-Ng commented 7 years ago

You mean making a global aggregation after log(W) iteration, and make no aggregations in between? This is not Butterfly Mixing.

zzxx-husky commented 7 years ago

Nope. I mean I can do async message passing using the APIs of mailbox (once I flush the messages, they're really sent out), so each worker may need to wait for log(W) messages from others, but it does NOT mean I need to have log(W) rounds of network comm (each round of network comm is a global barrier). Instead, this can be done in only one round of network comm.

Kelvin-Ng commented 7 years ago

Don't understand what you mean...

Yuzhen11 commented 7 years ago

If the message between one thread and another thread is just a binstream each time, then we don't need extra support by mailbox to accomplish this, just wait for log(w) binstream received. Is that what you mean? @zzxx-husky Then is this case, we can also use this counting method to do BSP without send_complete? The functionality of barrier is pushed up to the channel layer from the mailbox layer?

zzxx-husky commented 7 years ago

Lucien implemented a simple version of BM. As I tested using a9 on w30, the CPU is fully used. Wrong rate is about 14%.