nybbles commented 7 years ago

Why?

Channels are a simplification of the workqueues already implemented in torch.ipc. A torch.ipc workqueue has two queues - one that is used to send items to workers and the other that is used to send back results to the single owner thread. A channel just has a single queue that you can enqueue or dequeue from.

Channels can be used to specify concurrent workflows in a far more flexible way than workqueues. For example, a unit test is included that contains an example of a workqueue implemented with N workload generators and M workers that does not use explicit control signals. Instead, the fact that a channel has been closed or has been drained is used to stop workers. There is also a unit test which shows how local model parallelism for forward inference can be implemented using channels.

What?

Channels are a thread synchronization primitive based on message-passing. Threads communicate via channels by writing messages onto them and reading messages out of them, in FIFO order. There is no restriction on which threads or how many threads can read or write from a channel.

Channels can also be closed, which prevents further writes to it. Once all items are read from a closed channel, that channel becomes drained and nothing further can be read from it. DAGs of computation made up for channels can be shut down via cascading closing/draining of channels.

The semantics of ipc.channel is based on:

Behaviors not implemented

I tried to make this change as non-intrusive as possible while still providing basic functionality. Therefore, there is a bunch of duplicated code between ipc.workqueue and ipc.channel and there are some pieces of functionality that were not implemented, specifically:

It is not possible to specify the max number of items that can be written to a channel. The channel just grows to allow the write, instead of blocking on write. Therefore, just like workqueue, there is no backpressure mechanism.
There is no select call to select between a number of channels.

Next steps

The following are possible next steps:

Make ipc.workqueue use ipc.channel internally, instead of having duplicated code.
Make it possible to specify the max number of items that can be written to a channel.
Implement select call.

Also, another issue that can happen for both workqueues and channels is that you can cause a memory leak by losing a reference to a workqueue or channel that still has items on it. That should be fixed.