Closed nybbles closed 7 years ago
Thanks for the nice description/context!
Seems like the build is failling, can you check: https://travis-ci.org/twitter/torch-ipc/jobs/190464716 ?
Okay now it's green :)
@nybbles Can you add some documentation in doc/channel.md? The above is a great preamble. But the doc should include one or two examples.
More generally, ipc.channel
is awesome.
@nicholas-leonard added some docs
Here's a link to them: https://github.com/nybbles/torch-ipc/blob/channels/doc/channel.md
Why?
Channels are a simplification of the workqueues already implemented in torch.ipc. A torch.ipc workqueue has two queues - one that is used to send items to workers and the other that is used to send back results to the single owner thread. A channel just has a single queue that you can enqueue or dequeue from.
Channels can be used to specify concurrent workflows in a far more flexible way than workqueues. For example, a unit test is included that contains an example of a workqueue implemented with N workload generators and M workers that does not use explicit control signals. Instead, the fact that a channel has been closed or has been drained is used to stop workers. There is also a unit test which shows how local model parallelism for forward inference can be implemented using channels.
What?
Channels are a thread synchronization primitive based on message-passing. Threads communicate via channels by writing messages onto them and reading messages out of them, in FIFO order. There is no restriction on which threads or how many threads can read or write from a channel.
Channels can also be closed, which prevents further writes to it. Once all items are read from a closed channel, that channel becomes drained and nothing further can be read from it. DAGs of computation made up for channels can be shut down via cascading closing/draining of channels.
The semantics of
ipc.channel
is based on:Behaviors not implemented
I tried to make this change as non-intrusive as possible while still providing basic functionality. Therefore, there is a bunch of duplicated code between
ipc.workqueue
andipc.channel
and there are some pieces of functionality that were not implemented, specifically:select
call to select between a number of channels.Next steps
The following are possible next steps:
ipc.workqueue
useipc.channel
internally, instead of having duplicated code.select
call.Also, another issue that can happen for both workqueues and channels is that you can cause a memory leak by losing a reference to a workqueue or channel that still has items on it. That should be fixed.