rainwoodman / vast

vala and scientific numerical computation

Ideas for distributed computing #1

Open · arteymix opened this issue 7 years ago

arteymix commented 7 years ago

Can I suggest MessagePack-RPC here?

I'm currently working on a msgpack-c wrapper with the following features in mind:

Vala has automatic GValue conversion, so we could seamlessly send packed primitive types.
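As an illustration, here is a purely hypothetical sketch of what that wrapper's API could look like; MessagePack.Packer and pack_value are placeholder names, not a published API:

    // pack an arbitrary primitive through its GValue representation
    var packer = new MessagePack.Packer ();
    var value = Value (typeof (double));
    value.set_double (3.1416);
    packer.pack_value (value);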

I also have a working SAGA implementation that supports both local and TORQUE backends. I plan to release it under the LGPL once it is stable enough. One could use that to automate job scheduling. I use it with Valum to power the miRBooking API.

For computing, I think that GSL could provide a good low-level engine. There are bindings already distributed with Vala. However, it's licensed under the GPL.

rainwoodman commented 7 years ago

I think it is still too early to actually work on distributed computing; we need a first version that works in a single process. I briefly looked at the spec of MessagePack-RPC. The random response order seems to suggest it can run as a horde of processes doing P2P communication. But then the server/client language seems a bit limiting.

Meanwhile, next-generation CPUs will have hundreds of concurrent threads, so going beyond one node is not a critical need; there are plenty of models that can be trained on one node now and in the near future.

Down the road, a job interface is going to be useful -- likely after we have some capacity to run on a cluster, and then our LiveCD will dress up the local machine as a cluster. Your miRBooking API looks pretty cool. My concern is about depending on too much beyond GNOME. Reason: the market is saturated -- to survive, we need to fit into a niche.

I am not very fond of GSL (for no particular reason). For the models I work with, I only need some linear algebra support in the execution engine (functionality covered by numpy.linalg and numpy.fft). Maybe bind OpenBLAS and FFTPACK? That's what numpy uses. I can probably find someone at work to contribute a few minimizers; if not, it's not hard to translate his C++ code to Vala. Eventually, if we gain popularity, people will demand linking against MKL; staying with the OpenBLAS API may make that easier.
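To make the OpenBLAS route concrete, here is a minimal sketch of hand-binding a single CBLAS routine from Vala. cblas_ddot is a real CBLAS entry point; the header name and link flags are assumptions that may vary across distributions:

    // minimal hand-rolled binding; compile with e.g. valac -X -lopenblas dot.vala
    [CCode (cname = "cblas_ddot", cheader_filename = "cblas.h")]
    extern double cblas_ddot (int n,
                              [CCode (array_length = false)] double[] x, int incx,
                              [CCode (array_length = false)] double[] y, int incy);

    void main () {
        double[] x = { 1.0, 2.0, 3.0 };
        double[] y = { 4.0, 5.0, 6.0 };
        // dot product: 1*4 + 2*5 + 3*6 = 32
        stdout.printf ("%g\n", cblas_ddot (3, x, 1, y, 1));
    }

A full binding would wrap the level-1/2/3 routines the execution engine actually needs, but the pattern stays the same.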

I haven't thought too much into the license. Anything legal and open should be fine.

arteymix commented 7 years ago

I typically work with LGPLv3 as it provides a liberal shared-linking policy while preserving basic user freedoms.

As I said over the wire, we should use Avahi for broadcasting and discovering nodes over the network and MessagePack-RPC to pass arrays.

With very low latency (e.g. InfiniBand), it's generally more expensive to compress the data than to send it raw. This is why MessagePack is highly convenient: it is fast to pack and unpack, and it supports extension types that can cover GType.

arteymix commented 7 years ago

In the broader scope:

Each node would have a GThreadPool for handling requests (see ThreadedSocketService).
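For the request-handling part, GIO already covers most of it. A minimal sketch, assuming one MessagePack-RPC request per connection (port 3003 is an arbitrary choice):

    // each accepted connection is dispatched to a thread from the
    // service's internal GThreadPool
    int main () {
        var service = new ThreadedSocketService (8); // at most 8 worker threads
        try {
            service.add_inet_port (3003, null);
        } catch (Error e) {
            error ("could not bind: %s", e.message);
        }
        service.run.connect ((connection, source_object) => {
            // decode and execute the MessagePack-RPC request from
            // 'connection' here, on a pool thread
            return false;
        });
        service.start ();
        new MainLoop ().run ();
        return 0;
    }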

var cluster = new Vast.Cluster (); // inherits from SocketService and handles both Avahi and MessagePack-RPC clients

cluster.broadcast (new InetSocketAddress (new InetAddress.from_string ("some.multicast.address"), 5353)); // set up Avahi broadcasting (5353 is the mDNS port)

cluster.listen (new InetSocketAddress (new InetAddress.any (SocketFamily.IPV4), ...)); // set up the MessagePack-RPC server

var x = new Vast.Array (typeof (double), sizeof (double), {10});
var sin_x = cluster.call_async ("sin", x); // forwarded to a MessagePack.RPC.Client

new MainLoop ().run ();
rainwoodman commented 7 years ago

How do you distribute the data of x?

arteymix commented 7 years ago

Partition with Array.slice and send each slice to a different node in the cluster. This process should be seamless, so call/call_async would have to be aware of whether the function is element-wise or dimension-wise and partition accordingly.
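A rough sketch of the element-wise case; cluster.nodes, the slice signature and a per-node call_async are all assumptions about an API that does not exist yet:

    // hypothetical: split x into one slice per node and dispatch
    var x = new Vast.Array (typeof (double), sizeof (double), {1000});
    var n = cluster.nodes.length;
    var chunk = 1000 / n;
    for (var i = 0; i < n; i++) {
        // "sin" is element-wise, so each slice is independent
        var slice = x.slice ({i * chunk}, {(i + 1) * chunk});
        cluster.nodes[i].call_async ("sin", slice);
    }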

I also think this should apply to computation graphs: we would send the computation to perform, along with its inputs, for evaluation across the network. This would constitute a batch mode.

The nice thing with this model is that you could set up a cluster and join it without broadcasting yourself as a node. We could have a simple CLI client to send requests and pipe big data files over the network.

rainwoodman commented 7 years ago

But... the array should have been distributed to begin with -- mapping out from a single node is not a scalable pattern.

In dask, they have a distributed constructor for distributed array objects (dask.array or dask.delayed).

arteymix commented 7 years ago

Dask is a big piece, and doing something similar will require primitives for sending dense arrays over the wire and operating on them anyway. We have plenty of time to decide how to approach this.

rainwoodman commented 7 years ago

No no, you got my idea wrong.

We want to avoid sending dense arrays around by starting with a distributed array constructor. For example,

   var mystart = rank * 100;
   var myend = mystart + 100;
   var a = read_from_distributed_datasource (datasource, mystart, myend); // this is like dask.delayed

Then a is a distributed array and there is no communication between nodes.

Your proposal looks like:

   var mystart = rank * 100;
   var myend = mystart + 100;
   Vast.Array all = null;
   if (rank == 0)
       all = read_from_distributed_datasource (datasource, 0, end);
   all.scatter (cluster);

Now that I write these down, we are probably talking about two different abstraction layers. We shall think about this a bit more.

arteymix commented 7 years ago

Yes, I'm working a bit underneath that layer, trying to plan out all the primitives we'll need to distribute the segments of the array across the network.

From my perspective, only the initiator of the request should know the whole picture, whereas each node stores and processes its own independent piece of the array. These pieces, however, need to be transmitted and operated on.

We can add some RPC primitives to store and retrieve dense arrays and apply functions by identifier so that we don't move data around, but then the whole thing becomes stateful and we need to do some bookkeeping all around.

Dask provides an elegant numpy-like interface to what appears to be building and distributing a computation graph over a distributed array, which should ultimately be possible with RPC.

I think we have all we need to build something great. The GIO module provides a very wide set of facilities for network programming :+1:

arteymix commented 7 years ago

MessagePack-RPC is only a wire protocol, so we could run it on top of ZeroMQ if necessary. The good thing is that it will efficiently pack and unpack our data structures and will allow us to interoperate quite nicely.

Actually, I would prefer using a multicast IP address to do node discovery via Avahi, and then establishing TCP connections between nodes over which remote procedure calls would be sent. At a higher level, we could define a set of procedures (a node-side sketch follows the list):

The store procedure stores an array remotely and returns a UUID for referring to it later on.

The retrieve procedure retrieves a remotely stored array from its UUID.

The clear procedure deletes a remotely stored array. We might not actually need this with a TTL, but it can be nice to free some space early on.

The function-application procedure works on a set of UUIDs referring to existing arrays; it applies the specified function and returns UUIDs for the resulting arrays, which are stored remotely.
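To make the statefulness concrete, here is a minimal sketch of the bookkeeping each node would keep for the first three procedures; everything about Vast.Array is an assumption, and Uuid.string_random requires GLib >= 2.52:

    // per-node registry mapping UUIDs to locally stored arrays
    public class ArrayStore : Object {
        private HashTable<string, Vast.Array> storage =
            new HashTable<string, Vast.Array> (str_hash, str_equal);

        // store: keep the array, hand back a UUID to refer to it later
        public string store (Vast.Array array) {
            var uuid = Uuid.string_random ();
            storage[uuid] = array;
            return uuid;
        }

        // retrieve: look up a previously stored array by its UUID
        public Vast.Array? retrieve (string uuid) {
            return storage[uuid];
        }

        // clear: free the slot early instead of waiting for a TTL
        public void clear (string uuid) {
            storage.remove (uuid);
        }
    }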

I'll publish https://github.com/arteymix/msgpack-glib very soon. I just need to get things working.