snuspl / dolphin


A parameter server may not apply updates for received values to its store #152

Closed beomyeol closed 8 years ago

beomyeol commented 8 years ago

dolphin-ps uses REEF's Network Connection Service and handles messages received from parameter workers in a Wake event handler. In the current implementation, dolphin-ps may fail to apply some updates to its key-value store when it is used with a ParameterUpdater. For example, suppose there are two parameter workers (A and B) and a parameter server whose parameter updater adds each received value to the value in its key-value store. Suppose a value v is stored under a key k. Workers A and B push values a and b for k to the parameter server, respectively. The message handler thread for message a retrieves v and tries to set v+a for k. However, the message handler thread for message b also retrieves v and sets v+b for k at the same time. In this case, one of the two updates is lost: k ends up holding v+a or v+b instead of v+a+b. We should handle this case so that the parameter server applies the updates for both received values.
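To make the interleaving concrete, here is a minimal sketch that replays it deterministically on a single thread. It is not dolphin-ps code: the class `LostUpdateDemo` and the method `replayInterleaving` are hypothetical names, and a plain `HashMap` stands in for the key-value store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of dolphin-ps.
final class LostUpdateDemo {

  // Replays the problematic ordering step by step:
  // both handlers read v before either one writes.
  static int replayInterleaving(final int v, final int a, final int b) {
    final Map<String, Integer> store = new HashMap<>();
    store.put("k", v);

    final int readByHandlerA = store.get("k"); // handler for message a reads v
    final int readByHandlerB = store.get("k"); // handler for message b also reads v

    store.put("k", readByHandlerA + a);        // handler a writes v + a
    store.put("k", readByHandlerB + b);        // handler b overwrites with v + b

    return store.get("k");                     // v + b; the update a is lost
  }
}
```

With v = 10, a = 1, b = 2, the stored value ends up as 12 rather than the expected 13.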

bchocho commented 8 years ago

We must ensure that, per key, the parameter server applies each update atomically. We also want to allow thread-level parallelism, relatively load-balanced across the threads. How about the following design?

Partition the key space into p partitions. Each partition has its own blocking queue (holding both updates and reads) and its own thread, so there are p queue-thread pairs. The partition for a key is hash(key) % p. On receiving a message, the message handler computes the partition and enqueues the message onto that partition's queue. Each thread runs a loop that takes operations from its queue and applies them one at a time.

Using a single thread per partition ensures that operations on a key are applied atomically (in fact, operations on each key will be linearizable). The multi-node parameter server can build on this partitioning scheme.
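A rough sketch of this design, under stated assumptions, might look like the following. It is not the change that eventually landed in dolphin: `PartitionedStore`, `push`, `pull`, and `shutdown` are hypothetical names, a `BinaryOperator` stands in for the ParameterUpdater, and reads return a `CompletableFuture` purely for illustration. It uses `Math.floorMod` rather than `%` because `hashCode()` can be negative in Java.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.BinaryOperator;

// Hypothetical sketch of the partitioned design; not dolphin-ps code.
final class PartitionedStore<K, V> {
  private final List<BlockingQueue<Runnable>> queues = new ArrayList<>();
  private final List<Map<K, V>> stores = new ArrayList<>();
  private final List<Thread> threads = new ArrayList<>();
  private final BinaryOperator<V> updater; // stand-in for the parameter updater

  PartitionedStore(final int numPartitions, final BinaryOperator<V> updater) {
    this.updater = updater;
    for (int i = 0; i < numPartitions; i++) {
      final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
      queues.add(queue);
      stores.add(new HashMap<>()); // touched only by this partition's thread
      final Thread thread = new Thread(() -> {
        try {
          while (true) {
            queue.take().run();    // apply queued operations one at a time
          }
        } catch (final InterruptedException e) {
          Thread.currentThread().interrupt(); // exit the loop on shutdown
        }
      });
      thread.setDaemon(true);
      thread.start();
      threads.add(thread);
    }
  }

  // hash(key) % p, kept non-negative.
  private int partitionOf(final K key) {
    return Math.floorMod(key.hashCode(), queues.size());
  }

  // Called by message handler threads: enqueue an update for the key.
  void push(final K key, final V value) {
    final int p = partitionOf(key);
    queues.get(p).add(() -> stores.get(p).merge(key, value, updater));
  }

  // Reads go through the same queue, so they are ordered with updates.
  CompletableFuture<V> pull(final K key) {
    final int p = partitionOf(key);
    final CompletableFuture<V> result = new CompletableFuture<>();
    queues.get(p).add(() -> result.complete(stores.get(p).get(key)));
    return result;
  }

  void shutdown() {
    for (final Thread thread : threads) {
      thread.interrupt();
    }
  }
}
```

In this sketch, message handler threads only enqueue work, so two concurrent pushes for the same key are serialized by that key's partition thread, and both are applied. Keys that hash to different partitions are still processed in parallel.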

bgchun commented 8 years ago

@dafrista I like the idea. We should do it.

jsjason commented 8 years ago

Closed via #177.