jepst / CloudHaskell

A distributed computing framework for Haskell
http://hackage.haskell.org/package/remote
BSD 3-Clause "New" or "Revised" License

Mutable state that's local to each node. #2

Closed: molysgaard closed this 13 years ago

molysgaard commented 13 years ago

Hi again. I'm writing a Chord DHT using your Cloud Haskell. For this, each node in the DHT needs local state that all processes can access (even those spawned from remote processes). Right now, this is implemented by a hack of mine that exploits the fact that the pid of the first process you run in Cloud Haskell is static. I store the state in that process, and all spawned processes can get/put the state using custom messages. You can see an example implementation in my fork in examples/Chord/Test.hs. You need to export the ProcessId data constructor to make it work.

This hack works quite well and has never failed in my tests; there are just two bugs as I see it now.

Do you have any idea how one would add state for each node? I've taken a look at the code in Process.hs, and functions like getSelfPid already access a state-like structure. Would it be possible to add an MVar there that's inherited from the node state each time a new process is spawned? I tried to do this, but it would have broken the signatures of several top-level functions in Cloud Haskell, including spawnLocalAnd, startLocalRegistry and initNode, because an initial node state would have to be supplied in all of these places. I also ran into a problem with polymorphism, because the state had to have a concrete type at compile time. I ended up concluding I was "doing it wrong" and needed some help from someone who knows the code. What are your thoughts on this problem?

This is not an urgent issue for me in any way, just putting forth the idea. After all, I have working state ;)

jepst commented 13 years ago

So, these are some good questions. Let's see.

The hack based on fixed process IDs is, needless to say, a bad idea; the numbering of PIDs is entirely coincidental and may change between versions. I don't think changing the structure of Node to allow user-level data storage makes sense either. For now, I think a better solution is to have one node start the state-bearing processes on all nodes, so that it knows the PIDs of all such processes. This is the solution used by the Remote.Task module. In the future (hopefully within the next few weeks), I hope to implement a node-level process name lookup service, which will let you assign publicly visible names to particular processes. Then it would be just a matter of calling (lookupProcess someNode "DHTstate") or something similar, which will return a PID; you can send messages to that process to access its state. This would ultimately be the best solution for your problem.

You can use MVars between processes, as long as those processes run on the same node. The way to do this is to capture an MVar when calling spawnLocal:

do mv <- liftIO newEmptyMVar
   spawnLocal $ liftIO (putMVar mv "hi")
   liftIO (takeMVar mv)

Nevertheless, we don't let you serialize MVars, because that might mean shared MVars between processes on different nodes. And of course variable capture doesn't work with the non-local version of spawn. So you would have to start all such processes locally.
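To make the same-node restriction concrete, here is a minimal sketch of several local threads sharing one captured MVar as state. It deliberately uses plain forkIO from base as a stand-in for spawnLocal, so it runs without Cloud Haskell; the name sharedStateDemo and the worker count are assumptions made for illustration only.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (modifyMVar_, newEmptyMVar, newMVar, putMVar, readMVar, takeMVar)
import Control.Monad (forM_, replicateM_)

-- Hypothetical helper: start the given number of local workers that all
-- capture the same MVar, then return the final value of the shared state.
sharedStateDemo :: Int -> IO Int
sharedStateDemo workers = do
  state <- newMVar 0          -- state visible to every thread that captures it
  done  <- newEmptyMVar       -- each worker signals completion here
  forM_ [1 .. workers] $ \_ ->
    forkIO $ do
      -- modifyMVar_ takes the MVar, applies the update, and puts it back,
      -- so increments from different workers cannot interleave.
      modifyMVar_ state (return . (+ 1))
      putMVar done ()
  replicateM_ workers (takeMVar done)  -- wait for every worker to finish
  readMVar state

main :: IO ()
main = sharedStateDemo 8 >>= print
```

The capture only works because every worker lives in the same OS process as the MVar, which is the same reason the approach is limited to processes started locally on one node.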

I don't understand what you mean by state corruption. Messages are handled synchronously, so assuming that you have a receiveWait loop, with each incoming message causing an update via tail recursion, there's no concurrent access to variables. See, for example, the counter example in the paper. If I'm not understanding you correctly, please let me know, and maybe send me some code.
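The pattern described here can be sketched without Cloud Haskell at all. The example below is an illustration under stated assumptions, not the counter from the paper: a Chan from base stands in for a process mailbox, an MVar stands in for the reply message, and the names Msg, counter and runCounterDemo are hypothetical. The state exists only as the argument of the recursive loop, and the loop handles one message at a time.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)

-- Hypothetical message type for this sketch.
data Msg
  = Increment          -- add one to the counter
  | Query (MVar Int)   -- ask for the current value; the reply goes into the MVar

-- The server loop: read one message, compute the next state, recurse.
-- No other thread can ever observe or modify 'n' directly.
counter :: Chan Msg -> Int -> IO ()
counter inbox n = do
  msg <- readChan inbox
  case msg of
    Increment   -> counter inbox (n + 1)
    Query reply -> putMVar reply n >> counter inbox n

-- Four clients send 100 increments each, then the result is queried.
runCounterDemo :: IO Int
runCounterDemo = do
  inbox <- newChan
  _ <- forkIO (counter inbox 0)
  done <- newEmptyMVar
  forM_ [1 .. 4 :: Int] $ \_ ->
    forkIO $ do
      replicateM_ 100 (writeChan inbox Increment)
      putMVar done ()
  replicateM_ 4 (takeMVar done)   -- all increments are queued before the query
  reply <- newEmptyMVar
  writeChan inbox (Query reply)
  takeMVar reply

main :: IO ()
main = runCounterDemo >>= print   -- prints 400
```

Although four clients send concurrently, every update passes through the single loop in order, so no locking of the state itself is needed.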

I'm also not sure what you mean about the polymorphism problem; again, send me some code, and I'll take a look at it. In general, polymorphic remotable functions are not permitted.

jepst commented 13 years ago

The ability to assign names to processes has now been implemented. Use the nameSet and nameQuery functions in the Remote.Process module. This means it's now easy to create node-local state:

nodeFavoriteColor :: ProcessM ()
nodeFavoriteColor =
 do nameSet "favorite_color"
    loop Blue
 where loop color =
         receiveWait
           [ match (\newcolor -> return newcolor),
             match (\pid -> send pid color >> return color)
           ] >>= loop

setFavoriteColor :: NodeId -> Color -> ProcessM ()
setFavoriteColor nid color =
 do (Just pid) <- nameQuery nid "favorite_color"
    send pid color

getFavoriteColor :: NodeId -> ProcessM Color
getFavoriteColor nid =
 do (Just pid) <- nameQuery nid "favorite_color"
    mypid <- getSelfPid
    send pid mypid
    expect

(You can reduce network use if you cache process IDs locally, rather than doing a nameQuery each time.)

If you'd like to follow up with your other questions, please open a new issue or email me.