Rewrite clustering using Ray Actors

jdrusso commented 2 years ago

Clustering in particular is a stateful operation, which currently relies on serializing/deserializing modelWE objects from the Ray object store.

Ideally, there isn't much overhead associated with this, but I think it becomes noticeable on systems without shared memory between workers.

Instead of doing the parallelism via stateless Ray tasks, we can initialize a set of Actors to do the work. Actors are stateful, so we can initialize each one once with the current model state (with unnecessary stuff stripped out) and reuse it across calls.
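A minimal sketch of the idea, not the actual msm_we implementation. `SlimModelState`, `ClusteringWorker`, and `assign_in_parallel` are hypothetical names made up for illustration, and the nearest-center assignment is only a stand-in for the real clustering step. The point is that the state is passed to each actor once, at construction, rather than with every call.

```python
from dataclasses import dataclass


@dataclass
class SlimModelState:
    """Hypothetical stand-in for a stripped-down modelWE: only what workers need."""
    cluster_centers: list  # list of coordinate tuples


class ClusteringWorker:
    """Keeps the model state for its whole lifetime, so it is sent once per worker."""

    def __init__(self, state):
        self.state = state

    def assign(self, points):
        """Return the index of the nearest cluster center for each point."""
        centers = self.state.cluster_centers
        return [
            min(
                range(len(centers)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centers[k])),
            )
            for point in points
        ]


def assign_in_parallel(state, chunks, n_workers=2):
    """Spread chunks of points over a fixed pool of Ray actors (assumes Ray is installed)."""
    import ray  # imported lazily so the plain class above also works without Ray

    ray.init(ignore_reinit_error=True)
    RemoteWorker = ray.remote(ClusteringWorker)  # turn the plain class into an actor class
    # Each actor receives the state exactly once, here.
    workers = [RemoteWorker.remote(state) for _ in range(n_workers)]
    # Later calls only ship the chunk of points, not the model.
    futures = [
        workers[i % n_workers].assign.remote(chunk) for i, chunk in enumerate(chunks)
    ]
    return ray.get(futures)
```

Keeping `ClusteringWorker` a plain class and wrapping it with `ray.remote(...)` only inside `assign_in_parallel` means the same logic can be unit-tested without a Ray cluster.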

jdrusso commented 2 years ago

Maybe this would work well with a broader rewrite of the parallelism as a work manager interface.

jdrusso commented 1 year ago

@shz66 This is what we were talking about the other day, just tagging you here

SHZ66 commented 1 year ago

@jdrusso sounds good. I am gonna work on this over the next week or so. Feel free to assign the issue to me too.

jdrusso commented 1 year ago

Assigned you, thanks for working on this!

jdrusso commented 1 year ago

@SHZ66 Maybe something to keep in mind as you're doing this: #37

jdrusso / msm_we

Rewrite clustering using Ray Actors #32