Open jdrusso opened 2 years ago
Maybe this would work well with a broader rewrite of the parallelism as a work manager interface
@shz66 This is what we were talking about the other day, just tagging you here
@jdrusso sounds good. I am gonna work on this in the next week or so. Feel free to assign the issue to me too
Assigned you, thanks for working on this!
@SHZ66 Maybe something to keep in mind as you're doing this: #37
Clustering in particular is a stateful operation, which currently relies on serializing/deserializing modelWE objects from the Ray object store.
Ideally, there isn't much overhead associated with this, but I think it becomes noticeable on systems without shared memory between workers.
Instead of doing the parallelism via Ray processes, we can initialize a set of Actors to do work. Actors are stateful, so we can just initialize them with the current model state (with unnecessary stuff stripped out).
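A rough sketch of what the Actor approach could look like, assuming a much-simplified model state. `StrippedModelState`, `ClusteringWorker`, `assign`, and `assign_with_actors` are hypothetical names for illustration, not existing code in this repo, and the 1-D nearest-center assignment is only a stand-in for the real clustering step. The point is that the state is sent to each Actor once at construction, and later calls only ship the data chunk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StrippedModelState:
    """Hypothetical stand-in for a modelWE object with unneeded attributes removed."""
    cluster_centers: List[float]


class ClusteringWorker:
    """Keeps the model state for its whole lifetime; calls only receive new data."""

    def __init__(self, state: StrippedModelState):
        self.state = state

    def assign(self, coords: List[float]) -> List[int]:
        # Toy 1-D assignment: index of the nearest cluster center for each coordinate.
        centers = self.state.cluster_centers
        return [
            min(range(len(centers)), key=lambda i: abs(c - centers[i]))
            for c in coords
        ]


def assign_with_actors(state, chunks, n_workers=2):
    """Distribute chunks round-robin over a pool of stateful Ray Actors."""
    import ray  # imported lazily so the plain class above also works without Ray

    ray.init(ignore_reinit_error=True)
    # Wrapping the plain class turns it into an Actor class.
    worker_cls = ray.remote(ClusteringWorker)
    # The state is serialized once per Actor here, not once per task.
    actors = [worker_cls.remote(state) for _ in range(n_workers)]
    futures = [
        actors[i % n_workers].assign.remote(chunk)
        for i, chunk in enumerate(chunks)
    ]
    return ray.get(futures)
```

Calling `assign_with_actors(state, chunks)` returns one list of cluster indices per chunk. Keeping `ClusteringWorker` as a plain class also means the clustering logic can be unit-tested without starting Ray.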