Open rth opened 4 years ago
+1 for trying with dask
+1 also for dask. We had an early Celery-based attempt that we ended up scrapping. It was hard to manage when something went wrong.
I think this is a great idea. I think we also wanted to try something with OpenStack at some point.
I think it is only by implementing a dask (or other) worker that we can see how much abstraction we can have in RemoteWorker. BaseWorker already provides the protocol, but I don't know how much more RemoteWorker can do.
@rth do you have some insights?
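To make the abstraction question concrete, here is a minimal sketch of how a shared lifecycle could keep the status bookkeeping in one base class while each backend (conda, AWS, dask, ...) only fills in a few hooks. All class, method and status names below are hypothetical and chosen for illustration; this is not the actual ramp_engine BaseWorker API.

```python
from abc import ABC, abstractmethod


class LifecycleWorker(ABC):
    """Hypothetical lifecycle: setup -> launch -> poll -> collect -> teardown.

    Only the underscore hooks are backend-specific; the status bookkeeping is
    shared. Names are illustrative assumptions, not the ramp_engine API.
    """

    def __init__(self):
        self.status = "initialized"

    def setup(self):
        self._setup()
        self.status = "setup"

    def launch_submission(self):
        if self.status != "setup":
            raise RuntimeError("call setup() before launch_submission()")
        self._launch()
        self.status = "running"

    def check_finished(self):
        # Poll the backend only while the submission is running.
        if self.status == "running" and self._is_finished():
            self.status = "finished"
        return self.status == "finished"

    def collect_results(self):
        if not self.check_finished():
            raise RuntimeError("submission is not finished yet")
        results = self._collect()
        self.status = "collected"
        return results

    def teardown(self):
        self._teardown()
        self.status = "torn_down"

    # Backend-specific hooks: a conda, AWS or dask worker would implement these.
    @abstractmethod
    def _setup(self): ...

    @abstractmethod
    def _launch(self): ...

    @abstractmethod
    def _is_finished(self): ...

    @abstractmethod
    def _collect(self): ...

    @abstractmethod
    def _teardown(self): ...


class InProcessWorker(LifecycleWorker):
    """Toy backend that runs a callable synchronously, for illustration only."""

    def __init__(self, func):
        super().__init__()
        self.func = func
        self._result = None
        self._done = False

    def _setup(self):
        pass  # nothing to provision for an in-process run

    def _launch(self):
        self._result = self.func()
        self._done = True

    def _is_finished(self):
        return self._done

    def _collect(self):
        return self._result

    def _teardown(self):
        self._result = None
```

With this split, a generic remote worker would only need to decide how to provision, launch, poll and fetch results; whether that is enough for dask, OpenStack or a plain SSH server is exactly what a first implementation would reveal.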
So there is an initial implementation with a dask.distributed worker in https://github.com/paris-saclay-cds/ramp-board/pull/452. It passes all worker and dispatcher tests with a local dask cluster and most tests with a remote dask cluster (assuming the file paths for the ramp kit, data, etc. are the same on the remote server). A few issues still need to be ironed out, but it gives a general idea. The code structure is very similar to that of the local conda worker.
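A rough sketch of why the structure can stay close to a local worker: if the worker only relies on a concurrent.futures-style submit() that returns a future with done() and result(), the same code runs with a local ThreadPoolExecutor or, presumably, with a dask.distributed Client, whose Future exposes the same two methods. This is not the code from the PR above; the class and function names are hypothetical, and, as noted in the comment, the command and working directory must be valid on whichever machine executes them.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_command(cmd, cwd=None):
    """Run cmd (a list of arguments, no shell); return (returncode, stdout, stderr)."""
    proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


class ExecutorWorker:
    """Hypothetical worker that runs a submission command through any object
    with a concurrent.futures-style submit(func, *args) method.

    Assumption: cmd and cwd refer to paths that exist on the machine that
    actually executes run_command (same paths locally and remotely).
    """

    def __init__(self, executor, cmd, cwd=None):
        self.executor = executor
        self.cmd = cmd
        self.cwd = cwd
        self._future = None
        self.status = "initialized"

    def launch_submission(self):
        # Non-blocking: the executor schedules run_command and returns a future.
        self._future = self.executor.submit(run_command, self.cmd, self.cwd)
        self.status = "running"

    def is_finished(self):
        return self._future is not None and self._future.done()

    def collect_results(self):
        if self._future is None:
            raise RuntimeError("launch_submission() was never called")
        # result() blocks until the command has finished.
        returncode, stdout, stderr = self._future.result()
        self.status = "collected" if returncode == 0 else "error"
        return returncode, stdout, stderr


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('hello from the worker')"]
    with ThreadPoolExecutor(max_workers=1) as executor:
        worker = ExecutorWorker(executor, cmd)
        worker.launch_submission()
        print(worker.collect_results())
```

To target a remote cluster, the executor would instead be something like Client("tcp://<scheduler-address>:8786") from dask.distributed (the address is a placeholder). One caveat: dask's Client.submit treats calls as pure by default, so repeated submissions of an identical command may be deduplicated unless pure=False is passed, which this executor-agnostic sketch does not do.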
Currently there is a local CondaEnvWorker and a remote AWSWorker. It would be useful to have a generic RemoteWorker that's not tied to the EC2 API. The idea would be to make submissions to a remote server that's already running. This could likely be done either,