eric-czech closed this issue 3 years ago
FYI, this is certainly necessary as these errors occur in worker logs when trying to use sgkit:
```
distributed.worker - ERROR - No module named 'xarray'
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/distributed/worker.py", line 915, in handle_scheduler
    await self.handle_stream(
  File "/opt/conda/lib/python3.8/site-packages/distributed/core.py", line 579, in handle_stream
    msgs = await comm.read()
  File "/opt/conda/lib/python3.8/site-packages/distributed/comm/tcp.py", line 204, in read
    msg = await from_frames(
  File "/opt/conda/lib/python3.8/site-packages/distributed/comm/utils.py", line 87, in from_frames
    res = _from_frames()
  File "/opt/conda/lib/python3.8/site-packages/distributed/comm/utils.py", line 65, in _from_frames
    return protocol.loads(
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/core.py", line 151, in loads
    value = _deserialize(head, fs, deserializers=deserializers)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 335, in deserialize
    return loads(header, frames)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/serialize.py", line 71, in pickle_loads
    return pickle.loads(x, buffers=buffers)
  File "/opt/conda/lib/python3.8/site-packages/distributed/protocol/pickle.py", line 75, in loads
    return pickle.loads(x)
ModuleNotFoundError: No module named 'xarray'
distributed.worker - INFO - Connection to scheduler broken. Reconnecting...
```
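A quick way to check whether a dependency like this is actually present on the workers is to run a small probe on each of them with distributed's `Client.run`. A sketch (the live-cluster part is commented out since it needs a running scheduler):

```python
def has_xarray() -> bool:
    """Return True if this process can import xarray."""
    try:
        import xarray  # noqa: F401
        return True
    except ImportError:
        return False

# On a live cluster (not executed here):
# from distributed import Client
# client = Client(cluster)
# client.run(has_xarray)  # maps each worker address to True/False
```

If any worker reports `False`, deserializing task payloads that reference xarray will fail exactly as in the traceback above.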
It's unclear to me how much of sgkit works on a vanilla dask cluster at the moment. I suspect not much, so we'll probably need library-specific cluster configuration instructions.
This can also be accomplished by passing variables like those in env_vars.json to the `env_vars` parameter of Dask Cloud Provider VM cluster instances. See the linked thread for a larger discussion on that.
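For concreteness, a minimal sketch of that approach. The `EXTRA_PIP_PACKAGES` variable is honored by the standard dask docker image's startup script (which is what adds the install time at cluster start); the exact package list and the env_vars.json contents here are assumptions, not the actual file from this thread:

```python
import json

# Hypothetical env_vars.json contents: packages the worker image is missing.
env_vars = json.loads('{"EXTRA_PIP_PACKAGES": "sgkit xarray"}')

# Passing them to a Dask Cloud Provider VM cluster (not executed here,
# since it requires GCP credentials and a configured project):
# from dask_cloudprovider.gcp import GCPCluster
# cluster = GCPCluster(env_vars=env_vars)
```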
Using env vars for this adds a few minutes to the cluster start times. That hasn't been annoying enough yet to justify building custom VM images, so I'll stick with it for now.
@eric-czech having followed the linked cloud provider discussion, I still don't quite understand why you're holding out on building custom VM images. If you've got a workflow that works for you though, that's fine.
It's not worth the effort to make the process 20% faster right now. It would be eventually, but I don't care enough yet while I'm still struggling to do fairly basic things w/ distributed dask + sgkit.
With GCP support in Dask Cloud Provider (see https://github.com/related-sciences/ukb-gwas-pipeline-nealelab/issues/19), we next need to determine how to modify the worker images to include the necessary dependencies.
This is probably the right approach: https://cloudprovider.dask.org/en/latest/packer.html. We probably want to bake the dask docker image, along with any extra software, into the VM image before switching the Cloud Provider cluster configs over to the custom VM and docker images.
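Once Packer has produced an image, wiring it up would look roughly like this. The image names are hypothetical placeholders for the Packer build outputs, and the `source_image` / `docker_image` parameter names follow the dask-cloudprovider GCP docs and should be checked against the installed version:

```python
# Hypothetical image names; the real values come from the Packer build
# and whatever container registry holds the custom docker image.
cluster_kwargs = {
    "source_image": "projects/my-project/global/images/dask-sgkit",
    "docker_image": "my-registry/dask-sgkit:latest",
}

# Not executed here (needs GCP credentials):
# from dask_cloudprovider.gcp import GCPCluster
# cluster = GCPCluster(**cluster_kwargs)
```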