I am trying to run the Streaming MapReduce example on a remote cluster, but the wikipedia module can not be found, even though it's installed both locally and on all cluster nodes.
Reproduction (REQUIRED)
I built a custom docker image using the following Dockerfile:
FROM rayproject/ray:nightly-py38
RUN conda install -c conda-forge wikipedia
The cluster is deployed on Kubernetes using Ray Operator, and confirmed working by running a simple task with no dependencies.
All cluster nodes are using the above custom image.
I confirmed that the wikipedia module is installed by running a container from this image, and importing the module in the Python shell
I am using a virtual environment locally, created/managed by poetry. the wikipedia module is confirmed to be installed in the virtualenv.
When trying to run the above example, I get the following stack trace:
Traceback (most recent call last):
File "streaming.py", line 92, in <module>
mappers = [Mapper.remote(stream) for stream in streams]
File "streaming.py", line 92, in <listcomp>
mappers = [Mapper.remote(stream) for stream in streams]
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/actor.py", line 418, in remote
return self._remote(args=args, kwargs=kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/tracing/tracing_helper.py", line 316, in _invocation_actor_class_remote_span
return method(self, args, kwargs, *_args, **_kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/actor.py", line 572, in _remote
return client_mode_convert_actor(
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/_private/client_mode_hook.py", line 134, in client_mode_convert_actor
return client_actor._remote(in_args, in_kwargs, **kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/common.py", line 232, in _remote
return self.options(**option_args).remote(*args, **kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/common.py", line 345, in remote
ref_ids = ray.call_remote(self, *args, **kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/api.py", line 103, in call_remote
return self.worker.call_remote(instance, *args, **kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/worker.py", line 302, in call_remote
task = instance._prepare_client_task()
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/common.py", line 338, in _prepare_client_task
task = self.remote_stub._prepare_client_task()
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/common.py", line 243, in _prepare_client_task
self._ensure_ref()
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/common.py", line 215, in _ensure_ref
self._ref = ray.put(
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/api.py", line 52, in put
return self.worker.put(*args, **kwargs)
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/worker.py", line 240, in put
out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/worker.py", line 240, in <listcomp>
out = [self._put(x, client_ref_id=client_ref_id) for x in to_put]
File "/home/eugene/.cache/pypoetry/virtualenvs/ray-play-Watq55vL-py3.8/lib/python3.8/site-packages/ray/util/client/worker.py", line 260, in _put
raise cloudpickle.loads(resp.error)
ModuleNotFoundError: No module named 'wikipedia'
The example does run locally as expected, if I leave ray.init() alone and don't connect to a remote cluster. Which leads me to believe this isn't an environment issue, but I'm not certain about that.
Is there anything else I need to do to make this module available on the remote nodes? It's also unclear to me whether both the head and worker nodes need to be running the custom image, or just the workers.
What is the problem?
I am trying to run the Streaming MapReduce example on a remote cluster, but the
wikipedia
module can not be found, even though it's installed both locally and on all cluster nodes.Reproduction (REQUIRED)
I built a custom docker image using the following Dockerfile:
wikipedia
module is installed by running a container from this image, and importing the module in the Python shellpoetry
. thewikipedia
module is confirmed to be installed in the virtualenv.ray.init()
withray.util.connect(....)
When trying to run the above example, I get the following stack trace:
The example does run locally as expected, if I leave
ray.init()
alone and don't connect to a remote cluster. Which leads me to believe this isn't an environment issue, but I'm not certain about that.Is there anything else I need to do to make this module available on the remote nodes? It's also unclear to me whether both the
head
andworker
nodes need to be running the custom image, or just theworkers
.