jm-nab opened this issue 2 months ago (status: Open)
Can you tell me what happens if you call ray.init() before the following line?
processing_task = process_space.remote(space_key, username, api_key)
I had a helper function called connect_ray, which checked ray.is_initialized() and, if it returned False, ran ray.init(...config) just before calling process_space.remote.
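A minimal sketch of what a connect_ray helper like the one described could look like. It is not the author's actual code: the ray_module parameter is a hypothetical addition so the sketch can be exercised with a stub instead of a live cluster, and init_kwargs stands in for the elided config. It only uses ray.is_initialized() and ray.init(**kwargs).

```python
from typing import Any, Optional


def connect_ray(ray_module: Optional[Any] = None, **init_kwargs: Any) -> Any:
    """Initialize Ray at most once, before any .remote() call (hypothetical helper).

    ray_module exists only so the sketch can be tested with a stub; in real use
    leave it as None and the installed ``ray`` package is imported.
    """
    if ray_module is None:
        # Imported lazily so that importing this file does not import Ray.
        import ray as ray_module
    # Only call init() if no Ray connection exists yet in this process.
    if not ray_module.is_initialized():
        ray_module.init(**init_kwargs)
    return ray_module
```

In the setup from the issue this would be called as connect_ray(**config) right before process_space.remote(space_key, username, api_key). Calling it a second time is a no-op because is_initialized() already returns True.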
After looking at this over the last few days, I did the following:
1) Ensured that all dependencies were installed on both the workers AND the head node
2) Skipped the auto_init_ray logic by setting the environment variable checked here: https://github.com/ray-project/ray/blob/88a6c3961db6c5c9e84b9751f8d4ae2e47c7eece/python/ray/_private/auto_init_hook.py#L7
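A minimal sketch of step 2, assuming the variable read at the linked line is RAY_ENABLE_AUTO_CONNECT (the same variable listed later under "Other things I have tried") and that setting it to "0" disables the auto-init hook. The hook appears to read the variable when Ray is imported, so the hypothetical helper below warns if ray has already been imported in the current process.

```python
import os
import sys
import warnings


def disable_ray_auto_connect() -> None:
    """Opt out of Ray's auto-init hook (hypothetical helper).

    Assumes the hook checks RAY_ENABLE_AUTO_CONNECT and treats "0" as "off".
    """
    if "ray" in sys.modules:
        # Assumption: the value may already have been read during `import ray`.
        warnings.warn(
            "ray is already imported; RAY_ENABLE_AUTO_CONNECT may have no effect",
            RuntimeWarning,
        )
    os.environ["RAY_ENABLE_AUTO_CONNECT"] = "0"
```

Setting the variable in the process environment (for example in the container spec) achieves the same thing without any code, and avoids the import-order question entirely.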
I'm not sure if I'd be able to help contribute.
The main source of confusion for me was the AssertionError: it asserts on ray.experimental.internal_kv._internal_kv_initialized, and it took me quite a bit of digging to determine where and how that state was being attached, initialized, and set up.
Could a more helpful message be added? I'd be happy to open a PR covering the various cases in which the assertion fails, along with a troubleshooting hint for each.
Something like this?
assert ray.experimental.internal_kv._internal_kv_initialized(), "ray.experimental.internal_kv._internal_kv_initialized(): the GCS client hasn't been initialized: did the head node fail to start, are there multiple instances running, are the dependencies deployed to the head and worker nodes, ..."
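A minimal sketch of the proposed check as a standalone helper. The name require_internal_kv and the hint wording are hypothetical, and the is_initialized argument stands in for ray.experimental.internal_kv._internal_kv_initialized so the sketch runs without a cluster. It raises explicitly rather than using assert, because assert statements are skipped when Python runs with -O; it still raises AssertionError so existing callers that catch that type keep working.

```python
from typing import Callable

# Hypothetical hints, taken from the cases listed in the proposed message.
_KV_HINTS = (
    "did the head node fail to start?",
    "are multiple Ray instances running?",
    "are the same dependencies deployed to the head and worker nodes?",
)


def require_internal_kv(is_initialized: Callable[[], bool]) -> None:
    """Raise an AssertionError with troubleshooting hints if the KV store is not set up."""
    if is_initialized():
        return
    hints = "\n".join(f"  - {hint}" for hint in _KV_HINTS)
    raise AssertionError(
        "ray.experimental.internal_kv._internal_kv_initialized() returned False: "
        "the GCS client has not been initialized. Things to check:\n" + hints
    )
```

Inside Ray it would be invoked as require_internal_kv(ray.experimental.internal_kv._internal_kv_initialized).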
What happened + What you expected to happen
I expected Ray to begin processing, but instead I got an error.
Could it be that I just got lucky during local development with Docker Compose, since port 6379 (Ray's default head node port) was already bound to a running Redis instance?
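One way to check the port question is to see whether anything accepts TCP connections on 6379 from the machine in question. This is a minimal stdlib sketch; port_is_open is a hypothetical helper, and a successful connection does not tell you whether the listener is Redis or a Ray GCS server.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if something accepts a TCP connection on host:port."""
    try:
        # create_connection raises OSError (incl. refused or timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, port_is_open("localhost", 6379) run inside each container shows which ones can reach a listener on the default port.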
Other things I have tried:
1) The annotation ray.io/ft-enabled: "false"
2) Setting RAY_ENABLE_AUTO_CONNECT to 0 in the client environment
Versions / Dependencies
CPython 3.11.9, Ray 2.32.0
Reproduction script
I ssh'd into a worker node and ran the following, but wasn't able to reproduce the issue that the webworker is running into:
Head node:
ps aux
from head node:
Issue Severity
High: It blocks me from completing my task.
Related Resources: