Closed jbedorf closed 2 years ago
@ericl Hey Eric, I have a fix to correct this specific behavior, but I want to check with you: what is the expected behavior of the gcs client when a key does not exist? Should it return None (not empty bytes)?
@mwtian See above. Can you help clarify the behavior of gcs client or point me to someone?
If this is about gcs kv client (for get / put etc), @iycheng will be the most knowledgeable. Thanks for making the fix, and feel free to assign both of us to the PR!
For ray.experimental.internal_kv._internal_kv_get() on a non-existent key, returning None seems right.
I'm unable to reproduce this bug. @xwjiang2010 did you produce a fix for this, and can this issue be closed?
@mwtian Thanks for the response. In that case, I will close my PR and reassign it to you :)
Minimal reproduction:
In [1]: import ray
In [2]: ray.init(f"ray://127.0.0.1:10001") # Comment out to make this work.
Out[2]: ClientContext(dashboard_url=None, python_version='3.7.11', ray_version='2.0.0.dev0', ray_commit='{{RAY_COMMIT_SHA}}', protocol_version='2021-12-07', _num_clients=1, _context_to_restore=<ray.util.client._ClientContext object at 0x7f8be02ed610>)
In [3]: from ray.experimental.internal_kv import _internal_kv_initialized, \
   ...:     _internal_kv_get, _internal_kv_put
In [4]: _internal_kv_initialized()
Out[4]: True
In [5]: value = _internal_kv_get("bla")
In [6]: value
Out[6]: b''
In [7]:
@xwjiang2010, just to make sure: Out[6]: b'' is unexpected, and it should be None instead?
@iycheng, do you want to take a look?
@mwtian that's my assumption about gcs client protocol. Maybe @iycheng can clarify?
@mwtian @iycheng Do you have any update on this? It seems we have run into the same issue in our application.
This is a P0 issue on our side. CC @ericl
@jovany-wang just to confirm, you are receiving empty bytes when calling _internal_kv_get() on a non-existent key via Ray client, but None is returned when not using Ray client, right?
@mwtian I believe it's exactly the same issue, according to my stack trace:
---------------------------------------------------------------------------
EOFError Traceback (most recent call last)
/tmp/ipykernel_4689/1080049057.py in <module>
47
48 ray.client('100.88.148.29:38159').connect()
---> 49 main()
/tmp/ipykernel_4689/1080049057.py in main()
33
34 # Create our RLlib Trainer.
---> 35 trainer = PPOTrainer(config=config)
36
37 # Run it for n training iterations. A training iteration includes
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer_template.py in __init__(self, config, env, logger_creator)
121
122 def __init__(self, config=None, env=None, logger_creator=None):
--> 123 Trainer.__init__(self, config, env, logger_creator)
124
125 def _init(self, config: TrainerConfigDict,
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer.py in __init__(self, config, env, logger_creator)
546 logger_creator = default_logger_creator
547
--> 548 super().__init__(config, logger_creator)
549
550 @classmethod
~/.local/lib/python3.7/site-packages/ray/tune/trainable.py in __init__(self, config, logger_creator)
96
97 start_time = time.time()
---> 98 self.setup(copy.deepcopy(self.config))
99 setup_time = time.time() - start_time
100 if setup_time > SETUP_TIME_THRESHOLD:
~/.local/lib/python3.7/site-packages/ray/rllib/agents/trainer.py in setup(self, config)
640 # An already registered env.
641 if _global_registry.contains(ENV_CREATOR, env):
--> 642 self.env_creator = _global_registry.get(ENV_CREATOR, env)
643 # A class specifier.
644 elif "." in env:
~/.local/lib/python3.7/site-packages/ray/tune/registry.py in get(self, category, key)
138 "Registry value for {}/{} doesn't exist.".format(
139 category, key))
--> 140 return pickle.loads(value)
141 else:
142 return pickle.loads(self._to_flush[(category, key)])
EOFError: Ran out of input
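The last frame can be reproduced without Ray. This is a minimal sketch, assuming the value handed to pickle.loads is the empty bytes object that the Ray Client path returned in the reproduction above; try_unpickle is a hypothetical helper, not part of Ray.

```python
import pickle


def try_unpickle(value):
    """Return (ok, payload): the unpickled object, or the error text."""
    try:
        return True, pickle.loads(value)
    except EOFError as exc:
        # pickle needs at least one opcode; b"" contains none.
        return False, "EOFError: {}".format(exc)


# Empty bytes, as returned for a missing key in the reproduction above.
empty_result = try_unpickle(b"")

# A real pickled value round-trips normally.
roundtrip_result = try_unpickle(pickle.dumps("CartPole-v0"))

print(empty_result)
print(roundtrip_result)
```

Because b'' is not None, the "if value is None" guard in registry.py is skipped and the empty value reaches pickle.loads directly.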
@mwtian FYI, we are using 1.4 or 1.2. I believe _internal_kv_get is not used.
Sorry, it still uses _internal_kv_get:
    def get(self, category, key):
        if _internal_kv_initialized():
            value = _internal_kv_get(_make_key(category, key))
            if value is None:
                raise ValueError(
                    "Registry value for {}/{} doesn't exist.".format(
                        category, key))
            return pickle.loads(value)
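As a caller-side illustration only (not the fix discussed in this thread), the lookup above could treat empty bytes the same as None. In this sketch, kv_get is a hypothetical stand-in for _internal_kv_get, the "category/key" key format is an assumption rather than what _make_key produces, and the dict replaces the real KV store.

```python
import pickle


def registry_get(kv_get, category, key):
    """Look up and unpickle a registry entry.

    kv_get is a hypothetical stand-in for _internal_kv_get: any callable
    that takes a key and returns bytes, b"" or None.
    """
    value = kv_get("{}/{}".format(category, key))
    # Treat both None and empty bytes as "key not found", so the caller
    # gets a descriptive ValueError instead of EOFError from pickle.
    if not value:
        raise ValueError(
            "Registry value for {}/{} doesn't exist.".format(category, key))
    return pickle.loads(value)


# In-memory store standing in for the KV backend (assumption).
store = {"env_creator/my_env": pickle.dumps("creator")}


def get_returning_none(k):
    # Mimics the behavior reported without Ray Client.
    return store.get(k)


def get_returning_empty(k):
    # Mimics the behavior reported with Ray Client.
    return store.get(k, b"")


def lookup_error(kv_get, key):
    """Return the ValueError message for a lookup, or None on success."""
    try:
        registry_get(kv_get, "env_creator", key)
    except ValueError as exc:
        return str(exc)
    return None


found = registry_get(get_returning_empty, "env_creator", "my_env")
missing_none = lookup_error(get_returning_none, "missing")
missing_empty = lookup_error(get_returning_empty, "missing")
```

Treating b'' as missing is safe for this particular lookup because pickle.dumps never produces an empty byte string, so no valid registry entry is lost.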
Will try to take a look tomorrow. Btw, the fix is very unlikely to be backported.
@mwtian Do we have any update?
Let's see if https://github.com/ray-project/ray/pull/24058 can fix the issue.
Search before asking
Ray Component
RLlib
What happened + What you expected to happen
When using RLlib with Ray Client, you will receive an error (see below) when relying on:
ray.init(f"ray://127.0.0.1:10001")
whereas things work when using:
export RAY_ADDRESS="ray://127.0.0.1:10001"
In particular, this error only happens when using the default gym registered strings. With a custom registration, the code runs as expected.
So:
Versions / Dependencies
Ray 1.10.0-py38 Docker image with TensorFlow installed.
Reproduction script
Anything else
Happens always.