ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

Ray RLlib Algorithm.from_checkpoint bug (MultiBinary passed dtype by ray) #36416

Open pepi99 opened 1 year ago

pepi99 commented 1 year ago

Error:

Model: "model"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to
==================================================================================================
 inputs (InputLayer)            [(None, None, 337)]  0           []

 dense (Dense)                  (None, None, 32)     10816       ['inputs[0][0]']

 memory_in_0 (InputLayer)       [(None, None, 32)]   0           []

 mha_1 (SkipConnection)         (None, None, 32)     37344       ['dense[0][0]',
                                                                  'memory_in_0[0][0]']

 pos_wise_mlp_1 (SkipConnection  (None, None, 32)    6792        ['mha_1[0][0]']
 )

 memory_in_1 (InputLayer)       [(None, None, 32)]   0           []

 mha_2 (SkipConnection)         (None, None, 32)     37344       ['pos_wise_mlp_1[0][0]',
                                                                  'memory_in_1[0][0]']

 pos_wise_mlp_2 (SkipConnection  (None, None, 32)    6792        ['mha_2[0][0]']
 )

 memory_in_2 (InputLayer)       [(None, None, 32)]   0           []

 mha_3 (SkipConnection)         (None, None, 32)     37344       ['pos_wise_mlp_2[0][0]',
                                                                  'memory_in_2[0][0]']

 pos_wise_mlp_3 (SkipConnection  (None, None, 32)    6792        ['mha_3[0][0]']
 )

 memory_in_3 (InputLayer)       [(None, None, 32)]   0           []

 mha_4 (SkipConnection)         (None, None, 32)     37344       ['pos_wise_mlp_3[0][0]',
                                                                  'memory_in_3[0][0]']

 pos_wise_mlp_4 (SkipConnection  (None, None, 32)    6792        ['mha_4[0][0]']
 )

 logits (Dense)                 (None, None, 81)     2673        ['pos_wise_mlp_4[0][0]']

 values (Dense)                 (None, None, 1)      33          ['pos_wise_mlp_4[0][0]']

==================================================================================================
Total params: 190,066
Trainable params: 190,066
Non-trainable params: 0
__________________________________________________________________________________________________
2023-06-14 11:29:31,950 INFO trainable.py:172 -- Trainable.setup took 37.658 seconds. If your trainable is slow to initialize, consider setting reuse_actors=True to reduce actor creation overheads.
2023-06-14 11:29:31,950 WARNING util.py:67 -- Install gputil for GPU system monitoring.
Traceback (most recent call last):
  File "/usr/lib64/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib64/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/pulev/order-placement-optimization-agent/backtests/backtest.py", line 26, in <module>
    ppo_model = ServePPOModel(side=side,
  File "/home/pulev/order-placement-optimization-agent/ray_algorithms/ray_ppo.py", line 44, in __init__
    self._load()
  File "/home/pulev/order-placement-optimization-agent/ray_algorithms/ray_algorithm.py", line 36, in _load
    self.model = Algorithm.from_checkpoint(self.model_path)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 302, in from_checkpoint
    return Algorithm.from_state(state)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 332, in from_state
    new_algo.__setstate__(state)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/algorithms/algorithm.py", line 2573, in __setstate__
    self.workers.local_worker().set_state(state["worker"])
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/evaluation/rollout_worker.py", line 1709, in set_state
    self.policy_map[pid].set_state(policy_state)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/policy/tf_mixins.py", line 200, in set_state
    super().set_state(state)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/policy/eager_tf_policy_v2.py", line 763, in set_state
    super().set_state(state)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/policy/policy.py", line 1050, in set_state
    policy_spec = PolicySpec.deserialize(state["policy_spec"])
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/policy/policy.py", line 167, in deserialize
    observation_space=space_from_dict(spec["observation_space"]),
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 295, in space_from_dict
    space.original_space = space_from_dict(d["original_space"])
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 286, in space_from_dict
    space = gym_space_from_dict(d["space"])
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 281, in gym_space_from_dict
    return space_map[space_type](d)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 247, in _dict
    spaces = {k: gym_space_from_dict(sp) for k, sp in d["spaces"].items()}
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 247, in <dictcomp>
    spaces = {k: gym_space_from_dict(sp) for k, sp in d["spaces"].items()}
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 281, in gym_space_from_dict
    return space_map[space_type](d)
  File "/home/pulev/order-placement-optimization-agent/venv/lib64/python3.9/site-packages/ray/rllib/utils/serialization.py", line 231, in _multi_binary
    return gym.spaces.MultiBinary(**__common(d))
TypeError: __init__() got an unexpected keyword argument 'dtype'

Versions / Dependencies

ray==2.4.0 gymnasium==0.28.1

Reproduction script

from ray.rllib.algorithms.algorithm import Algorithm

model = Algorithm.from_checkpoint('path_to_model_dir')

Issue Severity

High: It blocks me from completing my task.

pepi99 commented 1 year ago

The problem may come from this function in RLlib's serialization.py (line 209):

    def __common(d: Dict):
        """Common updates to the dict before we use it to construct spaces"""
        ret = d.copy()
        del ret["space"]
        if "dtype" in ret:
            ret["dtype"] = np.dtype(ret["dtype"])
        return ret

pepi99 commented 1 year ago

The essential problem is that the serialized MultiBinary space carries a dtype entry, while the MultiBinary class's `__init__` does not accept a dtype argument. I don't know where this dtype gets set.
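
The mismatch described above can be reproduced without Ray installed. The following is only a minimal sketch, not RLlib's actual code or the fix that was eventually merged: the `MultiBinary` class is a hypothetical stand-in that mimics the `(n, seed)` constructor of gymnasium's `MultiBinary`, `common` mirrors the `__common` helper quoted earlier (minus the `np.dtype` conversion), and `multi_binary_from_dict` and the contents of the `serialized` dict are assumptions made for illustration.

```python
from typing import Any, Dict, Optional


class MultiBinary:
    """Hypothetical stand-in: like gymnasium's MultiBinary, it takes n and seed, but no dtype."""

    def __init__(self, n: int, seed: Optional[int] = None):
        self.n = n
        self.seed = seed


def common(d: Dict[str, Any]) -> Dict[str, Any]:
    """Copy the serialized dict and drop the "space" type tag, as __common does."""
    ret = d.copy()
    del ret["space"]
    return ret


def multi_binary_from_dict(d: Dict[str, Any]) -> MultiBinary:
    """Illustrative workaround: discard "dtype" before calling the constructor."""
    kwargs = common(d)
    kwargs.pop("dtype", None)  # MultiBinary has no dtype parameter
    return MultiBinary(**kwargs)


# Assumed shape of a serialized MultiBinary space that carries a dtype entry.
serialized = {"space": "multi-binary", "n": 4, "dtype": "|i1"}

# Passing every remaining key straight through raises the TypeError from the traceback.
try:
    MultiBinary(**common(serialized))
    failed = False
    error_message = ""
except TypeError as exc:
    failed = True
    error_message = str(exc)

# Dropping the dtype key first lets the space be rebuilt.
space = multi_binary_from_dict(serialized)
```

Dropping the key is safe only if the space's dtype is fixed by the class itself, which is assumed here; whether the cleaner fix is to stop writing `dtype` at serialization time is a separate question for the maintainers.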

pepi99 commented 1 year ago

I made a temporary fix (in serialization.py). Should I commit it and link it to this issue?

Rohan138 commented 1 year ago

Could you check if your issue is related to https://github.com/ray-project/ray/pull/34762? If it is the same thing, then installing the new 2.5.0 release should fix the issue. If not, then please submit a PR with your fix. Thanks!