ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib] ValueError in initialization of ImpalaTF2Policy #45050

Open rubenjacob opened 5 months ago

rubenjacob commented 5 months ago

What happened + What you expected to happen

Initializing ImpalaTF2Policy currently raises a ValueError because self.cur_lr is a tf.Variable, but the optimizer constructor only accepts a float, a LearningRateSchedule, or a callable.

  File "site-packages\ray\rllib\algorithms\impala\impala_tf_policy.py", line 316, in __init__
    self.maybe_initialize_optimizer_and_loss()
  File "site-packages\ray\rllib\policy\eager_tf_policy_v2.py", line 462, in maybe_initialize_optimizer_and_loss
    optimizers = force_list(self.optimizer())
                            ^^^^^^^^^^^^^^^^
  File "site-packages\ray\rllib\algorithms\impala\impala_tf_policy.py", line 230, in optimizer
    optim = tf.keras.optimizers.Adam(self.cur_lr)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "site-packages\keras\src\optimizers\adam.py", line 62, in __init__
    super().__init__(
  File "site-packages\keras\src\backend\tensorflow\optimizer.py", line 22, in __init__
    super().__init__(*args, **kwargs)
  File "site-packages\keras\src\optimizers\base_optimizer.py", line 124, in __init__
    raise ValueError(
ValueError: Argument `learning_rate` should be float, or an instance of LearningRateSchedule, or a callable (that takes in the current iteration value and returns the corresponding learning rate value). Received instead: learning_rate=<tf.Variable 'lr:0' shape=() dtype=float32, numpy=0.0005>
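The failing check can be modeled in a few lines. The sketch below is a simplified, pure-Python model of the rule quoted in the error message, not Keras source code. `check_learning_rate`, `FakeVariable`, and the `LearningRateSchedule` stub are hypothetical stand-ins. It shows why a variable is rejected while a float or a callable is accepted. It also shows one possible workaround: wrap the variable in a callable so that each read returns its current value. This assumes the optimizer reads a callable learning rate by calling it.

```python
from typing import Any, Callable


class LearningRateSchedule:
    """Hypothetical stub standing in for Keras' LearningRateSchedule base class."""

    def __call__(self, step: int) -> float:
        raise NotImplementedError


class FakeVariable:
    """Hypothetical stand-in for a scalar tf.Variable such as RLlib's cur_lr."""

    def __init__(self, value: float) -> None:
        self.value = value

    def assign(self, new_value: float) -> None:
        self.value = new_value

    def __repr__(self) -> str:
        return f"<FakeVariable 'lr' value={self.value}>"


def check_learning_rate(learning_rate: Any) -> Any:
    """Simplified model of the rule in the Keras 3 error message:
    accept a float, a LearningRateSchedule, or a callable; reject the rest."""
    if isinstance(learning_rate, (float, LearningRateSchedule)) or callable(learning_rate):
        return learning_rate
    raise ValueError(
        "Argument `learning_rate` should be float, or an instance of "
        "LearningRateSchedule, or a callable. "
        f"Received instead: learning_rate={learning_rate!r}"
    )


def as_callable(variable: FakeVariable) -> Callable[..., float]:
    """Wrap a variable so every call returns its *current* value.

    Any positional arguments (e.g. an iteration count) are ignored."""
    return lambda *_: float(variable.value)


if __name__ == "__main__":
    cur_lr = FakeVariable(0.0005)

    # Passing the variable itself is rejected, as in the traceback above.
    try:
        check_learning_rate(cur_lr)
    except ValueError as err:
        print(err)

    # Wrapping it in a callable passes the check and tracks later updates.
    lr = check_learning_rate(as_callable(cur_lr))
    cur_lr.assign(0.0001)  # e.g. a learning-rate schedule lowering the rate
    print(lr())  # 0.0001
```

Passing `float(self.cur_lr)` would also pass the check, but it would freeze the rate at construction time. Any later `assign` to the variable, for example by a learning-rate schedule, would then be ignored. Verify against the installed Keras version whether Keras calls a callable learning rate with or without the iteration count. The wrapper above handles both cases.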

Versions / Dependencies

Ray == 2.10.0
Python == 3.11.9
OS == Win10
TensorFlow == 2.16.1

Reproduction script

from ray.rllib.algorithms.impala.impala import ImpalaConfig
from ray.rllib.algorithms.impala.impala_tf_policy import ImpalaTF2Policy
import gymnasium as gym

# Arbitrary continuous observation and action spaces
obs_space = gym.spaces.Box(high=1, low=-1, shape=(10,))
action_space = gym.spaces.Box(high=1, low=-1, shape=(5,))

# Default IMPALA config, switched to the TF2 (eager) framework
config = ImpalaConfig()
config.framework_str = "tf2"

# Raises the ValueError above with TensorFlow 2.16.1 (Keras 3)
policy = ImpalaTF2Policy(obs_space, action_space, config)

Issue Severity

High: It blocks me from completing my task.

LorenzoMattia commented 5 months ago

Hi @rubenjacob, I faced the same issue with PPO. So far I have worked around it by downgrading TensorFlow from 2.16 to 2.15. Probably the latest TensorFlow update broke something in its compatibility with Ray.

I ran your reproduction script and it finished without errors. Moreover, printing policy.cur_lr gives "<tf.Variable 'lr:0' shape=() dtype=float32, numpy=0.0005>".

All my tests were done with Python 3.10, but I think it should also work with 3.11.9.

rubenjacob commented 5 months ago

Hi @LorenzoMattia, thanks for your reply. I know that TF <= 2.15 works. I was trying to update my code to TensorFlow 2.16 and Keras 3, but I guess that isn't fully supported yet.
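The thread identifies TensorFlow 2.16 as the breaking change, since from that release onward `tf.keras` defaults to Keras 3. A script can therefore check the installed version up front and fail fast instead of crashing deep inside policy construction. The helpers below are a hypothetical sketch and are not part of Ray or TensorFlow. They only compare version strings.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.16.1' or '2.15.0rc1' into leading integers, e.g. (2, 16, 1)."""
    numbers = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes like 'rc1' or '-dev'
            digits += ch
        numbers.append(int(digits) if digits else 0)
    return tuple(numbers)


def tf_bundles_keras3(tf_version: str) -> bool:
    """True for TensorFlow 2.16 and later, where tf.keras defaults to Keras 3."""
    return parse_version(tf_version) >= (2, 16)


if __name__ == "__main__":
    for version in ("2.15.0", "2.15.0rc1", "2.16.1"):
        print(version, tf_bundles_keras3(version))
```

For example, a script could call `tf_bundles_keras3(tf.__version__)` right after `import tensorflow as tf` and raise an error that suggests staying on TensorFlow 2.15 until a fix is released.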

RocketRider commented 4 months ago

I created a pull request to fix the issue.