ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
33.92k stars 5.77k forks

[RLlib] Can not run custom TF Policy Example #34618

Open davidADSP opened 1 year ago

davidADSP commented 1 year ago

What happened + What you expected to happen

Running the script in https://github.com/ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py results in an error: `use_critic=True` is set, but the policy has no critic, and `from_batch` is deprecated.

The fix is to pass `use_critic=False`, like this: `return compute_advantages(sample_batch, 0.0, policy.config["gamma"], use_gae=False, use_critic=False)`, and to replace the deprecated `from_batch` call with `logits, _ = model(train_batch)`.

Versions / Dependencies

Ray 2.3.1

Reproduction script

Running this script:

https://github.com/ray-project/ray/blob/master/rllib/examples/custom_tf_policy.py

Issue Severity

Low: It annoys or frustrates me.

Rohan138 commented 1 year ago

I can successfully run the script on the current master. Could you retry it, or share your Ray and TF versions?

davidADSP commented 1 year ago

Ray 2.3.1, Tensorflow 2.12.0

davidADSP commented 1 year ago

OK, it looks like it's wrong in the docs, but not in the example code.

davidADSP commented 1 year ago

Specifically, this page https://docs.ray.io/en/latest/rllib/rllib-concepts.html

ArturNiederfahrenhorst commented 1 year ago

Thanks for raising this. We are deprecating the API in question, and our TF1 support altogether. I'm sorry this could not be resolved; if possible, please use another framework. We can close this issue as soon as RLModules and Trainers have trickled through the docs.