ray-project / ray

Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0

[RLlib]: SimpleQ TF2 is broken #26192

Open Rohan138 opened 2 years ago

Rohan138 commented 2 years ago

What happened + What you expected to happen

Something's broken with the SimpleQ TF2 action distribution, but I can't track down the bug. This doesn't happen with TF1/Torch.

Versions / Dependencies

ray 3.0.0dev0 (master), Ubuntu 20.04, tensorflow 2.7.0 (pypi)

Reproduction script

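from ray.rllib.algorithms.simple_q import SimpleQConfig

# SimpleQ on CartPole with the tf2 framework and SoftQ exploration.
config = (
    SimpleQConfig()
    .environment(env="CartPole-v0")
    .rollouts(num_rollout_workers=0)  # no remote workers; sample on the local worker
    .framework("tf2")
    .exploration(exploration_config={"type": "SoftQ", "temperature": 1.0})
)

algo = config.build()
policy = algo.get_policy()
# Collect one sample batch from the local worker.
batch = algo.workers.local_worker().sample()
# Score the sampled actions under the current policy (the call this report is about).
log_likelihoods = policy.compute_log_likelihoods(batch["actions"], batch["obs"])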
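
For reference, here is a minimal sketch of the quantity the last call is expected to return. It assumes that SoftQ draws actions from a softmax over the Q-values divided by the temperature. That is an assumption about the exploration behavior, not something confirmed in this report. The helper name `softq_log_likelihood` and the Q-values are hypothetical, and the code does not use RLlib.

```python
import math


def softq_log_likelihood(q_values, action, temperature=1.0):
    """Log-probability of `action` under softmax(q_values / temperature).

    Hypothetical helper for illustration only; not part of RLlib.
    """
    scaled = [q / temperature for q in q_values]
    # Subtract the max before exponentiating for numerical stability.
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
    return scaled[action] - log_norm


# Made-up Q-values for CartPole's two discrete actions.
q = [1.0, 2.0]
logps = [softq_log_likelihood(q, a, temperature=1.0) for a in range(len(q))]
```

Under this assumption, exponentiating the log-likelihoods of all actions for one observation should sum to 1, and the action with the higher Q-value should get the higher log-likelihood.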

Issue Severity

Medium: It is a significant difficulty but I can work around it.

kouroshHakha commented 1 year ago

@Rohan138 This seems to be related to the policy_v1 vs. v2 migration issue that we saw the other day. I'll leave you assigned for now. For some reason, we don't test the compute_log_likelihoods methods at all.
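
A rough sketch of the kind of sanity check such a test could make, independent of the framework. It assumes discrete actions and a softmax-style action distribution, so every log-likelihood should be finite and at most 0. The helper name `check_log_likelihoods` is hypothetical and not an existing RLlib utility.

```python
import math


def check_log_likelihoods(log_likelihoods, batch_size):
    """Raise AssertionError if the values cannot be log-probs of discrete actions.

    Hypothetical helper for illustration only; not part of RLlib.
    """
    values = [float(v) for v in log_likelihoods]
    assert len(values) == batch_size, "expected one value per sample"
    for v in values:
        assert math.isfinite(v), "log-likelihood must be finite"
        # Small tolerance for floating-point rounding around 0.
        assert v <= 1e-6, "log-probability of a discrete action cannot exceed 0"
    return True
```

In a real test, the first argument would be the result of `policy.compute_log_likelihoods(batch["actions"], batch["obs"])` converted to a flat sequence of floats, and the check would be repeated for each framework.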