ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
https://ray.io
Apache License 2.0
34.26k stars 5.81k forks source link

[RLlib] Issues with RLModules/Learner + evaluation workers and not using KL loss #39174

Closed edcxan closed 1 year ago

edcxan commented 1 year ago

What happened + What you expected to happen

Currently there are several bugs/incomplete code within RL module preventing it from working properly with multi-agent, multi-policy PPO (and possibly other algorithms.) Some of them I have managed to patch and they are:

  1. In algorithm.py's add_policy, the learner api block needs to be moved under the evaluation workers block otherwise add_policy will always fail if evaluation workers exist, due topolicy = self.get_policy(policy_id) in the learner api block conflicting with policy and policy_cls (can only have 1) when adding to evaluation workers.
  2. In algorithm.py's remove_policy, we need to add a block to remove modules from the learner group as well:

if self.config._enable_learner_api: self.learner_group.remove_module( module_id=policy_id, )

  1. In ppo_learner.py's remove_module, the two coeff .pop's can fail if they do not exist, for example if not using entropy coeff scheduler. Should return None if the keys do not exist:

self.curr_kl_coeffs_per_module.pop(module_id, None) self.entropy_coeff_schedulers_per_module.pop(module_id, None)

  1. In ppo_torch_learner.py's additional_update_for_module, assert sampled_kl_values, "Sampled KL values are empty." needs to be moved under a if hps.use_kl_loss: since they will not exist if not using KL loss.

Versions / Dependencies

ray==2.6.3

Reproduction script

Using any of the functions described with RL Module

Issue Severity

Low: It annoys or frustrates me.

ArturNiederfahrenhorst commented 1 year ago

Hi @edcxan, these are all valid points. Getting the RLModules / Learner stack is high priority for the RLlib team. Can you put up a PR with the items? Ideally, if you have a short script that creates errors without your changes but does not create errors with your changes, that would be great and accelerate the process. Do you think that would be possible?

Thanks for raising this issue in any case!

simonsays1980 commented 1 year ago

@edcxan, thanks for opening this issue. This is a good one :) The broader take here should be, ioo: