Closed · edcxan closed this issue 1 year ago
Hi @edcxan, these are all valid points. Getting the RLModules / Learner stack right is a high priority for the RLlib team. Can you put up a PR with these items? Ideally, if you could include a short script that produces the errors without your changes but runs cleanly with them, that would be great and would accelerate the process. Do you think that would be possible?
Thanks for raising this issue in any case!
@edcxan, thanks for opening this issue. This is a good one :) The broader take here should be, imo:
algorithms.dreamerv3.utils.env_runner.py
What happened + What you expected to happen
Currently there are several bugs / pieces of incomplete code within RL Module that prevent it from working properly with multi-agent, multi-policy PPO (and possibly other algorithms). Some of them I have managed to patch, and they are:
policy = self.get_policy(policy_id)
in the learner API block conflicts with policy and policy_cls (only one of the two may be given) when adding the policy to the evaluation workers.
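To make the conflict concrete, here is a rough sketch of the fix in the shape of a hypothetical helper; the helper name, the foreach_worker pattern, and the exact location of this code inside Algorithm.add_policy() in ray 2.6.3 are my assumptions, not the actual diff:

# Hypothetical helper sketching the fix; names and call sites are assumptions.
def _sync_new_policy_to_eval_workers(algo, policy_id):
    # With the learner API enabled, the policy instance already exists locally,
    # so fetch it and forward only the instance. add_policy() accepts exactly
    # one of policy / policy_cls, which is where the current code conflicts.
    policy = algo.get_policy(policy_id)
    algo.evaluation_workers.foreach_worker(
        lambda w: w.add_policy(policy_id=policy_id, policy=policy)
    )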
Also missing is the corresponding cleanup when a policy is removed:

if self.config._enable_learner_api:
    self.learner_group.remove_module(
        module_id=policy_id,
    )

self.curr_kl_coeffs_per_module.pop(module_id, None)
self.entropy_coeff_schedulers_per_module.pop(module_id, None)
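For what it's worth, a rough sketch of where these cleanups could live. I am assuming the remove_module() call belongs inside Algorithm.remove_policy() and that the two pop() calls go into an overridden remove_module() on the PPO learner; the import path and class name below are taken from ray 2.6.x to the best of my knowledge and should be treated as assumptions:

from ray.rllib.algorithms.ppo.torch.ppo_torch_learner import PPOTorchLearner


# Sketch: drop per-module PPO state when a module is removed.
class PatchedPPOLearner(PPOTorchLearner):
    def remove_module(self, module_id):
        super().remove_module(module_id)
        # Both dicts are keyed by module_id; pop defensively so a module that
        # never registered KL / entropy state (e.g. KL loss disabled) is a no-op.
        self.curr_kl_coeffs_per_module.pop(module_id, None)
        self.entropy_coeff_schedulers_per_module.pop(module_id, None)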
assert sampled_kl_values, "Sampled KL values are empty."

needs to be moved under an

if hps.use_kl_loss:

since the sampled KL values will not exist when the KL loss is not used.
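As a standalone illustration of why the assert has to move, here is a simplified stand-in for the per-module KL update (the function below is illustrative only, not RLlib's actual additional_update_for_module(); hps.kl_target is assumed to exist alongside hps.use_kl_loss):

def kl_update_for_module(module_id, hps, sampled_kl_values, curr_kl_coeffs_per_module):
    # Sampled KL values are only produced when the KL loss is enabled, so both
    # the assert and the coefficient update must sit inside this branch.
    if hps.use_kl_loss:
        assert sampled_kl_values, "Sampled KL values are empty."
        sampled_kl = sampled_kl_values[module_id]
        curr_kl_coeff = curr_kl_coeffs_per_module[module_id]
        # Standard adaptive-KL rule: strengthen / relax the penalty depending
        # on how far the sampled KL drifts from the target.
        if sampled_kl > 2.0 * hps.kl_target:
            curr_kl_coeffs_per_module[module_id] = 1.5 * curr_kl_coeff
        elif sampled_kl < 0.5 * hps.kl_target:
            curr_kl_coeffs_per_module[module_id] = 0.5 * curr_kl_coeff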
Versions / Dependencies

ray==2.6.3
Reproduction script
Using any of the functions described above with the RL Module / Learner API enabled.
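Something along these lines should exercise the paths above (untested sketch; the example-env import path, the _enable_rl_module_api / _enable_learner_api flags, and the policy IDs are assumptions that may need adjusting for ray 2.6.3):

import ray
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole

ray.init()

config = (
    PPOConfig()
    .environment(MultiAgentCartPole, env_config={"num_agents": 2})
    .rl_module(_enable_rl_module_api=True)
    .training(_enable_learner_api=True)
    .multi_agent(
        policies={"p0", "p1"},
        policy_mapping_fn=lambda agent_id, *args, **kwargs: f"p{agent_id}",
    )
    .evaluation(evaluation_num_workers=1)
)
algo = config.build()
algo.train()

# Adding a policy while evaluation workers exist hits the policy / policy_cls
# conflict; removing it again exercises the missing learner_group.remove_module()
# and per-module cleanup.
algo.add_policy("p2", policy_cls=type(algo.get_policy("p0")))
algo.remove_policy("p2")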
Issue Severity
Low: It annoys or frustrates me.