Open nyielding opened 2 years ago
can you clarify what is the use case here? why are we running an RLlib stack without training any policy?
I'm training agents to do guidance in a GNC environment, and I want to have both trained policies and custom policies that provide heuristic guidance actions via traditional control methods for comparison. It is a multiagent cooperative environment, so for baselines and comparisons I don't want to always mix trained and heuristic policies in the same episodes.
So I would like to be able to run short experiments with the exact same parameters as my training experiments, but with the heuristic 'dummy' policies, so I can get a 1:1 comparison of results from all the custom metrics and recording callbacks I have implemented.
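For reference, the heuristic policies are ordinary RLlib `Policy` subclasses along these lines (a simplified sketch; the class name and the sampled actions are stand-ins for my actual GNC controllers):

```python
from ray.rllib.policy.policy import Policy


class HeuristicGuidancePolicy(Policy):
    """Non-learning policy that emits actions from a classical controller."""

    def compute_actions(self, obs_batch, state_batches=None,
                        prev_action_batch=None, prev_reward_batch=None,
                        **kwargs):
        # One action per observation in the batch. A real implementation
        # would compute guidance commands from the observed state here.
        actions = [self.action_space.sample() for _ in obs_batch]
        return actions, [], {}

    def learn_on_batch(self, samples):
        # Nothing to learn.
        return {}

    def get_weights(self):
        # No weights to sync to rollout workers.
        return {}

    def set_weights(self, weights):
        pass
```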
oh, if I understand correctly, is this a use case of "evaluate"? supposedly, we provide rllib/evaluate.py which would do rollout using trained and whatever policies you want. I am not sure if that covers your use case.
I suppose you could break it out into an 'evaluate' use case. I have seen the evaluate.py file but admittedly I haven't taken the time/effort to extend it to work with my experiment setup. The way we build up large config files and custom environments doesn't fit neatly and easily into that script, as opposed to just calling tune.run again with a different policy loadout. Building out proper evaluation scripts for my experiments is on my backlog.
But regardless, I think the current behavior is not intended, and the workaround I detailed seems to produce what I would expect the intended behavior to be. The workaround was not obvious, though, which could trip up anyone else who tries this and runs into the issue, but I can fall back on it for now.
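To be concrete, the "different policy loadout" re-run amounts to something like this (sketch only; `base_config` and `HeuristicGuidancePolicy` stand in for my actual experiment config and controller class):

```python
import copy

from ray import tune

# Same experiment config as training, but every policy slot is filled with
# the heuristic class and nothing is marked as trainable.
eval_config = copy.deepcopy(base_config)
eval_config["multiagent"]["policies"] = {
    pid: (HeuristicGuidancePolicy, obs_space, act_space, {})
    for pid, (_, obs_space, act_space, _)
    in base_config["multiagent"]["policies"].items()
}
# Per the docs, an empty list should mean "train nothing" -- this is where
# the issue below bites.
eval_config["multiagent"]["policies_to_train"] = []

tune.run("PPO", config=eval_config, stop={"training_iteration": 1})
```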
ok, thanks for the clarification. we will keep this in our backlog of things to clean up. agree that the workaround is not good.
Search before asking
Ray Component
RLlib
Issue Severity
Medium: It contributes to significant difficulty in completing my task, but I can work around it and get it resolved.
What happened + What you expected to happen
`config.multiagent.policies_to_train` raises an exception with an incorrect/confusing error message when passed an empty list while training with PPO.
In the case where all policies are random/heuristic/etc. and none are meant to be trained, the config docs say an empty list should be passed for `policies_to_train`. The error message that comes back in this case points you at `config.multiagent.policies_to_train`, but intuitively, if "this policy is not meant to learn at all," you should NOT add it to that list.
The workaround is to pass a list containing a string that does not match any policy name, i.e. if your policy is called `random`, then pass `config.multiagent.policies_to_train = ['any_str_but_random']` and the code seems to run as intended. This could be the empty list being treated the same as `None` (which defaults to training all policies), but I haven't traced it to that.
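If that is the cause, the culprit would be a truthiness check along these lines (illustrative only; I have not traced this to the actual RLlib source):

```python
# An empty list is falsy in Python, so a check like this silently promotes
# [] to "train all policies", exactly as if None had been passed:
policies_to_train = (config["multiagent"]["policies_to_train"]
                     or list(policies.keys()))
```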
Similar to previous closed issue https://github.com/ray-project/ray/issues/21044
Versions / Dependencies
Ray 1.11.0, Python 3.8.10, Ubuntu 20.04 LTS
Reproduction script
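A minimal script along these lines (using RLlib's bundled multi-agent CartPole and random-policy examples rather than the actual GNC environment) should reproduce the error:

```python
import ray
from ray import tune
from ray.rllib.examples.env.multi_agent import MultiAgentCartPole
from ray.rllib.examples.policy.random_policy import RandomPolicy

ray.init()

config = {
    "env": MultiAgentCartPole,
    "env_config": {"num_agents": 2},
    "multiagent": {
        # One random policy controls both agents; nothing should train.
        "policies": {"random": (RandomPolicy, None, None, {})},
        "policy_mapping_fn": lambda agent_id, *args, **kwargs: "random",
        # Per the config docs this should mean "train no policies",
        # but PPO raises a confusing error instead:
        "policies_to_train": [],
    },
}

tune.run("PPO", config=config, stop={"training_iteration": 1})
```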
Anything else
No response
Are you willing to submit a PR?