norikazu99 opened 1 year ago
Confirmed on current master branch; will look into it. @norikazu99 could you provide any other details about your system's resources?
Thank you for bringing this up.
The `*learner*` arguments you passed in are meant for the new RLlib Learner API, which you can enable using `config.training(_enable_learner_api=True).rl_module(_enable_rl_module_api=True)`. Note that this will ignore the `num_gpus` argument in favor of `num_gpus_per_learner_worker`.
I've confirmed the repro script doesn't error out with this change; let us know if it works on your end.
We'll try to separate and document the two types of arguments better, thanks for raising this issue! #35671
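For reference, a minimal sketch of such a config on the new stack (the environment and resource values are placeholders and this assumes the Ray 2.4/2.5-era API discussed in this thread; it is not a confirmed fix for the original script):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: enable the new Learner / RLModule APIs and request the GPU through
# num_gpus_per_learner_worker instead of num_gpus.
config = (
    PPOConfig()
    .environment("CartPole-v1")          # placeholder env
    .framework("torch")
    .training(_enable_learner_api=True)
    .rl_module(_enable_rl_module_api=True)
    .resources(
        num_learner_workers=1,           # assumption: a single learner worker
        num_gpus_per_learner_worker=1,   # replaces num_gpus on the new stack
    )
)
algo = config.build()
```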
Hello @Rohan138, thanks for your help. The workers don't seem to crash anymore with the new learner API, but I do get the following warnings. The second one seems more alarming and told me to contact the Ray team.
Warning 1:
```
WARNING algorithm_config.py:2412 -- Setting `exploration_config={}` because you set `_enable_rl_modules=True`. When RLModule API are enabled, exploration_config can not be set. If you want to implement custom exploration behaviour, please modify the `forward_exploration` method of the RLModule at hand. On configs that have a default exploration config, this must be done with `config.exploration_config={}`.
```
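(For reference, a minimal sketch of what that first warning asks for when you simply keep the default exploration behaviour; this is a hypothetical snippet, not taken from the repro script:)

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Sketch: with the RLModule / Learner APIs enabled, clear any default
# exploration config explicitly, as the warning suggests.
config = (
    PPOConfig()
    .training(_enable_learner_api=True)
    .rl_module(_enable_rl_module_api=True)
)
config.exploration_config = {}  # custom exploration would instead go into the RLModule's forward_exploration
```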
Warning 2:
```
2023-05-24 04:03:59,915 INFO pbt.py:808 -- [PB2] [Exploit] Cloning trial 393b6_00001 (score = 27.522034) into trial 393b6_00002 (score = 23.624642)
2023-05-24 04:03:59,930 INFO pbt.py:835 -- [PB2] [Explore] Perturbed the hyperparameter config of trial 393b6_00002:
gamma : 0.9727699438053676 -----> 0.9727699438053676
lr : 0.0006982559357067231 -----> 0.0006982559357067231
2023-05-24 04:03:59,962 WARNING trial_runner.py:1543 -- You are trying to access pause_trial interface of TrialRunner in TrialScheduler, which is being restricted. If you believe it is reasonable for your scheduler to access this TrialRunner API, please reach out to Ray team on GitHub. A more strict API access pattern would be enforced starting 1.12s.0
(PPO pid=34676) 2023-05-24 04:04:00,024 WARNING policy.py:134 -- Can not figure out a durable policy name for <class 'ray.rllib.algorithms.ppo.torch.ppo_torch_policy_rlm.PPOTorchPolicyWithRLModule'>. You are probably trying to checkpoint a custom policy. Raw policy class may cause problems when the checkpoint needs to be loaded in the future. To fix this, make sure you add your custom policy in rllib.algorithms.registry.POLICIES.
Result for PPO_CartPole-v1_393b6_00002:
```
@norikazu99 If you are using the new stack, I want to make sure that you do not specify `num_gpus` in your config. I will have someone else from the Ray team help you out with the Tune warning. The other warnings are fine and are an artifact of the new experimental stack that we are slowly rolling out.
@kouroshHakha Yes, I noticed that the code would only work when I did not use `num_gpus` and `local_gpu_idx`. However, it seems to only work for "CartPole-v1". I get the following error when enabling the learner API with my custom env (the action space is MultiDiscrete). The env runs just fine without PB2 and the learner API, if you were wondering.
```
(PPO pid=58072)   File "python\ray\_raylet.pyx", line 881, in ray._raylet.execute_task
(PPO pid=58072)   File "python\ray\_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\_private\function_manager.py", line 670, in actor_method_executor
(PPO pid=58072)     return method(__ray_actor, *args, **kwargs)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(PPO pid=58072)     return method(self, *_args, **_kwargs)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in __init__
(PPO pid=58072)     self._update_policy_map(policy_dict=self.policy_dict)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(PPO pid=58072)     return method(self, *_args, **_kwargs)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map
(PPO pid=58072)     self._build_policy_map(
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span
(PPO pid=58072)     return method(self, *_args, **_kwargs)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map
(PPO pid=58072)     new_policy = create_policy_for_framework(
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\utils\policy.py", line 142, in create_policy_for_framework
(PPO pid=58072)     return policy_class(observation_space, action_space, merged_config)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\algorithms\ppo\torch\ppo_torch_policy_rlm.py", line 82, in __init__
(PPO pid=58072)     self._initialize_loss_from_dummy_batch()
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\policy\policy.py", line 1405, in _initialize_loss_from_dummy_batch
(PPO pid=58072)     actions, state_outs, extra_outs = self.compute_actions_from_input_dict(
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 522, in compute_actions_from_input_dict
(PPO pid=58072)     return self._compute_action_helper(
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\utils\threading.py", line 32, in wrapper
(PPO pid=58072)     raise e
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\utils\threading.py", line 24, in wrapper
(PPO pid=58072)     return func(self, *a, **k)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\policy\torch_policy_v2.py", line 1110, in _compute_action_helper
(PPO pid=58072)     logp = action_dist.logp(actions)
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 324, in logp
(PPO pid=58072)     logps = torch.stack([cat.log_prob(act) for cat, act in zip(self._cats, value)])
(PPO pid=58072)   File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\models\torch\torch_distributions.py", line 324, in <listcomp>
(PPO pid=58072)     logps = torch.stack([cat.log_prob(act) for cat, act in zip(self._cats, value)])
(PPO pid=58072) AttributeError: 'TorchCategorical' object has no attribute 'log_prob'
```
The only reason I'm using the learner API is to use PB2, and as suggested above by @Rohan138, it seems to be the only way to do so using a GPU. I don't mind using it or not, as long as I'm able to use PB2 for my custom env. Thanks for your help.
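(For context, a minimal stand-in for the kind of env involved, assuming gymnasium and a MultiDiscrete action space; this is hypothetical and not the actual custom env:)

```python
import gymnasium as gym
import numpy as np
from gymnasium import spaces


class MultiDiscreteStandInEnv(gym.Env):
    """Hypothetical stand-in env with a MultiDiscrete action space."""

    def __init__(self, config=None):
        self.observation_space = spaces.Box(-1.0, 1.0, shape=(4,), dtype=np.float32)
        self.action_space = spaces.MultiDiscrete([3, 5])  # two discrete sub-actions
        self._steps = 0

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._steps = 0
        return self.observation_space.sample(), {}

    def step(self, action):
        self._steps += 1
        obs = self.observation_space.sample()
        reward = float(np.sum(action))       # placeholder reward
        terminated = self._steps >= 100
        return obs, reward, terminated, False, {}
```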
When using `framework="tf2"`, I get the following error, which also seems to happen when dealing with the action distributions.
File "python\ray_raylet.pyx", line 821, in ray._raylet.execute_task.function_executor File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray_private\function_manager.py", line 670, in actor_method_executor return method(__ray_actor, *args, kwargs) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span return method(self, *_args, *_kwargs) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 738, in init self._update_policy_map(policy_dict=self.policy_dict) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span return method(self, _args, _kwargs) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 1985, in _update_policy_map self._build_policy_map( File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\util\tracing\tracing_helper.py", line 460, in _resume_span return method(self, *_args, **_kwargs) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\evaluation\rollout_worker.py", line 2097, in _build_policy_map new_policy = create_policy_for_framework( File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\utils\policy.py", line 139, in create_policy_for_framework return policy_class(observation_space, action_space, merged_config) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\algorithms\ppo\tf\ppo_tf_policy_rlm.py", line 74, in init self.maybe_initialize_optimizer_and_loss() File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\policy\eager_tf_policy_v2.py", line 444, in maybe_initialize_optimizer_and_loss self._initialize_loss_from_dummy_batch( File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\policy\policy.py", line 1485, in _initialize_loss_from_dummy_batch self.loss(self.model, self.dist_class, train_batch) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\algorithms\ppo\tf\ppo_tf_policy_rlm.py", line 101, in loss action_kl = prev_action_dist.kl(curr_action_dist) File "C:\Users\badrr\anaconda3\envs\yeah\lib\site-packages\ray\rllib\models\tf\tf_distributions.py", line 345, in kl [cat.kl(oth_cat) for cat, oth_cat in zip(self._cats, other.cats)], axis=1 AttributeError: '<class 'ray.rllib.models.tf.tf_distributions.TfMul' object has no attribute 'cats'
What happened + What you expected to happen
The error is caused by code expecting scalars on CPU when a perturbation is done by the PB2 scheduler with a GPU learner.
Code to reproduce:
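(The original script is not included here; the following is a hypothetical minimal sketch of the described setup: PPO tuned with the PB2 scheduler while requesting a GPU via the old-stack `num_gpus` setting. The environment, hyperparameter bounds, and values are placeholders, not the author's.)

```python
from ray import tune
from ray.rllib.algorithms.ppo import PPOConfig
from ray.tune.schedulers.pb2 import PB2

# Sketch: PPO + PB2 with a GPU requested via the old-stack num_gpus setting,
# the combination this issue reports as failing at perturbation time.
config = (
    PPOConfig()
    .environment("CartPole-v1")   # placeholder env
    .framework("torch")
    .resources(num_gpus=1)        # old-stack GPU request
    .training(lr=1e-4, gamma=0.99)
)

pb2 = PB2(
    time_attr="training_iteration",
    perturbation_interval=5,
    hyperparam_bounds={"lr": [1e-5, 1e-3], "gamma": [0.95, 0.999]},
)

tuner = tune.Tuner(
    "PPO",
    param_space=config.to_dict(),
    tune_config=tune.TuneConfig(
        metric="episode_reward_mean",
        mode="max",
        scheduler=pb2,
        num_samples=4,
    ),
)
tuner.fit()
```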
Error:
Versions / Dependencies
Reproduction script
Issue Severity
High: It blocks me from completing my task.