Thanks for reporting, I am looking into this.
Could you try wrapping the rendering rollout in `torch.no_grad()` like below and tell me whether the problem still occurs?
```python
# Run the evaluation rollout without building an autograd graph;
# rendering and evaluation do not need gradients.
with torch.no_grad():
    env.rollout(
        max_steps=max_steps,                    # stop after this many steps
        policy=policy,                          # trained policy drives the actions
        callback=lambda env, _: env.render(),   # render every step
        auto_cast_to_device=True,
        break_when_any_done=False,
    )
```
Thanks for reporting this @Quinticx!
As @matteobettini suggested, a `no_grad` could solve things.
To give a bit of context, gradient propagation in rollouts (and in RL in general) is a tough design decision. For instance, we could disable gradients for all rollout calls, but that would be a terrible choice for meta-RL, inverse RL, trajectory optimization and the like. We could add one more kwarg to rollout, but the advantage over explicitly putting your call under a `no_grad` decorator would be marginal IMO; a sketch of the decorator form follows.
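For illustration, a minimal sketch of that decorator form; `run_eval_rollout` and its arguments are hypothetical names for this example, not TorchRL API:

```python
import torch

# torch.no_grad() also works as a decorator: everything inside the
# decorated function runs with gradient tracking disabled.
@torch.no_grad()
def run_eval_rollout(env, policy, max_steps):
    return env.rollout(
        max_steps=max_steps,
        policy=policy,
        callback=lambda env, _: env.render(),
        auto_cast_to_device=True,
        break_when_any_done=False,
    )
```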
We face similar questions around value computation (when and how to disable graph construction) and in many other places in the code; the sketch below illustrates that trade-off.
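As a hedged illustration of the value-computation case (generic code, not TorchRL's actual implementation): bootstrap targets are typically computed without graph construction, while the value loss itself still needs gradients.

```python
import torch

def td_targets(value_net, next_obs, reward, done, gamma=0.99):
    # The bootstrap term should not backpropagate into the value network,
    # so graph construction is disabled for this forward pass only.
    with torch.no_grad():
        next_value = value_net(next_obs).squeeze(-1)
    return reward + gamma * (1.0 - done.float()) * next_value
```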
There isn't a single answer, I'm afraid. What we could do better is to capture errors related to this more clearly.
As always, suggestions are welcome!
Describe the bug
Following the tutorial on multi-agent reinforcement learning using PPO, rendering the rollout after training fails.
To Reproduce
I implemented the code from the MARL PPO tutorial (https://pytorch.org/rl/tutorials/multiagent_ppo.html#). The bug is only encountered when attempting to render the environment after training the policy. If the `policy=policy` line is commented out, the rollout runs smoothly, but it carries no information about the policy and is therefore random. A sketch of the failing call is below.
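For reference, a minimal sketch of the failing call; `env`, `policy`, and `max_steps` are assumed to be the objects built earlier in the tutorial, and the snippet is reconstructed rather than copied verbatim:

```python
# Fails after training when the trained policy is passed; see the PyTorch
# issue linked below. Without `policy=policy` the rollout is random.
env.rollout(
    max_steps=max_steps,
    policy=policy,
    callback=lambda env, _: env.render(),
    auto_cast_to_device=True,
    break_when_any_done=False,
)
```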
Reason and Possible fixes
The traceback links to the following issue in PyTorch: https://github.com/pytorch/pytorch/pull/103001
System info
Describe the characteristics of your environment:
Checklist