pytorch / rl

A modular, primitive-first, python-first PyTorch library for Reinforcement Learning.
https://pytorch.org/rl

[BUG]UserWarning: deterministic_sample wasn't found when queried in <class 'torch.distributions.categorical.Categorical'>. CompositeDistribution is falling back on mode instead. For better code quality and efficiency, make sure to either provide a distribution with a deterministic_sample attribute or to change the InteractionMode to the desired value. #2458

Open Sui-Xing opened 1 week ago

Sui-Xing commented 1 week ago

Describe the bug

While following up on issue https://github.com/pytorch/rl/issues/2402, I encountered this warning. I have pulled the torchrl and tensordict repositories from GitHub and installed the latest version of both from source with python .\setup.py develop. I am puzzled as to why a deterministic_sample cannot be drawn from a Categorical inside a CompositeDistribution: since CompositeDistribution can already compute log_prob, why can't it also perform deterministic_sample?
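
For reference, the warning text suggests either providing a distribution that exposes a deterministic_sample attribute or changing the interaction type. Below is a minimal, untested sketch of the first option (my own guess, not something taken from the torchrl API): a Categorical subclass whose deterministic_sample returns the mode of the distribution.

from torch.distributions import Categorical

class CategoricalWithMode(Categorical):
    # Hypothetical workaround: expose the attribute the warning asks for,
    # returning the most likely action (the mode of the distribution).
    @property
    def deterministic_sample(self):
        return self.probs.argmax(dim=-1)

In principle this class could be handed to the policy's probabilistic module (e.g. through the distribution_map of CompositeDistribution), but I have not verified whether the attribute is actually forwarded once the distribution is wrapped.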

To Reproduce

I have already used the code suggested in the answer to this issue: https://github.com/pytorch/rl/issues/2402

Expected behavior

I would like to eliminate this warning so that CompositeDistribution can properly perform deterministic_sample for discrete action spaces, the same way it already handles log_prob. Alternatively, could you tell me whether this warning has any negative impact on convergence during PPO training, or on inference?

System info

OS: Windows 11 24H2
Python: 3.10.14

torch: 2.4.0+cu118
torchaudio: 2.4.0+cu118
torchrl: 0.5.0+ca3a595
torchvision: 0.19.0+cu118
tensordict: 0.5.0+eba0769

Sui-Xing commented 1 week ago

I also encountered a new error: gradient computation does not work correctly. I wrote the code according to the documentation at https://pytorch.org/rl/stable/reference/generated/torchrl.objectives.ClipPPOLoss.html?highlight=clipppoloss and https://pytorch.org/rl/stable/reference/generated/torchrl.modules.tensordict_module.ActorCriticOperator.html?highlight=actorcriticoperator. When I build the ClipPPOLoss object with the two approaches shown there, I get two different errors.

RuntimeError: tensordict stored sample_log_prob requires grad.

code:

loss_module = ClipPPOLoss(
    actor_network=ProbabilisticTensorDictSequential(a_c_model, actor),
    critic_network=value_module,
    clip_epsilon=clip_epsilon,
    entropy_bonus=bool(entropy_eps),
    entropy_coef=entropy_eps,
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
    normalize_advantage=True,
    device=device
)

error message:

Traceback (most recent call last):
  File "D:\Work\project\autorouting\src\main\train.py", line 342, in <module>
    raise e
  File "D:\Work\project\autorouting\src\main\train.py", line 336, in <module>
    loss_vals = loss_module(subdata)
  File "C:\Users\Jupiter\tools\miniconda\envs\routing_latest\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Jupiter\tools\miniconda\envs\routing_latest\lib\site-packages\torch\nn\modules\module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\common.py", line 49, in new_forward
    return func(self, *args, **kwargs)
  File "d:\work\project\git\tensordict\tensordict\nn\common.py", line 325, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 860, in forward
    log_weight, dist, kl_approx = self._log_weight(tensordict)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 487, in _log_weight
    raise RuntimeError(
RuntimeError: tensordict stored sample_log_prob requires grad.
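
A possible workaround I am considering (an untested sketch on my side, not an official fix) is to detach the stored log-probability before calling the loss, on the assumption that ClipPPOLoss only needs the behaviour-policy log-prob collected during the rollout as a constant:

# Hypothetical workaround (untested): the stored sample_log_prob still
# carries gradient information, so detach it before computing the loss.
subdata.set("sample_log_prob", subdata.get("sample_log_prob").detach())
loss_vals = loss_module(subdata)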

TypeError: unsupported operand type(s) for *: 'Tensor' and 'TensorDict'

code:

loss_module = ClipPPOLoss(
    actor_network=a_c_model.get_policy_operator(),
    critic_network=a_c_model.get_critic_operator(),
    clip_epsilon=clip_epsilon,
    entropy_bonus=bool(entropy_eps),
    entropy_coef=entropy_eps,
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
    normalize_advantage=True,
    device=device
)

error message:

Traceback (most recent call last):
  File "D:\Work\project\autorouting\src\main\fanoutnet\train.py", line 336, in <module>
    loss_vals = loss_module(subdata)
  File "C:\Users\Jupiter\tools\miniconda\envs\autorouting_latest\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Jupiter\tools\miniconda\envs\autorouting_latest\lib\site-packages\torch\nn\modules\module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\common.py", line 49, in new_forward
    return func(self, *args, **kwargs)
  File "d:\work\project\git\tensordict\tensordict\nn\common.py", line 325, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 885, in forward
    td_out.set("loss_entropy", -self.entropy_coef * entropy)
TypeError: unsupported operand type(s) for *: 'Tensor' and 'TensorDict'

I also get this error when I pass separate actor and value networks to ClipPPOLoss, without a shared network.
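
My current guess for the TypeError (again an untested sketch, not an official recommendation) is that the composite distribution returns its entropy as a TensorDict rather than a single Tensor, so the multiplication by the scalar entropy coefficient inside the loss fails. Disabling the entropy bonus sidesteps that multiplication while the root cause is investigated:

loss_module = ClipPPOLoss(
    actor_network=a_c_model.get_policy_operator(),
    critic_network=a_c_model.get_critic_operator(),
    clip_epsilon=clip_epsilon,
    entropy_bonus=False,  # entropy term skipped to avoid Tensor * TensorDict
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
    normalize_advantage=True,
    device=device,
)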