Sui-Xing opened this issue 1 week ago
I also encountered a new error: I cannot perform gradient computation correctly. I wrote the code according to the documentation at https://pytorch.org/rl/stable/reference/generated/torchrl.objectives.ClipPPOLoss.html?highlight=clipppoloss and https://pytorch.org/rl/stable/reference/generated/torchrl.modules.tensordict_module.ActorCriticOperator.html?highlight=actorcriticoperator. When using the two methods from the documentation to create a ClipPPOLoss object, I encountered two different errors.
```python
loss_module = ClipPPOLoss(
    actor_network=ProbabilisticTensorDictSequential(a_c_model, actor),
    critic_network=value_module,
    clip_epsilon=clip_epsilon,
    entropy_bonus=bool(entropy_eps),
    entropy_coef=entropy_eps,
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
    normalize_advantage=True,
    device=device,
)
```
```
Traceback (most recent call last):
  File "D:\Work\project\autorouting\src\main\train.py", line 342, in <module>
    raise e
  File "D:\Work\project\autorouting\src\main\train.py", line 336, in <module>
    loss_vals = loss_module(subdata)
  File "C:\Users\Jupiter\tools\miniconda\envs\routing_latest\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Jupiter\tools\miniconda\envs\routing_latest\lib\site-packages\torch\nn\modules\module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\common.py", line 49, in new_forward
    return func(self, *args, **kwargs)
  File "d:\work\project\git\tensordict\tensordict\nn\common.py", line 325, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 860, in forward
    log_weight, dist, kl_approx = self._log_weight(tensordict)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 487, in _log_weight
    raise RuntimeError(
RuntimeError: tensordict stored sample_log_prob requires grad.
```
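For what it's worth, a possible workaround for this first error (a sketch of the general idea, not an official TorchRL fix): the loss module refuses to proceed because the `sample_log_prob` entry stored in the collected data still carries gradient information, so detaching that entry before calling the loss removes the complaint. The tensor below only stands in for `subdata["sample_log_prob"]`.

```python
import torch

# Hypothetical stand-in for subdata["sample_log_prob"]: a log-prob tensor
# that was (incorrectly) stored while still attached to the graph.
log_prob = torch.tensor([-0.7, -1.2], requires_grad=True)

# Detaching yields the grad-free tensor the loss module expects to find.
stored = log_prob.detach()
print(stored.requires_grad)  # False
```

In a real training loop the same effect is achieved by writing the detached tensor back, e.g. `subdata["sample_log_prob"] = subdata["sample_log_prob"].detach()`, or by collecting rollouts under `torch.no_grad()`.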
```python
loss_module = ClipPPOLoss(
    actor_network=a_c_model.get_policy_operator(),
    critic_network=a_c_model.get_critic_operator(),
    clip_epsilon=clip_epsilon,
    entropy_bonus=bool(entropy_eps),
    entropy_coef=entropy_eps,
    critic_coef=1.0,
    loss_critic_type="smooth_l1",
    normalize_advantage=True,
    device=device,
)
```
```
Traceback (most recent call last):
  File "D:\Work\project\autorouting\src\main\fanoutnet\train.py", line 336, in <module>
    loss_vals = loss_module(subdata)
  File "C:\Users\Jupiter\tools\miniconda\envs\autorouting_latest\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\Jupiter\tools\miniconda\envs\autorouting_latest\lib\site-packages\torch\nn\modules\module.py", line 1603, in _call_impl
    result = forward_call(*args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\common.py", line 49, in new_forward
    return func(self, *args, **kwargs)
  File "d:\work\project\git\tensordict\tensordict\nn\common.py", line 325, in wrapper
    return func(_self, tensordict, *args, **kwargs)
  File "d:\work\project\git\rl\torchrl\objectives\ppo.py", line 885, in forward
    td_out.set("loss_entropy", -self.entropy_coef * entropy)
TypeError: unsupported operand type(s) for *: 'Tensor' and 'TensorDict'
```
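The shape of this second error suggests (my reading, not confirmed by the traceback alone) that with a composite action space the entropy comes back as one entry per action head rather than as a single Tensor, so multiplying it by the scalar entropy coefficient fails. A minimal sketch of the mismatch and of one way to make the product well defined, using a plain dict with made-up keys in place of the real TensorDict:

```python
import torch

# Hypothetical per-head entropies, standing in for the TensorDict that a
# composite distribution returns (keys and shapes are invented for the sketch).
entropy_per_head = {
    "action_a": torch.tensor([0.9, 1.1]),
    "action_b": torch.tensor([0.4, 0.6]),
}
entropy_coef = 0.01

# -entropy_coef * entropy_per_head would fail just as in the traceback.
# Reducing the heads to one Tensor first makes the scaling well defined.
entropy = torch.stack(list(entropy_per_head.values())).sum(dim=0)
loss_entropy = -entropy_coef * entropy
print(loss_entropy)  # one scalar entropy term per batch element
```

Whether the reduction belongs in user code or inside the loss module is exactly the question this issue raises.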
I also get this error when I pass separate actor and value networks (no shared backbone) to ClipPPOLoss.
Describe the bug
When I proceeded further based on issue https://github.com/pytorch/rl/issues/2402, I encountered this warning. I have pulled the GitHub projects torchrl and tensordict and installed the latest libraries from both projects using `python .\setup.py develop`. I'm puzzled why it's not possible to perform `deterministic_sample` using `Categorical` within `CompositeDistribution`. Since `CompositeDistribution` can already compute `log_prob`, why can't it perform `deterministic_sample`?
To Reproduce
I have already used the code suggested in the answer of this issue: https://github.com/pytorch/rl/issues/2402
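To make the request concrete, here is a sketch (not TorchRL internals) of what a deterministic sample from a `Categorical` would mean: it is usually taken to be the mode of the distribution, i.e. the argmax of the class probabilities.

```python
import torch
from torch.distributions import Categorical

# A deterministic "sample" from a Categorical is its mode: the index of
# the largest class probability, with no randomness involved.
dist = Categorical(probs=torch.tensor([0.1, 0.7, 0.2]))
deterministic_action = dist.probs.argmax(dim=-1)
print(deterministic_action.item())  # 1
```

Each component distribution in a composite action space could in principle be reduced the same way, which is what the warning seems to say is unsupported.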
Expected behavior
I hope to eliminate this warning so that `log_prob` in `CompositeDistribution` can properly assist in performing `deterministic_sample` for discrete action spaces. Alternatively, could you tell me whether this warning has a negative impact on the convergence of my PPO model during training, or on the prediction process?
System info
win11 24h2, python 3.10.14
torch 2.4.0+cu118, torchaudio 2.4.0+cu118, torchrl 0.5.0+ca3a595, torchvision 0.19.0+cu118, tensordict 0.5.0+eba0769