dajianer closed this issue 5 months ago.
The cause appears to be TD3's target policy smoothing: the action noise wrapper applied to the target policy does not support hybrid action spaces. As a workaround, set `noise=False` in the policy configuration, matching the reference DDPG config. This disables target policy smoothing.
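The workaround above might look like the following minimal sketch. It assumes DI-engine-style nested config dicts in which the smoothing flag sits under `policy.learn`, as in DI-engine's default TD3 config. The other keys and values here are illustrative assumptions and are not taken from the reporter's config.

```python
# Hypothetical TD3 config fragment for a hybrid action space environment.
# Key names follow DI-engine's TD3/DDPG policy configs as an assumption;
# verify them against the config file you actually use.
hybrid_td3_config = dict(
    policy=dict(
        action_space='hybrid',
        model=dict(
            twin_critic=True,  # TD3 keeps its twin critics
            action_space='hybrid',
        ),
        learn=dict(
            actor_update_freq=2,  # delayed actor updates are unaffected
            # Workaround: turn off target policy smoothing so the target
            # actor's hybrid (dict) output is never passed through the
            # action noise wrapper, which expects a single tensor.
            noise=False,
        ),
    ),
)
```

With `noise=False`, training still uses TD3's twin critics and delayed actor updates. Only the smoothing noise on target actions is dropped, so the result sits between DDPG and full TD3.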
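When I train TD3 on an environment with a hybrid action space, the run fails on `assert isinstance(action, torch.Tensor)`. Looking at the source, I found that the value returned by `HybridArgmaxSampleWrapper`'s `forward` can indeed trigger this error. How should I fix this? The code is as follows: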