Closed hai-h-nguyen closed 2 years ago
For SACD, can you explain why you do this: https://github.com/twni2016/pomdp-baselines/blob/main/policies/models/policy_rnn.py#L235 instead of this, which I think is correct: https://github.com/twni2016/pomdp-baselines/blob/main/policies/models/policy_rnn.py#L230
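Because the target value is V(s') = E_{a' ~ \pi(. | s')}[Q(s', a')]. With discrete actions, this expectation can be computed exactly as a sum over actions weighted by the policy probabilities. So I think there is no bug?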
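To make the expectation concrete, here is a minimal sketch in plain Python. It is not the repository's code: the function name `expected_next_value`, the example numbers, and the optional entropy weight `alpha` are all assumptions made for illustration. With `alpha=0.0` it computes exactly the formula above; a nonzero `alpha` adds the entropy bonus commonly used in soft actor-critic targets.

```python
import math


def expected_next_value(q_next, probs, alpha=0.0):
    """Return sum_a pi(a|s') * (Q(s', a) - alpha * log pi(a|s')).

    q_next: Q(s', a) for each discrete action (hypothetical values).
    probs:  pi(a|s') for each action; assumed to sum to 1.
    alpha:  entropy weight (assumption); 0.0 gives the plain expectation E[Q].
    """
    total = 0.0
    for p, q in zip(probs, q_next):
        if p > 0.0:  # skip zero-probability actions to avoid log(0)
            total += p * (q - alpha * math.log(p))
    return total


# Hypothetical Q-values and policy probabilities for three actions.
q_next = [1.0, 2.0, 4.0]
probs = [0.25, 0.25, 0.5]

v_plain = expected_next_value(q_next, probs)            # E_{a'~pi}[Q(s', a')]
v_soft = expected_next_value(q_next, probs, alpha=0.2)  # with entropy bonus
```

The point of the sketch is that the target averages Q over all actions under the policy, rather than evaluating Q at a single sampled or greedy action.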
Yeah, it's not a bug. Therefore, I closed the issue. Thanks for replying.