Open Ankur-Deka opened 4 years ago
Same question in PyTorch implemenation of VPG.And I find that tensorflow implementation of vpg is different with PyTorch.
This same issue has been raised and a solution proposed in pull request 'Fix value bootstrapping' #233
However, that PR suggests using v for timeout conditions too:
timeout = ep_len == max_ep_len
...
if d and not timeout:
v = 0
else:
_, v, _ = ac.step(torch.as_tensor(o, dtype=torch.float32))
I'd favor @Ankur-Deka 's solution since max_ep_len can be reached in non terminal states (unrelated to environment's dynamics).
In the Pytorch implementation of the algorithms, there is a slight issue with computing value of terminal state. Let's look at the example of PPO:
Line 321 in
ppo.py
:if timeout or epoch_ended: _,v,_ = ac.step(o) else: v = 0
I think the condition should rather be:
if not d
It might so happen that
d
andepoch_ended
occur at the same time. We would want to check whether it's the end of the episode when assigning v to terminal state.