Sorry, this code is a typo.
I wrote it with reference to hongzimao/pensieve/sim/a3c.py, lines 232-235:
if terminal:
    R_batch[-1, 0] = 0
else:
    R_batch[-1, 0] = v_batch[-1, 0]
Because R_batch is initialized as a tensor of zeros, I wrote pass in the first branch.
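A minimal sketch of that logic in plain Python, assuming lists stand in for the R_batch and v_batch tensors and gamma is passed in as a parameter (Pensieve defines its own GAMMA constant). The helper name discounted_returns is hypothetical, not from either repository. It shows why pass is enough in the terminal branch when the returns start as zeros.

```python
def discounted_returns(rewards, last_value, terminal, gamma=0.99):
    """Hypothetical helper: per-step returns for one rollout segment.

    rewards    -- per-step rewards (plays the role of r_batch)
    last_value -- critic estimate for the last state (v_batch[-1])
    terminal   -- True if the segment ended because the video finished
    """
    returns = [0.0] * len(rewards)  # like R_batch initialized to zeros
    if terminal:
        pass  # returns[-1] is already 0, same effect as R_batch[-1] = 0
    else:
        returns[-1] = last_value  # bootstrap from the critic's estimate
    # Walk backwards: R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards) - 1)):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns
```

With rewards [1.0, 2.0, 3.0], last_value=10.0 and gamma=0.5, a terminal segment gives [2.0, 2.0, 0.0] and a non-terminal one gives [4.5, 7.0, 10.0]. In both cases the last reward never enters the recursion.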
You can see this in pensieve-pytorch.py at line 261: when len(r_batch) >= TRAIN_SEQ_LEN and the video is not finished, end_of_video is False. After the experience is put to the central_agent, terminal is False at line 138, which causes R_batch[-1] = v_batch[-1].
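A simplified, hypothetical sketch of that flow: one video's rewards are cut into segments of at most train_seq_len steps, and only the segment that ends with the video is flagged terminal, so every other segment takes the bootstrap branch. The function segment_rollout and its arguments are illustrative, not names from pensieve-pytorch.py, and details such as dropping the first entry of each batch are ignored.

```python
def segment_rollout(rewards, video_done, train_seq_len):
    """Hypothetical sketch: split one video's rewards into training segments.

    Returns a list of (segment_rewards, terminal) pairs. A segment is emitted
    when it reaches train_seq_len steps or when the video finishes; terminal
    mirrors end_of_video, so it is True only for the final segment.
    """
    segments = []
    batch = []
    for r, done in zip(rewards, video_done):
        batch.append(r)
        if len(batch) >= train_seq_len or done:
            segments.append((batch, done))  # terminal = end_of_video
            batch = []
    return segments
```

For example, five steps with the video ending on the last one and train_seq_len=2 yield [([1, 2], False), ([3, 4], False), ([5], True)]: only one of the three segments is terminal.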
Mao's code means that the return (R_batch) for the last step is 0 at the end of the video. There was some confusion, but I didn't realize it at the time.
Now I think the following code is correct:
if terminal or not terminal:
    R_batch[-1] = r_batch[-1]
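The rule proposed above can be sketched in plain Python as well. Since terminal or not terminal is always true, it amounts to setting the last return to the last reward unconditionally, with no bootstrap value. The helper name returns_last_reward is hypothetical, and lists stand in for tensors.

```python
def returns_last_reward(rewards, gamma=0.99):
    """Hypothetical helper: returns where the last step's return is its reward.

    Assumes a non-empty rewards list. No critic value is used, whichever way
    the segment ended.
    """
    returns = [0.0] * len(rewards)
    returns[-1] = rewards[-1]  # R_batch[-1] = r_batch[-1], in every case
    # Walk backwards: R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards) - 1)):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    return returns
```

For rewards [1.0, 2.0, 3.0] and gamma=0.5 this gives [2.75, 3.5, 3.0].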
Thanks for asking.
Wow, thanks for your quick reply! It makes sense to me now ^^
Hi, I am wondering about A3C.py, at line 53:
R_batch[-1] = v_batch[-1]
v_batch (I assume it stands for value_batch) is only defined later, as:
v_batch = self.criticNetwork.forward(s_batch).squeeze().to(self.device)
How will the if branch work then?
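A minimal sketch of the ordering the question points at, assuming the critic is evaluated before the terminal/bootstrap branch so that the value for the last state is available there. Here critic is a hypothetical stand-in for self.criticNetwork.forward, plain Python lists stand in for tensors, and the advantage is taken as return minus value.

```python
def returns_and_advantages(states, rewards, terminal, critic, gamma=0.99):
    """Hypothetical sketch: compute values first, then returns and advantages.

    critic -- any callable mapping a state to a scalar value estimate
    """
    # Evaluate the critic up front; this list plays the role of v_batch.
    values = [critic(s) for s in states]
    returns = [0.0] * len(rewards)  # zeros, so a terminal last step stays 0
    if not terminal:
        returns[-1] = values[-1]  # bootstrap now works: values already exist
    # Walk backwards: R_t = r_t + gamma * R_{t+1}
    for t in reversed(range(len(rewards) - 1)):
        returns[t] = rewards[t] + gamma * returns[t + 1]
    advantages = [R - v for R, v in zip(returns, values)]
    return returns, advantages
```

With states [1.0, 2.0, 4.0], an identity critic, rewards [1.0, 2.0, 3.0], gamma=0.5 and terminal=False, the returns are [3.0, 4.0, 4.0] and the advantages [2.0, 2.0, 0.0].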
Many thanks for your torch code update :)