linnaeushuang / pensieve-pytorch

MIT License

How is 'v_batch' defined in A3C? #6

Closed lesleychou closed 3 years ago

lesleychou commented 4 years ago

Hi, I have a question about A3C.py, line 53: R_batch[-1] = v_batch[-1]

v_batch (I assume it stands for value_batch) is only defined later, as: v_batch=self.criticNetwork.forward(s_batch).squeeze().to(self.device)

How does the if block below work, then?

if terminal:
    pass
else:
    R_batch[-1] = v_batch[-1]

Many thanks for your PyTorch code update :)

linnaeushuang commented 4 years ago

Sorry, that code is a mistake on my part. I wrote it with reference to hongzimao/pensieve/sim/a3c.py, lines 232-235.

if terminal:
    R_batch[-1,0] = 0
else:
    R_batch[-1,0] = v_batch[-1,0]

Because R_batch is initialized to a tensor of zeros, I wrote pass in the first branch.

You can see this in pensieve-pytorch.py, line 261: when len(r_batch) >= TRAIN_SEQ_LEN and the video is not finished, end_of_video is false. After the experience (exp) is sent to central_agent, terminal is false at line 138, which causes R_batch[-1] = v_batch[-1] to run.

Mao's code means that the reward for the last step at the end of the video is 0. There was some confusion on my part, but I didn't realize it at the time.
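
For readers following along, the if/else above only seeds the last entry of R_batch. A minimal sketch of how that seed would typically feed a discounted-return computation is shown here. It assumes plain Python lists of floats rather than the tensors used in A3C.py; the function name `bootstrapped_returns`, the `gamma` default, and the backward recursion are illustrative assumptions, not code quoted from either repository.

```python
def bootstrapped_returns(r_batch, v_batch, terminal, gamma=0.99):
    """Discounted returns with the last entry seeded as in the if/else above.

    Hypothetical helper for illustration only: r_batch and v_batch are
    equal-length lists of floats, not the tensors used in A3C.py.
    """
    if not r_batch or len(r_batch) != len(v_batch):
        raise ValueError("r_batch and v_batch must be non-empty and equal length")

    # Start from zeros, so the terminal case needs no explicit assignment.
    R_batch = [0.0] * len(r_batch)

    # Non-terminal: bootstrap the last return from the critic's estimate.
    if not terminal:
        R_batch[-1] = v_batch[-1]

    # Assumed standard recursion: R[t] = r[t] + gamma * R[t + 1].
    for t in reversed(range(len(r_batch) - 1)):
        R_batch[t] = r_batch[t] + gamma * R_batch[t + 1]
    return R_batch
```

With this seeding, a terminal batch ends with a return of 0, while a non-terminal batch ends with the critic's value for the last state.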

Now I think the following code is correct.

if terminal or not terminal:
    R_batch[-1] = r_batch[-1]

Thanks for asking.
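
As a rough illustration of the variant proposed in this comment, the sketch below seeds the last return with the last reward regardless of terminal. It assumes plain Python lists of floats; the name `reward_seeded_returns`, the `gamma` default, and the backward recursion are hypothetical and are not taken from the repository, which only specifies the assignment to the last entry.

```python
def reward_seeded_returns(r_batch, gamma=0.99):
    """Discounted returns where the last entry is the last reward.

    Hypothetical helper for illustration only: r_batch is a list of floats.
    The terminal flag is not needed, since both branches do the same thing.
    """
    if not r_batch:
        raise ValueError("r_batch must be non-empty")

    R_batch = [0.0] * len(r_batch)

    # Seed with the last observed reward instead of 0 or the critic's value.
    R_batch[-1] = r_batch[-1]

    # Assumed standard recursion: R[t] = r[t] + gamma * R[t + 1].
    for t in reversed(range(len(r_batch) - 1)):
        R_batch[t] = r_batch[t] + gamma * R_batch[t + 1]
    return R_batch
```

Under these assumptions the critic's estimate no longer enters the return targets at all, so v_batch does not have to exist before R_batch is filled in.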

lesleychou commented 4 years ago

Wow, thanks for your quick reply! It makes sense to me now ^^