jcwleo / awr-pytorch

Advantage-Weighted Regression
MIT License

done/dones ? #2

Closed: MoMe36 closed this issue 4 years ago

MoMe36 commented 4 years ago

Hi! Thanks for the implementation.

In your discount function, you write:


def discount_return(reward, done, value):
    value = value.squeeze()
    num_step = len(value)
    discounted_return = np.zeros([num_step])

    gae = 0
    for t in range(num_step - 1, -1, -1):

        if dones[t]:
            delta = reward[t] - value[t]
        else:
            delta = reward[t] + gamma * value[t + 1] - value[t]
        gae = delta + gamma * lam * (1 - done[t]) * gae

        discounted_return[t] = gae + value[t]

Shouldn't the first condition be if done[t]? The function's parameter is named done, not dones.

Thanks!

jcwleo commented 4 years ago

@MoMe36 Thank you. I fixed it.
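For reference, here is a minimal sketch of the same GAE loop with the terminal check reading done[t], as the issue suggests. It is not the maintainer's actual fix, which the thread does not show. Several details are assumptions added to make it self-contained. gamma and lam become parameters with illustrative defaults (the quoted snippet reads them from an enclosing scope). A hypothetical last_value argument supplies V(s_T) so that a non-terminal final step does not index value[t + 1] past the end of the array. The function also returns the array explicitly.

```python
import numpy as np


def discount_return(reward, done, value, gamma=0.99, lam=0.95, last_value=0.0):
    """Generalized advantage estimation returns, checking done[t] for terminals.

    Assumptions (not from the repository): gamma/lam defaults are illustrative,
    and last_value is a hypothetical bootstrap value for the state after the
    final step, used only when that step is not terminal.
    """
    reward = np.asarray(reward, dtype=np.float64)
    done = np.asarray(done, dtype=bool)
    value = np.asarray(value, dtype=np.float64).reshape(-1)  # like squeeze(), never 0-d
    num_step = len(value)
    discounted_return = np.zeros(num_step)

    gae = 0.0
    for t in range(num_step - 1, -1, -1):
        # Value of the next state; past the end of the rollout, fall back to last_value.
        next_value = value[t + 1] if t + 1 < num_step else last_value
        if done[t]:
            # Terminal transition: do not bootstrap from the next state.
            delta = reward[t] - value[t]
        else:
            delta = reward[t] + gamma * next_value - value[t]
        # The (1 - done[t]) mask stops advantages from leaking across episode boundaries.
        gae = delta + gamma * lam * (1.0 - float(done[t])) * gae
        discounted_return[t] = gae + value[t]

    return discounted_return
```

With lam=1.0 the result reduces to the plain discounted return. For example, rewards [1, 1, 1] with done = [0, 0, 1] and gamma=0.5 give [1.75, 1.5, 1.0], whatever values are passed in value.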