Closed mobeets closed 1 year ago
Note that beliefs depend on the previous action (not on the previous reward). So we should add the previous action into the model as an input, encoded as a one-hot vector.
```python
p_obs = O[x[0]+1, :]               # observation likelihood for each state
p_tra = T[:, :, a_prev] @ b_prev   # predicted state distribution given the previous action
b = p_obs * p_tra                  # Bayesian correction
b = b / b.sum()                    # normalize to a valid belief
```
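For context, here is a self-contained sketch of this belief update together with the one-hot encoding of the previous action. All sizes and matrices are hypothetical (a small random POMDP); `O[o, s]` is assumed to be P(obs | state) and `T[s', s, a]` is assumed to be P(next state | state, action), matching the indexing in the snippet above.

```python
import numpy as np

# Hypothetical sizes: 2 states, 2 actions, 3 observations.
rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 2, 2, 3

# O[o, s] = P(obs = o | state = s): columns sum to 1 over observations.
O = rng.random((n_obs, n_states))
O /= O.sum(axis=0, keepdims=True)

# T[s', s, a] = P(next = s' | state = s, action = a): columns sum to 1 over next states.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)

def update_belief(b_prev, a_prev, obs):
    """Bayesian belief update: predict with T using the previous action,
    then correct with the observation likelihood from O."""
    p_tra = T[:, :, a_prev] @ b_prev  # predicted state distribution
    p_obs = O[obs, :]                 # likelihood of the observed symbol per state
    b = p_obs * p_tra
    return b / b.sum()

def one_hot(a, n=n_actions):
    """One-hot encoding of the previous action, to feed into the model."""
    v = np.zeros(n)
    v[a] = 1.0
    return v

b_prev = np.array([0.5, 0.5])  # uniform prior belief
a_prev, obs = 1, 2
b = update_belief(b_prev, a_prev, obs)
model_input = np.concatenate([one_hot(a_prev), b])  # action one-hot + updated belief
```

The key point is that `T[:, :, a_prev]` selects the transition matrix for the previous action, which is why the action (and not the reward) must be available to the model.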