Closed mobeets closed 1 year ago
Note that beliefs depend on the previous action (not on the previous reward). So we should add the previous action into the model as an input, encoded as a one-hot vector.
```python
p_obs = O[x[0]+1, :]               # observation likelihood for each state
p_tra = T[:, :, a_prev] @ b_prev   # predicted state distribution given the previous action
b = p_obs * p_tra                  # Bayesian correction
b = b / b.sum()                    # normalize to a valid belief
```
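For context, here is a self-contained sketch of this belief update together with the one-hot encoding of the previous action. All sizes and matrices are hypothetical (a small random POMDP); `O[o, s]` is assumed to be P(obs | state) and `T[s', s, a]` is assumed to be P(next state | state, action), matching the indexing in the snippet above.

```python
import numpy as np

# Hypothetical sizes: 2 states, 2 actions, 3 observations.
rng = np.random.default_rng(0)
n_states, n_actions, n_obs = 2, 2, 3

# O[o, s] = P(obs = o | state = s): columns sum to 1 over observations.
O = rng.random((n_obs, n_states))
O /= O.sum(axis=0, keepdims=True)

# T[s', s, a] = P(next = s' | state = s, action = a): columns sum to 1 over next states.
T = rng.random((n_states, n_states, n_actions))
T /= T.sum(axis=0, keepdims=True)

def update_belief(b_prev, a_prev, obs):
    """Bayesian belief update: predict with T using the previous action,
    then correct with the observation likelihood from O."""
    p_tra = T[:, :, a_prev] @ b_prev  # predicted state distribution
    p_obs = O[obs, :]                 # likelihood of the observed symbol per state
    b = p_obs * p_tra
    return b / b.sum()

def one_hot(a, n=n_actions):
    """One-hot encoding of the previous action, to feed into the model."""
    v = np.zeros(n)
    v[a] = 1.0
    return v

b_prev = np.array([0.5, 0.5])  # uniform prior belief
a_prev, obs = 1, 2
b = update_belief(b_prev, a_prev, obs)
model_input = np.concatenate([one_hot(a_prev), b])  # action one-hot + updated belief
```

The key point is that `T[:, :, a_prev]` selects the transition matrix for the previous action, which is why the action (and not the reward) must be available to the model.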