Open jackycaf opened 2 years ago
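I'm working through your TSP tutorial and have run into a small problem. Could you help?

Code block 1:

```python
for idx in range(K):
    # Train with Q-learning
    rewards, Q_q = train(env, qpolicy, total_episodes)
    # rewards, Q_q = train(env, on_policy, total_episodes)
    qlearning_rewards[idx,:] = rewards
```

Code block 2:

```python
plt.plot(np.mean(qlearning_rewards[:,10:], 0).T)
plt.legend(['Q-learning'])
plt.xlabel('Episodes')
plt.ylabel('Mean reward');
# plt.ylim([-200,0]);
```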
Code block 3:

```python
path = '0'
dist = 0
for i in range(len(CG_matrix)):
    next_step = str(np.argmax(Q_q[env.obs_encode[path]]))
    path += next_step

path = [int(i) for i in path]
path
```
There seems to be a problem with these two parts of the code:
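Correction: `Q_q` in code block 1 is assigned inside the loop, but code block 3 uses that in-loop `Q_q` directly and raises a `KeyError`. I'm not sure whether this is a dimension problem.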
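In Python a `for` loop does not open a new scope, so after the loop in code block 1 finishes, `Q_q` still exists and simply refers to the table returned by the last training run (`idx == K - 1`). Using it in code block 3 is therefore legal, and the `KeyError` most likely comes from one of the dictionary lookups rather than from `Q_q` being out of scope. The sketch below illustrates this and also keeps every run's table so a specific one can be chosen. It is a minimal sketch only: `train_stub` is a hypothetical stand-in that returns random rewards and a toy Q-table, not the tutorial's `train()`.

```python
import numpy as np

def train_stub(n_episodes, seed):
    """Hypothetical stand-in for the tutorial's train(); NOT its real logic."""
    rng = np.random.default_rng(seed)
    rewards = rng.normal(-100, 10, size=n_episodes)  # fake per-episode rewards
    Q = {0: np.zeros(4)}                             # toy Q-table: 1 state, 4 actions
    Q[0][seed % 4] = 1.0
    return rewards, Q

K, total_episodes = 3, 50
qlearning_rewards = np.zeros((K, total_episodes))
q_tables = []                                        # keep the Q-table of every run

for idx in range(K):
    rewards, Q_q = train_stub(total_episodes, seed=idx)
    qlearning_rewards[idx, :] = rewards
    q_tables.append(Q_q)

# The loop variable survives the loop: Q_q is the table from the LAST run.
print(Q_q is q_tables[-1])

# Same averaging as code block 2: mean over runs, skipping the first 10 episodes.
mean_curve = np.mean(qlearning_rewards[:, 10:], axis=0)

# One possible choice: the run with the best average reward over its last 10 episodes.
best_idx = int(np.argmax(qlearning_rewards[:, -10:].mean(axis=1)))
best_Q = q_tables[best_idx]
```

Storing the tables in a list makes it explicit which run's policy is being rolled out, instead of silently relying on whichever run happened to finish last.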
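A few things in code block 3 could plausibly produce a `KeyError`, though which one applies depends on how the tutorial's environment builds `env.obs_encode`. The loop runs `len(CG_matrix)` times, so its last iteration looks up a path that already contains every city, which may not be an encoded state. If a Q-row was never updated (for example, all zeros), `np.argmax` returns index 0 and the greedy step can revisit a city, producing a path string that was never encoded. Also, if the instance has 10 or more cities, `str(10)` appends two characters, and `[int(i) for i in path]` later splits them into `1` and `0`. The sketch below is a guarded greedy rollout under stated assumptions: it assumes (hypothetically) that `obs_encode` maps a path string such as `'02'` to a state key of `Q`, and that `Q[state]` holds one action value per city. The toy `Q` and `obs_encode` are made-up data, not output from the tutorial.

```python
import numpy as np

def greedy_path(Q, obs_encode, n_cities, start=0):
    """Greedy rollout that visits each city once (hypothetical helper).

    Assumes obs_encode maps a path string (e.g. '0', '02') to a key of Q,
    and Q[state] is a sequence of action values, one per city.
    """
    path = [start]
    # Stop once every city is on the path: n_cities - 1 lookups, not n_cities.
    while len(path) < n_cities:
        key = ''.join(str(c) for c in path)  # same string key scheme as code block 3
        if key not in obs_encode:
            raise KeyError(f"path {key!r} has no encoding; "
                           "it was probably never reached during training")
        q_values = np.array(Q[obs_encode[key]], dtype=float)  # copy; Q is not modified
        q_values[path] = -np.inf             # mask cities already on the path
        path.append(int(np.argmax(q_values)))
    return path

# Toy data for 3 cities (made up for illustration).
obs_encode = {'0': 0, '01': 1, '02': 2}
Q = {0: [0.0, 0.5, 0.9], 1: [0.0, 0.0, 0.4], 2: [0.0, 0.3, 0.0]}
print(greedy_path(Q, obs_encode, n_cities=3))  # [0, 2, 1]
```

Masking visited cities keeps the rollout a valid tour even when some Q-rows are untrained, and the explicit check turns an opaque `KeyError` into a message naming the unencoded path. Keeping the path as a list of integers, rather than a string, also avoids the multi-digit splitting problem when the path is converted back.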