zhangky12 / TSP_RL_Q


Bug in TSP_Q-learning.ipynb #1

Open jackycaf opened 2 years ago

jackycaf commented 2 years ago

I'm working through your TSP tutorial and ran into a small problem. Could you help me with it?

Code block 1:

```python
for idx in range(K):
    # Train with Q-learning
    rewards, Q_q = train(env, qpolicy, total_episodes)
    rewards, Q_q = train(env, on_policy, total_episodes)
    qlearning_rewards[idx,:] = rewards
```

Code block 2:

```python
plt.plot(np.mean(qlearning_rewards[:,10:], 0).T)
plt.legend(['Q-learning'])
plt.xlabel('Episodes')
plt.ylabel('Mean reward');
plt.ylim([-200,0]);
```

Code block 3:

```python
path = '0'
dist = 0
for i in range(len(CG_matrix)):
    next_step = str(np.argmax(Q_q[env.obs_encode[path]]))
    path += next_step

path = [int(i) for i in path]
path
```

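There seem to be problems with the two parts of the code above:

  1. The qlearning_rewards I get when I run it don't match your plot.
  2. Q_q in the code block is assigned inside the loop, while the code block uses Q_q directly, which raises a KeyError. I'm not sure whether this is a dimension problem.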
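When comparing against the plot in the tutorial, it may help to pin down what code block 2 computes: `qlearning_rewards[:,10:]` drops the first 10 episodes of every run, and `np.mean(..., 0)` averages over the K runs, leaving one value per remaining episode; the `.T` has no effect on that 1-D result. Because training usually involves random exploration, curves from separate executions can also differ. A sketch with placeholder random data (the sizes and reward values are assumptions, not the notebook's) that checks those shapes:

```python
import numpy as np

# Hypothetical sizes; the notebook's actual K and total_episodes may differ.
K, total_episodes = 5, 100
rng = np.random.default_rng(0)

# One row per independent training run, one column per episode
# (placeholder random rewards, not real results).
qlearning_rewards = rng.normal(-100.0, 10.0, size=(K, total_episodes))

# Drop the first 10 episodes of every run, then average across runs (axis 0),
# giving one mean reward per remaining episode.
mean_curve = np.mean(qlearning_rewards[:, 10:], 0)
print(mean_curve.shape)  # (90,)
```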
jackycaf commented 2 years ago

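Correction: Q_q in code block 1 is assigned inside the loop, while code block 3 directly uses that Q_q from inside the loop, which raises a KeyError. I'm not sure whether this is a dimension problem.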
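Two points that may be relevant, offered as hypotheses rather than a diagnosis. First, Python loops do not create a new scope, so after code block 1 finishes, `Q_q` simply holds the table from the last assignment in the last iteration. Second, code block 3 looks up `env.obs_encode[path]` with the path string built so far; since `np.argmax` ranges over all cities, it can pick a city already on the path and produce a path string that never occurred during training, which would raise a KeyError. The digit-string encoding is also ambiguous with 10 or more cities (`'012'` could be 0,1,2 or 0,12), and `[int(i) for i in path]` splits per character. A minimal sketch that masks visited cities and reports missing states explicitly, assuming `obs_encode` maps the visited-path string to a row of `Q` (the function `greedy_tour` and this layout are hypothetical, not the notebook's confirmed API):

```python
import numpy as np

def greedy_tour(Q, obs_encode, n_cities, start=0):
    """Follow the greedy policy of a tabular Q-function from `start`.

    Hypothetical assumptions about the layout:
    - obs_encode maps a state key (visited cities joined as digits,
      e.g. '0' or '02') to a row index of Q;
    - Q[row][a] is the value of moving to city a next.
    """
    path = [start]
    while len(path) < n_cities:
        key = ''.join(str(c) for c in path)
        if key not in obs_encode:
            # State never seen during training: fail with a clearer message.
            raise KeyError(f'state {key!r} not in obs_encode')
        # Copy the row as floats so masking does not modify Q itself.
        q = np.array(Q[obs_encode[key]], dtype=float)
        # Mask already-visited cities so argmax cannot revisit them.
        q[path] = -np.inf
        path.append(int(np.argmax(q)))
    return path
```

The sketch stops once every city has been visited; if the environment expects the tour to return to the start city, `start` would be appended at the end.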