Alexzzdfjcn closed this issue 3 years ago.
Hi, please provide more details about the code you used. Did you run multiple rounds of training in the same run? If so, reusing exactly the same `plt.figure` will result in multiple curves on the same plot.
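Here is a minimal sketch of that effect, assuming matplotlib with the headless Agg backend. `run_training` and `plot_runs` are hypothetical helpers, not code from this repo. Calling `plt.figure(num=1)` again returns the existing figure, so curves from successive runs pile up unless the axes are cleared with `plt.cla()` (or a fresh figure is created for each run).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def run_training(seed):
    """Hypothetical stand-in for one training run: returns a fake reward curve."""
    return [seed + 0.1 * i for i in range(10)]


def plot_runs(n_runs, clear_between_runs):
    """Plot n_runs curves into figure 1 and return how many lines it ends up with."""
    fig = None
    for seed in range(n_runs):
        fig = plt.figure(num=1)  # same num -> the existing figure is reused
        if clear_between_runs:
            plt.cla()  # drop the curves drawn by earlier runs
        plt.plot(run_training(seed))
    n_lines = len(fig.axes[0].lines)
    plt.close(fig)
    return n_lines


print(plot_runs(3, clear_between_runs=False))  # prints 3: all runs on one plot
print(plot_runs(3, clear_between_runs=True))   # prints 1: only the last run
```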
```python
def train():
    env = gym.make(ENV_NAME).unwrapped
    state_dim = env.observation_space.shape[0]
    action_dim = env.action_space.shape[0]
    drawer = Drawer()

    #env.seed(RANDOMSEED)
    #np.random.seed(RANDOMSEED)
    #torch.manual_seed(RANDOMSEED)
    ppo = PPO(state_dim, action_dim, method=METHOD)
    global all_ep_r, update_plot, stop_plot, Angle, OPT_ANGLE
    all_ep_r = []
    Angle = []
    OPT_ANGLE = []
    for ep in range(EP_MAX):
        s = env.reset()
        ep_r = 0
        t0 = time.time()
        for t in range(EP_LEN):
            if RENDER:
                env.render()
            a = ppo.choose_action(s)
            ti = time.time()
            s_, S_temp, r, done, _ = env.step(s, a, ti)  #px
            # useful for pendulum since the nets are very small,
            # normalization makes it easier to learn
            ppo.store_transition(s, a, (r + 8) / 8)
            s = s_
            ep_r += r
            angle, speed, height = s  #px
            # update ppo
            if len(ppo.state_buffer) == BATCH_SIZE:
                ppo.finish_path(s_, done)
                ppo.update()
            if done:
                break
        ppo.finish_path(s_, done)
        print(
            'Episode: {}/{} | Episode Reward: {:.4f} | Running Time: {:.4f}'.format(
                ep + 1, EP_MAX, ep_r,
                time.time() - t0
            )
        )
        if ep == 0:
            all_ep_r.append(ep_r)
        else:
            all_ep_r.append(all_ep_r[-1] * 0.9 + ep_r * 0.1)
        OPT_ANGLE.append(S_temp)  #px
        Angle.append(angle)  #px
        if PLOT_RESULT:
            update_plot.set()
    ppo.save_model()
    if PLOT_RESULT:
        stop_plot.set()
    env.close()
```
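For reference, the `all_ep_r` update in `train()` is an exponential moving average of the episode reward with decay 0.9, started from the first episode's raw reward. A standalone sketch of the same smoothing (the function name `smooth_rewards` is hypothetical):

```python
def smooth_rewards(episode_rewards, decay=0.9):
    """Exponential moving average, mirroring the all_ep_r update in train()."""
    smoothed = []
    for ep, ep_r in enumerate(episode_rewards):
        if ep == 0:
            smoothed.append(ep_r)  # first episode: no history to average with
        else:
            smoothed.append(smoothed[-1] * decay + ep_r * (1 - decay))
    return smoothed


print(smooth_rewards([10.0, 20.0, 20.0]))  # approximately [10.0, 11.0, 11.9]
```

Because each smoothed value is determined by the previous one and the new raw reward, two runs give identical smoothed curves only if their raw episode rewards are identical, so the smoothing alone cannot make different runs look the same.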
After I commented out lines 7 to 9, I found that the curves from different training runs are no longer identical. Is it correct to modify the code like this?
I suspect the problem is not caused by the code you adopted from this repo. I'm also not clear which plotting function you used or which environment you are working on.
I guess you mean `s = env.reset()` for line 7. Resetting the environment is standard in RL and should not be removed in general. If your environment is fully deterministic, without any noise, the learning curve could turn out identical across runs when the model uses exactly the same samples for every update during the whole learning process. That seems unlikely here, though, because there is some randomness in the action sampling in `choose_action`. So I would suggest checking the plotting code you used more closely.
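To illustrate why the sampling in `choose_action` normally makes runs differ, here is a sketch of Gaussian action sampling as many continuous-action PPO implementations do it. This is an assumption for illustration only: the repo's actual `choose_action` may be implemented differently, and the mean, standard deviation and action bounds below are made up.

```python
import numpy as np


def choose_action(mean, std, rng, low=-2.0, high=2.0):
    """Hypothetical policy step: sample from N(mean, std), clip to the action bounds."""
    action = rng.normal(loc=mean, scale=std)
    return np.clip(action, low, high)


# Two differently seeded generators give different action sequences,
# even though the policy parameters (mean, std) are identical.
rng_a = np.random.default_rng(1)
rng_b = np.random.default_rng(2)
actions_a = [choose_action(0.0, 1.0, rng_a) for _ in range(5)]
actions_b = [choose_action(0.0, 1.0, rng_b) for _ in range(5)]
print(actions_a)
print(actions_b)
```

Even in a deterministic environment, these different actions lead to different states and rewards, so the learning curves diverge unless every random number generator involved is seeded identically.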
```python
import os
import threading

import matplotlib.pyplot as plt
from IPython.display import clear_output


class Drawer:
    def __init__(self, comments=''):
        global update_plot, stop_plot
        update_plot = threading.Event()
        update_plot.set()
        stop_plot = threading.Event()
        stop_plot.clear()
        self.title = ARGNAME
        if comments:
            self.title += '' + comments

    def plot(self):
        plt.ion()
        clear_output(True)  #px1013
        global all_ep_r, update_plot, stop_plot, Angle, OPT_ANGLE
        all_ep_r = []
        Angle = []
        OPT_ANGLE = []
        while not stop_plot.is_set():
            if update_plot.is_set():
                plt.figure(num=1, figsize=(20, 5))
                plt.cla()
                plt.title('Reward')  #px
                plt.plot(all_ep_r)
                # plt.ylim(-2000, 0)
                plt.xlabel('Episode')
                plt.ylabel('Moving averaged episode reward')
                plt.savefig(os.path.join('fig', 'Morphing reward_' + time_str))
                plt.figure(num=2, figsize=(20, 5))
                plt.cla()
                plt.title('Angle')
                x = list(range(0, len(Angle)))
                plot1 = plt.plot(Angle, 'r-', label='angle')
                plot2 = plt.plot(OPT_ANGLE, 'b--', label='opt_angle')
                # plt.ylim(-2000, 0)
                plt.xlabel('Episode')
                plt.ylabel('Morphing Angle')
                plt.legend()  # call before savefig so the saved figure includes the legend
                plt.savefig(os.path.join('fig', 'Morphing Angle_' + time_str))
                update_plot.clear()
            #px
            plt.draw()
            plt.pause(0.1)
        plt.ioff()
        plt.close()
```
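The `update_plot`/`stop_plot` handshake between `train()` and `Drawer.plot()` is a simple producer/consumer pattern built on `threading.Event`. The sketch below isolates that pattern using only the standard library: plotting is replaced by recording a copy of the shared list, and `trainer`, `plotter` and `snapshots` are hypothetical names. It clears the event before reading the data, so an update that arrives mid-redraw is picked up on the next pass, and it redraws once more after the stop event so the last episode is always shown.

```python
import threading
import time

update_plot = threading.Event()  # set by the trainer when new data is ready
stop_plot = threading.Event()    # set by the trainer when training is over
rewards = []                     # shared data, appended by the trainer
lock = threading.Lock()          # guards `rewards` across the two threads
snapshots = []                   # what the "plotter" saw at each redraw


def trainer(n_episodes):
    for ep in range(n_episodes):
        with lock:
            rewards.append(float(ep))  # stand-in for one episode's reward
        update_plot.set()              # ask the plotter to redraw
        time.sleep(0.01)
    stop_plot.set()


def plotter():
    while not stop_plot.is_set():
        if update_plot.is_set():
            update_plot.clear()  # clear first so a concurrent set() is not lost
            with lock:
                snapshots.append(list(rewards))  # "redraw" from a copy
        time.sleep(0.005)
    with lock:
        snapshots.append(list(rewards))  # final redraw after training stops


t_plot = threading.Thread(target=plotter)
t_train = threading.Thread(target=trainer, args=(5,))
t_plot.start()
t_train.start()
t_train.join()
t_plot.join()
print(len(rewards), len(snapshots))
```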
This is the plotting code I used. I don't see anything unusual in it. Could you please take a look?
I think the problem lies in this code:

```python
env.seed(RANDOMSEED)
np.random.seed(RANDOMSEED)
torch.manual_seed(RANDOMSEED)
```

Because a fixed random seed is used, the random number generators produce the same sequence every time, so the action selected at each step is the same in every run. Am I right?
If that is the case, you can easily verify it by running with different random seeds.
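One way to run that check: a sketch that seeds Python's `random`, NumPy and, if it is installed, PyTorch from a single value, then compares curves from a toy stand-in for training. `set_seed` and `fake_training_curve` are hypothetical helpers. The environment seed (`env.seed(RANDOMSEED)` in the posted code) is omitted because how a gym environment is seeded depends on the gym version in use.

```python
import random

import numpy as np


def set_seed(seed):
    """Seed the Python, NumPy and (if available) PyTorch global RNGs."""
    random.seed(seed)
    np.random.seed(seed)
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass  # PyTorch not installed: nothing else to seed


def fake_training_curve(seed, n_episodes=20):
    """Hypothetical stand-in for train(): episode rewards driven by random noise."""
    set_seed(seed)
    return [float(np.random.normal()) for _ in range(n_episodes)]


same_a = fake_training_curve(seed=0)
same_b = fake_training_curve(seed=0)
other = fake_training_curve(seed=1)
print(same_a == same_b)  # True: identical seed gives an identical curve
print(same_a == other)   # False: a different seed gives a different curve
```

If the real `train()` behaves the same way, identical curves for the same seed and different curves for different seeds, then the fixed seed explains what you observed.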
OK, I will try it. Thanks for your help!