Closed Unimax closed 5 years ago
@Unimax How do you get the percentage of levels solved per timestep, as shown in the paper, for both training and testing? And how do I get the average reward per timestep?
@sanjayyyyyy
The avg. reward from the --train-eval or --test commands can be multiplied by 10 to get the % of levels solved.
For example, use this command to train:
python -m coinrun.train_agent --run-id myrun --num-levels 500 --set-seed 13
and use this one to test:
python -m coinrun.enjoy --train-eval --restore-id myrun -num-eval 8 -rep 500
Then multiply the final avg. reward in the output, which is out of 10, by 10.
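A minimal sketch of that conversion, in case it helps. The function name `percent_solved` is hypothetical (it is not part of coinrun), and it assumes the default reward scheme where a solved level gives exactly 10 and a failed level gives 0:

```python
def percent_solved(avg_reward, max_reward=10.0):
    """Convert an average episode reward into a percentage of levels solved.

    Assumption: every solved level yields exactly max_reward and every
    failed level yields 0, so avg_reward / max_reward is the solve rate.
    """
    return 100.0 * avg_reward / max_reward


# Example: an avg. reward of 7.5 out of 10 corresponds to 75% of levels solved.
print(percent_solved(7.5))
```

With a custom reward system this shortcut no longer applies, and successes have to be counted directly.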
I wrote my own testing function for a PyTorch-based model, which looks something like this:

import copy
import os

import torch

# utils, make and DQN are defined elsewhere in the project.


def test(config):
    """Test routine"""
    env = utils.Scalarize(make('standard', num_envs=1))
    agent = DQN(env.observation_space.shape, env.action_space.n)
    use_gpu = config.enable_gpu and torch.cuda.is_available()
    device = torch.device("cuda" if use_gpu else "cpu")

    # Load the best saved weights, then move the model to the chosen device.
    bestmodel_file = os.path.join(config.save_dir, config.model_filename)
    load_res = torch.load(bestmodel_file, map_location="cpu")
    agent.load_state_dict(load_res["model"])
    agent = agent.to(device)
    agent.eval()

    success = 0
    total_steps = 0
    for i in range(config.testing_level_count):
        state = env.reset()
        ep_reward = 0
        ep_length = 0
        while True:
            if config.render_play:
                env.render()
            # Add a batch dimension and pick the greedy (highest-Q) action.
            state = torch.unsqueeze(torch.FloatTensor(state), 0).to(device)
            with torch.no_grad():
                action = torch.max(agent(state), 1)[1].item()
            next_state, reward, done, info = env.step(action)
            ep_length += 1
            ep_reward += reward
            # Copy in case the env reuses its observation buffer.
            state = copy.copy(next_state)
            if done:
                print("episode: {} , the episode reward : {} with length : {}".format(i, ep_reward, ep_length))
                break
        # A positive episode reward means the level was solved.
        if ep_reward > 0:
            success += 1
        total_steps += ep_length
    print("Testing result : {} % completed. Avg. ep length : {}".format(
        success * 100 / config.testing_level_count,
        total_steps / config.testing_level_count))
    env.close()
The avg. reward per timestep is (total avg. reward / avg. length of episodes). The total avg. reward will be between 0 and 10, and the avg. length will be less than 250.
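As a rough sketch of that calculation, assuming you have collected one total reward and one length per test episode (the function name `avg_reward_per_timestep` and the sample numbers are made up for illustration):

```python
def avg_reward_per_timestep(episode_rewards, episode_lengths):
    """Return (mean episode reward) / (mean episode length)."""
    if not episode_rewards or len(episode_rewards) != len(episode_lengths):
        raise ValueError("need one reward and one length per episode")
    mean_reward = sum(episode_rewards) / len(episode_rewards)
    mean_length = sum(episode_lengths) / len(episode_lengths)
    return mean_reward / mean_length


# Illustrative data: 3 of 4 levels solved, so the mean reward is 7.5,
# and the mean episode length is 150 steps.
rewards = [10, 0, 10, 10]
lengths = [100, 250, 150, 100]
print(avg_reward_per_timestep(rewards, lengths))
```

Since every episode is counted once in both means, this is the same as the total reward divided by the total number of steps.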
I personally use a custom reward system: -1 for each timestep, -500 for a death, +500 for the coin, etc.
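A reward scheme like that could be sketched roughly as below. This is only an illustration, not the code from my repo: the name `shaped_reward`, the `timed_out` flag, and the rule "episode ended without a coin and without a timeout means death" are all assumptions.

```python
STEP_PENALTY = -1     # applied on every timestep
DEATH_PENALTY = -500  # applied when the agent dies
COIN_BONUS = 500      # applied when the coin is collected


def shaped_reward(env_reward, done, timed_out=False):
    """Map the environment's reward to the custom scheme.

    Assumptions: env_reward > 0 only when the coin is collected, and an
    episode that ends without a coin and without a timeout is a death.
    """
    reward = STEP_PENALTY
    if env_reward > 0:
        reward += COIN_BONUS
    elif done and not timed_out:
        reward += DEATH_PENALTY
    return reward


print(shaped_reward(0, False))   # ordinary step
print(shaped_reward(10, True))   # coin collected
print(shaped_reward(0, True))    # death
```

With shaping like this, the "multiply by 10" rule for % of levels solved no longer holds, which is why the test function counts successes explicitly.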
Note: I am not a contributor to this repo. I am using coinrun for my project to apply DQN to it. Here is my imperfect code: https://github.com/Unimax/coinrun-dqn-pytorch
I saw the work in your repository. I'm trying to implement an attention DRQN. Do you think it is a good idea?
Well, so far I have run DQN and DDQN for up to 1000 episodes and have not been able to get a decent agent for 5 random levels, except once (a rare case). Most of my graphs so far are for one fixed level. I am planning to connect an LSTM to my network next week and see what changes, and to look at the impact of long training on multi-level training with DQN variants.
I would like to see the results of DRQN. I think it should help, and in that case we could even disable the velocity painted on the images. But I think it would require long training to get good results in the multi-level case (for example, the paper trained on 500 levels). I had limited time to submit the project and hence have not tried very long training or many other cases so far.
Also, I am no expert :) I just finished my first term and learned a bit of deep learning.
Please help me understand why the previous state is always equal to the next state. If that's the case, how will any NN work on the state?
Output: