philtabor / ProtoRL

A Torch Based RL Framework for Rapid Prototyping of Research Papers
MIT License

seems like dqn use_dueling = True is broken #4

Open jt70 opened 9 months ago

jt70 commented 9 months ago

I changed the parameter in examples/dqn.py to this and I get an error:

def main():
    env_name = 'CartPole-v1'
    # env_name = 'PongNoFrameskip-v4'
    use_prioritization = True
    use_double = False
    use_dueling = True
    # use_dueling = False
    # use_atari = True
    use_atari = False
Traceback (most recent call last):
  File "/home/jason/github_projects/ProtoRL/protorl/examples/dqn.py", line 41, in <module>
    main()
  File "/home/jason/github_projects/ProtoRL/protorl/examples/dqn.py", line 37, in main
    scores, steps_array = ep_loop.run(n_games)
  File "/home/jason/github_projects/ProtoRL/protorl/loops/single.py", line 34, in run
    self.agent.update()
  File "/home/jason/github_projects/ProtoRL/protorl/agents/dqn.py", line 46, in update
    q_pred = self.q_eval.forward(states)[indices, actions]
TypeError: tuple indices must be integers or slices, not tuple
AlejoCarpentier007 commented 1 week ago

The same thing happens to me. Honestly, I can't find the cause yet; I'll have to go through the code thoroughly to see where the problem is.

AlejoCarpentier007 commented 1 week ago

@jt70 After some searching and tinkering with the framework I found the solution. To use dueling you have to swap out the learner, the actor, and the agent: each of those folders contains both a dqn.py and a dueling.py, and you need to import the dueling versions. It took me a while to spot this because I changed the learner first and didn't realize the actor and the agent also had to be changed; if you only change the learner, the error shows up in the learner's update function. You also have to set the dueling parameter to True in examples/dqn.py, otherwise it will error as well.

from protorl.agents.dueling import DuelingDQNAgent as Agent
from protorl.actor.dueling import DuelingDQNActor as Actor
from protorl.learner.dueling import DuelingDQNLearner as Learner
from protorl.loops.single import EpisodeLoop
from protorl.policies.epsilon_greedy import EpsilonGreedyPolicy
from protorl.utils.network_utils import make_dqn_networks
from protorl.wrappers.common import make_env
from protorl.memory.generic import initialize_memory


def main():
    env_name = 'CartPole-v1'
    # env_name = 'PongNoFrameskip-v4'
    use_prioritization = True
    use_double = True
    use_dueling = True
    use_atari = False
    layers = [32]
    env = make_env(env_name, use_atari=use_atari)
    n_games = 1500
    bs = 64
    # 0.3, 0.5 works okay for cartpole
    # 0.25, 0.25 doesn't seem to work
    # 0.25, 0.75 doesn't work
    memory = initialize_memory(max_size=100_000,
                               obs_shape=env.observation_space.shape,
                               batch_size=bs,
                               n_actions=env.action_space.n,
                               action_space='discrete',
                               prioritized=use_prioritization,
                               alpha=0.3,
                               beta=0.5)

    policy = EpsilonGreedyPolicy(n_actions=env.action_space.n, eps_dec=1e-4)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_actor = Actor(q_eval, q_target, policy)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_learner = Learner(q_eval, q_target, use_double=use_double,
                          prioritized=use_prioritization, lr=1e-4)

    agent = Agent(dqn_actor, dqn_learner, prioritized=use_prioritization)

    sample_mode = 'prioritized' if use_prioritization else 'uniform'
    ep_loop = EpisodeLoop(agent, env, memory, sample_mode=sample_mode,
                          prioritized=use_prioritization)
    scores, steps_array = ep_loop.run(n_games)


if __name__ == '__main__':
    main()
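
For context: the original TypeError is what you would expect if the dueling network's forward pass returns the value and advantage streams as a tuple while the plain DQN learner indexes the output as a single Q tensor. The snippet below is a minimal, standalone sketch of that mismatch; the class, layer sizes, and shapes are made up for illustration and are not ProtoRL's actual implementation.

# Toy illustration (not ProtoRL code) of why plain-DQN indexing fails
# on a dueling head whose forward() returns (value, advantage).
import torch
import torch.nn as nn


class TinyDuelingNet(nn.Module):
    def __init__(self, obs_dim=4, n_actions=2):
        super().__init__()
        self.base = nn.Linear(obs_dim, 16)
        self.value = nn.Linear(16, 1)
        self.advantage = nn.Linear(16, n_actions)

    def forward(self, x):
        h = torch.relu(self.base(x))
        return self.value(h), self.advantage(h)


net = TinyDuelingNet()
states = torch.randn(8, 4)
indices = torch.arange(8)
actions = torch.randint(0, 2, (8,))

# Plain-DQN style indexing assumes forward() returns one Q tensor:
try:
    q_pred = net(states)[indices, actions]
except TypeError as e:
    print(e)  # tuple indices must be integers or slices, not tuple

# A dueling learner combines the two streams into Q-values first:
V, A = net(states)
q_values = V + (A - A.mean(dim=1, keepdim=True))
q_pred = q_values[indices, actions]

Swapping in the dueling actor, learner, and agent avoids the error because they combine the two streams into Q-values before doing this kind of indexing.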