philtabor / ProtoRL

A Torch Based RL Framework for Rapid Prototyping of Research Papers
MIT License

inconvenience during training #5

Open AlejoCarpentier007 opened 1 week ago

AlejoCarpentier007 commented 1 week ago

I was training with dueling networks, and after episode 582 an error occurred. Training continued as if nothing had happened, but all progress was lost and the agent behaved as if its weights had been reset.

episode 582 ep score 141.0 average score 181.4 n steps 111244
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:104: RuntimeWarning: divide by zero encountered in double_scalars
  weights = np.array([(1 / self.counter * 1 / prob)*self.beta
/home/edo/projects/protorl/protorl/examples/protorl/memory/sum_tree.py:106: RuntimeWarning: invalid value encountered in multiply
  weights = 1 / max(weights)
episode 583 ep score 161.0 average score 181.0 n steps 111405

Training was going well: it reached a +350 average reward around episode 292, and then performance started to drop, which is normal. But at that point the division by zero above happened, followed by the invalid value in the multiplication.

source code

from protorl.agents.dueling import DuelingDQNAgent as Agent
from protorl.actor.dueling import DuelingDQNActor as Actor
from protorl.learner.dueling import DuelingDQNLearner as Learner
from protorl.loops.single import EpisodeLoop
from protorl.policies.epsilon_greedy import EpsilonGreedyPolicy
from protorl.utils.network_utils import make_dqn_networks
from protorl.wrappers.common import make_env
from protorl.memory.generic import initialize_memory


def main():
    env_name = 'CartPole-v1'
    # env_name = 'PongNoFrameskip-v4'
    use_prioritization = True
    use_double = True
    use_dueling = True
    use_atari = False
    layers = [32]
    env = make_env(env_name, use_atari=use_atari)
    n_games = 1500
    bs = 64
    # 0.3, 0.5 works okay for cartpole
    # 0.25, 0.25 doesn't seem to work
    # 0.25, 0.75 doesn't work
    memory = initialize_memory(max_size=100_000,
                               obs_shape=env.observation_space.shape,
                               batch_size=bs,
                               n_actions=env.action_space.n,
                               action_space='discrete',
                               prioritized=use_prioritization,
                               alpha=0.3,
                               beta=0.5)

    policy = EpsilonGreedyPolicy(n_actions=env.action_space.n, eps_dec=1e-4)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_actor = Actor(q_eval, q_target, policy)

    q_eval, q_target = make_dqn_networks(env, hidden_layers=layers,
                                         use_double=use_double,
                                         use_dueling=use_dueling,
                                         use_atari=use_atari)
    dqn_learner = Learner(q_eval, q_target, use_double=use_double,
                          prioritized=use_prioritization, lr=1e-4)

    agent = Agent(dqn_actor, dqn_learner, prioritized=use_prioritization)
    sample_mode = 'prioritized' if use_prioritization else 'uniform'
    ep_loop = EpisodeLoop(agent, env, memory, sample_mode=sample_mode,
                          prioritized=use_prioritization)
    scores, steps_array = ep_loop.run(n_games)


if __name__ == '__main__':
    main()
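For context on the two warnings quoted above: in prioritized experience replay, the importance-sampling weight of a sampled transition is usually computed as w_i = (1/N * 1/P(i))^beta and then normalized by the largest weight in the batch. The snippet below is not ProtoRL's code, just a minimal NumPy sketch of the standard formula showing why a sampling probability of zero produces exactly this pair of RuntimeWarnings.

import numpy as np

def per_weights(probs, n_samples, beta):
    # Standard PER importance-sampling weights:
    # w_i = (1/N * 1/P(i)) ** beta, then normalize by the largest weight.
    probs = np.asarray(probs, dtype=np.float64)
    weights = (1.0 / (n_samples * probs)) ** beta  # divide-by-zero if any P(i) == 0
    weights *= 1.0 / weights.max()                 # inf * 0 -> nan when the max is inf
    return weights

If one leaf of the sum tree holds a priority of zero, its sampling probability P(i) is zero, the division produces inf, and the normalization step then produces nan, which matches the "divide by zero" and "invalid value encountered in multiply" warnings in the log.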

AlejoCarpentier007 commented 6 days ago

I have trained two dueling agents, one without double and one with double, both without prioritized replay, and in both cases training was successful. I'm almost sure the problem I experienced before was caused by prioritized experience replay, at least when combined with dueling. The problem is most likely in the sum tree. Training initially works perfectly, and is much more sample-efficient than without prioritization, but there comes a point where performance declines and never recovers. I don't know what the cause could be; I tried changing alpha and beta and the epsilon-greedy behavior, all without success. I will try making beta grow gradually, the way epsilon decreases with steps, so that beta reaches 1 as the episodes pass. It may be something else; honestly, I don't understand very well how prioritized replay works. Dueling without double took 3361 episodes to reach a 500 average reward; dueling with double did it faster, at 2874. All of this was with only one agent at a time.
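A linear schedule mirroring the epsilon decay is a common way to anneal beta toward 1. Below is a minimal sketch with hypothetical names (anneal_beta and beta_frames are not ProtoRL parameters), assuming the schedule is evaluated once per environment step:

def anneal_beta(step, beta_start=0.5, beta_frames=100_000):
    # Linearly increase the importance-sampling exponent from beta_start
    # to 1.0 over beta_frames steps, then hold it at 1.0.
    return min(1.0, beta_start + step * (1.0 - beta_start) / beta_frames)

The annealed value would then need to be written back to wherever the memory stores beta before each sampled batch; the exact attribute depends on the memory implementation.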

AlejoCarpentier007 commented 5 days ago

[attached screenshot: WhatsApp Image 2024-06-29 at 13 54 59]

Hi, I saw you made some recent updates. I was testing with those new changes and the error keeps appearing. It is in the _calculate_weights function of sum_tree: at some point during training the value of prob is zero. To avoid the division by zero I added a check that writes a weight of zero whenever prob is zero. With that change performance no longer plummets and training improves. I wanted to know if the same thing happens to you, a division by zero appearing after a few episodes.

source code

def _calculate_weights(self, probs: List):
    if self.counter == 0:
        # avoid division by zero
        print("Counter is zero, returning ones")
        return np.ones(len(probs))

    weights = []
    for prob in probs:
        if prob > 0:
            weight = (1 / self.counter * 1 / prob) ** self.beta
        else:
            print(f"Prob is zero for prob value: {prob}")
            weight = 0
        weights.append(weight)

    weights = np.array(weights)

    max_weight = max(weights)
    if max_weight > 0:
        weights *= 1 / max_weight
    else:
        print("Max weight is zero or less")

    return weights
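An alternative to zeroing the weight, sketched below under the assumption that probs and self.counter mean the same as in the method above (and that the counter == 0 guard still applies), is to clamp each probability to a small floor so every sampled transition keeps a finite, non-zero importance-sampling weight:

def _calculate_weights_clamped(self, probs, eps=1e-8):
    # Clamp probabilities away from zero instead of dropping the sample.
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, None)
    weights = (1.0 / self.counter * 1.0 / probs) ** self.beta
    return weights / weights.max()

Either way, a zero P(i) usually means a priority of zero was written into the sum tree, so it may also be worth clamping priorities at insertion/update time (the usual (|TD error| + eps)^alpha form) rather than only patching the weight calculation.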