spiglerg / DQN_DDQN_Dueling_and_DDPG_Tensorflow

Tensorflow + OpenAI Gym implementation of Deep Q-Network (DQN), Double DQN (DDQN), Dueling Network and Deep Deterministic Policy Gradient (DDPG)

Will removing batch normalization significantly hurt performance? #2

Open cardwing opened 7 years ago

cardwing commented 7 years ago

Hi, spiglerg! Thank you for replying to me so soon. However, I wonder whether removing batch normalization will significantly hurt performance, since I want to test the code on the "Reacher" task. I have seen that your code works well on "Reacher"; it is currently the only algorithm solving that task on OpenAI Gym. Would changing env_name to "Reacher-v1" achieve comparable performance (solving the "Reacher" task after 25149 episodes)? I want to implement a prioritized experience replay mechanism on the "Reacher" task. Also, the variable 'unoform' can be removed in network.py. Thanks a lot! Regards, Cardwing
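
For reference, a minimal sketch of proportional prioritized experience replay (Schaul et al.) is shown below. The class and method names are hypothetical and not part of this repository; it uses a flat priority array rather than a sum-tree, so sampling is O(N) and only suitable as a starting point.

import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (sketch)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities bias sampling
        self.data = []              # stored (s, a, r, s2, done) tuples
        self.priorities = np.zeros(capacity, dtype=np.float32)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are
        # sampled at least once before their TD error is known.
        max_prio = self.priorities.max() if self.data else 1.0
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.data)]
        probs = prios ** self.alpha
        probs /= probs.sum()
        indices = np.random.choice(len(self.data), batch_size, p=probs)
        batch = [self.data[i] for i in indices]
        # Importance-sampling weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * probs[indices]) ** (-beta)
        weights /= weights.max()
        return batch, indices, weights

    def update_priorities(self, indices, td_errors, eps=1e-6):
        # Priority is proportional to the magnitude of the TD error.
        self.priorities[indices] = np.abs(td_errors) + eps

After each training step, the sampled transitions' priorities would be refreshed with the new TD errors via update_priorities, and the returned importance-sampling weights would scale the per-sample loss.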

spiglerg commented 7 years ago

I didn't remember it was the only algorithm to solve it. :P I am pretty sure that TRPO would do better anyway.

In any case, I did not use batch normalization in any of the submitted solutions, so it should still solve it. You might have to play with the parameters a bit though.

For batch normalization, you can try the following, but I made some untested modifications, so make sure it works first:

import tensorflow as tf


def batch_norm(x, beta, gamma, is_training, is_convnet=False):
    """
    Batch normalization utility.
    Ref.: http://stackoverflow.com/questions/33949786/how-could-i-use-batch-normalization-in-tensorflow
    Args:
        x:           Tensor, 2D [BD] or 4D [BHWD] input maps
        beta:        Tensor, learned offset (shift) parameter
        gamma:       Tensor, learned scale parameter
        is_training: boolean tf.Variable or placeholder, True indicates training phase
        is_convnet:  bool, True if x is a 4D convolutional feature map
    Return:
        normed:      batch-normalized maps
    """

    with tf.variable_scope('batch_norm'):
        # Normalize over the batch dimension for dense layers,
        # and over batch + spatial dimensions for conv layers.
        moments_dimensions = [0]
        if is_convnet:
            moments_dimensions = [0, 1, 2]

        with tf.device('/cpu:0'):
            batch_mean, batch_var = tf.nn.moments(x, moments_dimensions, name='moments')
        ema = tf.train.ExponentialMovingAverage(decay=0.5)

        def mean_var_with_update():
            # Training: update the moving averages and use the batch statistics.
            ema_apply_op = ema.apply([batch_mean, batch_var])
            with tf.control_dependencies([ema_apply_op]):
                return tf.identity(batch_mean), tf.identity(batch_var)

        # Evaluation: use the accumulated moving averages instead.
        mean, var = tf.cond(is_training,
                            mean_var_with_update,
                            lambda: (ema.average(batch_mean), ema.average(batch_var)))

        normed = tf.nn.batch_normalization(x, mean, var, beta, gamma, 1e-3)

    return normed

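A hedged usage sketch of the function above; the layer size, variable shapes, and placeholder name are illustrative assumptions, not taken from network.py:

import tensorflow as tf

# Hypothetical 400-unit dense layer followed by batch normalization.
is_training = tf.placeholder(tf.bool, name='is_training')
x = tf.placeholder(tf.float32, [None, 400])

beta = tf.Variable(tf.zeros([400]), name='beta')    # learned shift
gamma = tf.Variable(tf.ones([400]), name='gamma')   # learned scale

h = tf.nn.relu(batch_norm(x, beta, gamma, is_training, is_convnet=False))

Feed is_training=True when training on replay minibatches and False when acting in the environment or evaluating.
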
cardwing commented 7 years ago

I see that the reward ranges around -11 on the "Reacher" task after 20000 frames. Is this the code that achieves good performance on the "Reacher" task?

spiglerg commented 7 years ago

Hmm, my MuJoCo license has expired, so apparently I can't test it now. xD Did you try changing the discount factor or the other parameters? Also note that the 25,000 is the number of episodes, not frames. Looking at my Gym submission, I can see that it took 500-800k frames to converge. :)
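
For concreteness, a minimal sketch of how episodes and frames differ in a Gym loop, assuming the old "Reacher-v1" id and a random policy as a stand-in for the trained DDPG agent:

import gym

env = gym.make('Reacher-v1')
total_frames = 0

for episode in range(25000):
    obs = env.reset()
    done = False
    while not done:
        action = env.action_space.sample()   # stand-in for the DDPG policy
        obs, reward, done, info = env.step(action)
        total_frames += 1                     # one frame (environment step) per action

print('episodes:', episode + 1, 'frames:', total_frames)

Each episode contains many frames, so episode counts and frame counts are not interchangeable when comparing learning curves.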

cardwing commented 7 years ago

Thank you for your help. I will try again.

spiglerg commented 7 years ago

Awesome. :)