openai / baselines

OpenAI Baselines: high-quality implementations of reinforcement learning algorithms
MIT License

Non-deterministic behaviour when ran on GPU #805

Open dkorenkevych opened 5 years ago

dkorenkevych commented 5 years ago

The following commit https://github.com/openai/baselines/commit/9fa8e1baf1d1f975b87b369a8082122eac812eb1#diff-fc3e1c3522d2c7871bda86ed40bcb0ddL28 introduced non-deterministic behavior in PPO1 when run on GPU, even after setting tf.set_random_seed (CPU behavior is deterministic). Specifically, at line 28 and other places in mlp_policy.py, replacing

U.dense(last_out, hid_size, name='fc%i'%(i+1), weight_init=U.normc_initializer(1.0))

with

tf.layers.dense(last_out, hid_size, name='fc%i'%(i+1), kernel_initializer=U.normc_initializer(1.0))

caused this behavior. Below are 4 runs of the MuJoCo Swimmer-v2 environment with the same random seed, using PPO1 in the latest version of the baselines code: swimmer_same_seed_new_code.pdf

Replacing all instances of tf.layers.dense with U.dense, and adding the corresponding code

def dense(x, size, name, weight_init=None, bias=True):
    # Weight matrix of shape [input_dim, size]
    w = tf.get_variable(name + "/w", [x.get_shape()[1], size], initializer=weight_init)
    ret = tf.matmul(x, w)
    if bias:
        # Bias vector of shape [size], initialized to zeros
        b = tf.get_variable(name + "/b", [size], initializer=tf.zeros_initializer())
        return ret + b
    else:
        return ret

back to tf_utils.py fixes the issue. Below is a figure with 4 Swimmer runs after this change: swimmer_same_seed_old_code.pdf

All experiments were run using tensorflow-gpu==1.12.0, cudatoolkit==9.2, and cudnn==7.3.1.

pzhokhov commented 5 years ago

Interesting ... I'd expect that tf.layers.dense internally also performs the same matrix multiplication, so for determinism it should not matter which one is used. I'll keep this open, but this sounds more like a TensorFlow issue to me.

brett-daley commented 5 years ago

GPU calculations are non-deterministic because the thread scheduling is non-deterministic. Floating-point errors are accumulated in unpredictable ways for operations that are not associative -- a consequence of the GPU hardware itself, not TensorFlow.

This same phenomenon would occur on a multi-core CPU too, but I believe TensorFlow typically does not parallelize operations that lose determinism when using a CPU because the performance loss is minimal. This is why your CPU output is deterministic.
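The point about non-associative operations can be illustrated without a GPU. The following is a minimal sketch in plain Python (not TensorFlow code, and not taken from baselines): it only shows that grouping the same three additions differently changes the last bits of the result, which is the effect that varying thread scheduling could expose.

```python
# Floating-point addition is not associative: the grouping of the
# operands changes the rounding, and therefore the final result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # add a and b first
right = a + (b + c)  # add b and c first

print(left == right)      # False
print(abs(left - right))  # tiny, but not zero
```

A difference this small is harmless in a single step, but an iterative training loop can amplify it over many updates.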

You can read these links for more info:

dkorenkevych commented 5 years ago

Thanks for the feedback. The GPU calculations, however, are deterministic with the old baselines code, as shown in my second figure. If this were a hardware issue, the old code should have produced non-deterministic behavior as well. Assuming GPU computation is inherently non-deterministic, as the background you provided suggests, my suspicion is that those numerical errors are not sufficient to cause a noticeable difference in performance in this case, and that the new baselines code introduced some other issue that causes such a big difference in behavior.

brett-daley commented 5 years ago

I see your point. After doing some more reading, it seems that reduction algorithms (like tf.reduce_sum, see this post and this issue) are a common source of non-determinism on GPUs due to the scheduling effect I described above. However, linear algebra operations like matrix multiplication, vector sums, etc. should still be deterministic because each element is computed by a single thread.
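As a rough illustration of why reductions are sensitive to scheduling, the sketch below sums the same three values in two different orders. It is plain Python with a hypothetical helper `reduce_in_order`, and it only simulates the accumulation order. It does not reproduce how tf.reduce_sum is actually implemented on a GPU.

```python
def reduce_in_order(values, order):
    """Accumulate values one by one in the given index order (hypothetical helper)."""
    total = 0.0
    for index in order:
        total += values[index]
    return total

# The small term is lost when it is added to the large one first,
# because 1e16 + 1.0 rounds back to 1e16 in double precision.
values = [1e16, 1.0, -1e16]

sum_a = reduce_in_order(values, (0, 1, 2))  # (1e16 + 1.0) - 1e16
sum_b = reduce_in_order(values, (0, 2, 1))  # (1e16 - 1e16) + 1.0

print(sum_a, sum_b)  # 0.0 1.0
```

If the order in which partial results are combined varies from run to run, the output of a reduction can vary as well, even with identical inputs and seeds.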

In this case, @pzhokhov is probably right that it's a TensorFlow issue. It would be interesting to figure out what tf.layers.dense does that makes it non-deterministic, and maybe look into alternatives for ppo1.

Let me try it on my machine and see if it's non-deterministic for me too. Maybe it was a temporary problem that was fixed after tensorflow-gpu==1.12.0. I'll post here again later.

nikonikolov commented 5 years ago

It is definitely not a hardware thing. PyTorch can produce completely deterministic behavior on both CPU and GPU. TF has been annoyingly non-deterministic on GPU for ages. I also still experience non-deterministic behavior, despite the developers' claim that operations are now deterministic on GPU https://github.com/tensorflow/tensorflow/issues/2732.

brett-daley commented 5 years ago

@dkorenkevych Sorry for the delay; my MuJoCo license expired and it took me a while to renew it.

I ran some sample experiments on the most recent commit (3f2f45a), which has the calls to tf.layers.dense. I'm actually using an older version, tensorflow-gpu==1.11.0. I can confirm that I get non-deterministic behavior on my GPU but deterministic behavior on my CPU.

For reference, I'm running this command: python -m baselines.ppo1.run_mujoco --env=Swimmer-v2 --seed=0 --num_timesteps=10000
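A check like this can be automated by repeating a run with a fixed seed and comparing the results exactly. The sketch below assumes a hypothetical `run_experiment(seed)` function that stands in for a training run and returns a list of numbers. Here it is faked with Python's `random` module so the example is self-contained, and it is not part of baselines.

```python
import random

def run_experiment(seed):
    """Hypothetical stand-in for a training run; returns fake 'episode returns'."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(5)]

def is_deterministic(run_fn, seed, n_runs=4):
    """Return True if n_runs calls of run_fn(seed) give exactly equal results."""
    reference = run_fn(seed)
    return all(run_fn(seed) == reference for _ in range(n_runs - 1))

same_seed_matches = is_deterministic(run_experiment, seed=0)
different_seeds_match = run_experiment(0) == run_experiment(1)

print(same_seed_matches)      # True
print(different_seeds_match)  # False
```

Exact equality is the right comparison here: a run that is only approximately reproducible would pass a tolerance-based check while still being non-deterministic.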

dkorenkevych commented 5 years ago

@brett-daley Thanks for looking into this! For now I am using a forked repo with the fix that I described in my first post, but for experiment reproducibility it would be convenient to have deterministic behavior on the original master branch.

brett-daley commented 5 years ago

I agree. I was curious to see what PPO2 uses because it was deterministic the last time I checked. Looks like it uses a fully-connected layer defined in A2C that is similar to your code:

https://github.com/openai/baselines/blob/d51f8be8f9fff6d1613559e55357b0076ea88362/baselines/a2c/utils.py#L58-L63

Perhaps replacing tf.layers.dense with this function would be the best solution, since it uses pre-existing code in the repo?

dkorenkevych commented 5 years ago

I confirm that using the fc layers defined in a2c/utils.py results in deterministic behavior on GPU and comparable learning performance (results on 5 different random seeds below): swimmer_ppo1_fc_layers.pdf

Replacing tf.layers.dense with those in PPO1 would solve the problem.

brett-daley commented 5 years ago

I think you should open a pull request with those changes (I can do it if you want). The owners can merge it if they approve it.

brett-daley commented 5 years ago

Actually, it looks like HER and DDPG also use tf.layers.dense. I wonder if they need to be updated too.

$ grep -R tf.layers.dense
baselines/ddpg/models.py:            x = tf.layers.dense(x, self.nb_actions, kernel_initializer=tf.random_uniform_initializer(minval=-3e-3, maxval=3e-3))
baselines/ddpg/models.py:            x = tf.layers.dense(x, 1, kernel_initializer=tf.random_uniform_initializer(minval=-3e-3, maxval=3e-3), name='output')
baselines/her/util.py:        input = tf.layers.dense(inputs=input,
baselines/ppo1/cnn_policy.py:            x = tf.nn.relu(tf.layers.dense(x, 256, name='lin', kernel_initializer=U.normc_initializer(1.0)))
baselines/ppo1/cnn_policy.py:            x = tf.nn.relu(tf.layers.dense(x, 512, name='lin', kernel_initializer=U.normc_initializer(1.0)))
baselines/ppo1/cnn_policy.py:        logits = tf.layers.dense(x, pdtype.param_shape()[0], name='logits', kernel_initializer=U.normc_initializer(0.01))
baselines/ppo1/cnn_policy.py:        self.vpred = tf.layers.dense(x, 1, name='value', kernel_initializer=U.normc_initializer(1.0))[:,0]
baselines/ppo1/mlp_policy.py:                last_out = tf.nn.tanh(tf.layers.dense(last_out, hid_size, name="fc%i"%(i+1), kernel_initializer=U.normc_initializer(1.0)))
baselines/ppo1/mlp_policy.py:            self.vpred = tf.layers.dense(last_out, 1, name='final', kernel_initializer=U.normc_initializer(1.0))[:,0]
baselines/ppo1/mlp_policy.py:                last_out = tf.nn.tanh(tf.layers.dense(last_out, hid_size, name='fc%i'%(i+1), kernel_initializer=U.normc_initializer(1.0)))
baselines/ppo1/mlp_policy.py:                mean = tf.layers.dense(last_out, pdtype.param_shape()[0]//2, name='final', kernel_initializer=U.normc_initializer(0.01))
baselines/ppo1/mlp_policy.py:                pdparam = tf.layers.dense(last_out, pdtype.param_shape()[0], name='final', kernel_initializer=U.normc_initializer(0.01))
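For a per-file tally of the calls listed above, a small script can do the same search. This is a sketch with a hypothetical helper `count_occurrences`. Unlike the grep command, it matches the literal string (in a grep pattern, `.` matches any character) and counts occurrences rather than matching lines.

```python
import os

def count_occurrences(root, needle="tf.layers.dense"):
    """Map each .py file under root to the number of literal matches of needle."""
    counts = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(".py"):
                continue
            path = os.path.join(dirpath, filename)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                matches = sum(line.count(needle) for line in handle)
            if matches:
                counts[os.path.relpath(path, root)] = matches
    return counts

if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for path, matches in sorted(count_occurrences(".").items()):
        print(f"{path}: {matches}")
```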