Cudnn Error when running retro-baselines jobs (Nvidia-cuda docker error)

floodsung commented 6 years ago

I got the following error when running a retro-baselines ppo2 job (directly using ppo2.docker):

ted and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
WARNING:tensorflow:From /root/venv/lib/python3.5/site-packages/baselines/common/distributions.py:147: calling reduce_sum (from tensorflow.python.ops.math_ops) with keep_dims is deprecated and will be removed in a future version.
Instructions for updating:
keep_dims is deprecated, use keepdims instead
2018-04-09 04:27:06.399347: E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7102 (compatibility version 7100) but source was compiled with 7005 (compatibility version 7000). If using a binary install, upgrade your CuDNN library to match. If building from sources, make sure the library loaded at runtime matches a compatible version specified during compile configuration.
2018-04-09 04:27:06.399956: F tensorflow/core/kernels/conv_ops.cc:712] Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo<T>(), &algorithms)

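It seems that the test server has an incompatible cuDNN version installed: the log shows cuDNN 7.1.2 (7102) loaded at runtime, but this TensorFlow build was compiled against cuDNN 7.0.5 (7005), so it needs a cuDNN 7.0 runtime.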
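For anyone reading the numbers in that error line, here is a minimal sketch of how they can be decoded. It assumes the cuDNN 7.x integer encoding (major * 1000 + minor * 100 + patch) and that matching major.minor is what the "compatibility version" in the log refers to. The helper names `decode_cudnn_version` and `parse_cudnn_mismatch` are made up for illustration and are not part of TensorFlow or cuDNN.

```python
import re


def decode_cudnn_version(version):
    """Split an integer such as 7102 into (major, minor, patch).

    Assumes the cuDNN 7.x encoding: major * 1000 + minor * 100 + patch.
    """
    major, rest = divmod(version, 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def parse_cudnn_mismatch(log_line):
    """Return (loaded, compiled) version tuples from the error line, or None."""
    match = re.search(
        r"Loaded runtime CuDNN library: (\d+).*?compiled with (\d+)", log_line
    )
    if match is None:
        return None
    loaded, compiled = (decode_cudnn_version(int(g)) for g in match.groups())
    return loaded, compiled


# The relevant part of the error line from the log above.
line = (
    "Loaded runtime CuDNN library: 7102 (compatibility version 7100) "
    "but source was compiled with 7005 (compatibility version 7000)."
)
loaded, compiled = parse_cudnn_mismatch(line)
print("loaded:", ".".join(map(str, loaded)))      # loaded: 7.1.2
print("compiled:", ".".join(map(str, compiled)))  # compiled: 7.0.5
# Assumption: the build and the runtime must agree on major.minor.
print("same major.minor:", loaded[:2] == compiled[:2])  # same major.minor: False
```

Under these assumptions, the runtime library is 7.1.x while the binary expects 7.0.x, which is consistent with the failed check that follows in the log.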

floodsung commented 6 years ago

I just found that nvidia-docker updated the cuDNN version to 7.1 a week ago.

endrift commented 6 years ago

I put the wrong issue number in the commit, but this should be fixed. Let me know if you encounter more errors.

floodsung commented 6 years ago

Still the same error today!

endrift commented 6 years ago

You'll need to pull openai/retro-agent again and then try rebuilding.

floodsung commented 6 years ago

Thanks, it is OK now!
