CuDNN error 8 on Ubuntu 18.04, Julia 1.5.2

deveshjawla commented 4 years ago

Hi Guys, I'm getting this error on the master branch. This is right after the self play has finished. Below the error you'll find that Julia sees CUDA and device correctly. But throws a CuDNN error, can you help?

ERROR: LoadError: CUDNNError: CUDNN_STATUS_EXECUTION_FAILED (code 8)
Stacktrace:
 [1] throw_api_error(::CUDA.CUDNN.cudnnStatus_t) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:19
 [2] macro expansion at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/error.jl:30 [inlined]
 [3] cudnnBatchNormalizationForwardTraining(::Ptr{Nothing}, ::CUDA.CUDNN.cudnnBatchNormMode_t, ::Base.RefValue{Float32}, ::Base.RefValue{Float32}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,4}, ::CUDA.CUDNN.TensorDesc, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32, ::CUDA.CuPtr{Nothing}, ::CUDA.CuPtr{Nothing}) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/utils/call.jl:93
 [4] cudnnBNForward!(::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,1}, ::CUDA.CuArray{Float32,1}, ::Float32; cache::Nothing, alpha::Int64, beta::Int64, eps::Float32, training::Bool) at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:55
 [5] #batchnorm#478 at /home/sdeveshj/.julia/packages/CUDA/dZvbp/lib/cudnn/batchnorm.jl:26 [inlined]
 [6] #adjoint#17 at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:6 [inlined]
 [7] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:53 [inlined]
 [8] BatchNorm at /home/sdeveshj/.julia/packages/Flux/05b38/src/cuda/cudnn.jl:3 [inlined] (repeats 2 times)
 [9] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
 [10] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [11] applychain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:36 [inlined]
 [12] _pullback(::Zygote.Context, ::typeof(Flux.applychain), ::Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [13] Chain at /home/sdeveshj/.julia/packages/Flux/05b38/src/layers/basic.jl:38 [inlined]
[14] _pullback(::Zygote.Context, ::Flux.Chain{Tuple{Flux.Conv{2,2,typeof(identity),CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,1}},Flux.BatchNorm{typeof(NNlib.relu),CUDA.CuArray{Float32,1},CUDA.CuArray{Float32,1},Float32},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}},Flux.Chain{Tuple{Flux.SkipConnection,AlphaZero.FluxLib.var"#19#20"}}}}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [15] forward at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:161 [inlined]
 [16] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.forward), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [17] evaluate at /home/sdeveshj/AlphaZero.jl/src/networks/network.jl:253 [inlined]
 [18] _pullback(::Zygote.Context, ::typeof(AlphaZero.Network.evaluate), ::AlphaZero.FluxLib.ResNet{Game}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [19] losses at /home/sdeveshj/AlphaZero.jl/src/learning.jl:62 [inlined]
 [20] _pullback(::Zygote.Context, ::typeof(AlphaZero.losses), ::AlphaZero.FluxLib.ResNet{Game}, ::LearningParams, ::Float32, ::Float32, ::Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [21] L at /home/sdeveshj/AlphaZero.jl/src/learning.jl:113 [inlined]
 [22] _pullback(::Zygote.Context, ::AlphaZero.var"#L#54"{AlphaZero.Trainer}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,4}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}, ::CUDA.CuArray{Float32,2}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [23] adjoint at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/lib/lib.jl:172 [inlined]
 [24] _pullback at /home/sdeveshj/.julia/packages/ZygoteRules/6nssF/src/adjoint.jl:47 [inlined]
 [25] #1 at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:83 [inlined]
 [26] _pullback(::Zygote.Context, ::AlphaZero.FluxLib.var"#1#2"{AlphaZero.var"#L#54"{AlphaZero.Trainer},Tuple{CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,4},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2},CUDA.CuArray{Float32,2}}}) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface2.jl:0
 [27] pullback(::Function, ::Zygote.Params) at /home/sdeveshj/.julia/packages/Zygote/Xgcgs/src/compiler/interface.jl:172
 [28] lossgrads(::Function, ::Zygote.Params) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:73
 [29] train!(::AlphaZero.var"#53#55"{Array{Float32,1}}, ::AlphaZero.FluxLib.ResNet{Game}, ::Adam, ::Function, ::Base.Iterators.Take{Base.Iterators.Stateful{Base.Iterators.Flatten{Base.Generator{Base.Iterators.Repeated{Nothing},AlphaZero.Util.var"#12#13"{AlphaZero.var"#50#52",Tuple{Array{Float32,2},Array{Float32,4},Array{Float32,2},Array{Float32,2},Array{Float32,2}},Int64,Bool}}},Tuple{Any,Tuple{Nothing,Base.Generator{_A,AlphaZero.Util.var"#9#11"{AlphaZero.var"#50#52"}} where _A,Any}}}}, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/networks/flux.jl:82
 [30] batch_updates!(::AlphaZero.Trainer, ::Int64) at /home/sdeveshj/AlphaZero.jl/src/learning.jl:116
 [31] macro expansion at ./timing.jl:310 [inlined]
[32] learning_step!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:185
 [33] macro expansion at ./timing.jl:310 [inlined]
 [34] macro expansion at /home/sdeveshj/AlphaZero.jl/src/report.jl:229 [inlined]
 [35] train!(::Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}, ::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/training.jl:295
 [36] resume!(::Session{Env{Game,AlphaZero.FluxLib.ResNet{Game},NamedTuple{(:board, :curplayer),Tuple{StaticArrays.SArray{Tuple{7,6},UInt8,2,42},UInt8}}}}) at /home/sdeveshj/AlphaZero.jl/src/ui/session.jl:383
 [37] top-level scope at /home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:89
 [38] include(::Function, ::Module, ::String) at ./Base.jl:380
 [39] include(::Module, ::String) at ./Base.jl:368
 [40] exec_options(::Base.JLOptions) at ./client.jl:296
 [41] _start() at ./client.jl:506
in expression starting at `/home/sdeveshj/AlphaZero.jl/scripts/alphazero.jl:80

julia> Libdl.dlpath("libcuda")
"/usr/lib/x86_64-linux-gnu/libcuda.so"

julia> Libdl.dlpath("libcudnn")
"/usr/lib/cuda/lib64/libcudnn.so"

(@v1.5) pkg> activate .
 Activating environment at `~/AlphaZero.jl/Project.toml`

julia> CUDA.device()
CuDevice(0): GeForce RTX 2070 SUPER

julia> has_cuda()
true

julia> CUDA.version()
v"11.1.0"

jonathan-laurent commented 4 years ago

This looks like the same issue that was also reported here: https://github.com/JuliaGPU/CUDA.jl/issues/447. It looks like a CUDA.jl issue but I'll dig deeper into this after I release v0.4.

Also, I am testing MDP support right now so it should be available soon. :-)

deveshjawla commented 4 years ago

I believe it is the CUDA.jl v1.3.3 not playing well with CUDA 11.1

I will downgrade the NVIDIA drivers, CUDA and CuDNN to see if that works.

Because if I use

JULIA_CUDA_VERSION=11.1 julia --project --color=yes scripts/alphazero.jl --game connect-four train
ERROR: LoadError: InitError: CUDA.jl does not yet support CUDA with nvdisasm 11.1.74; please file an issue.

then it's the same as #23

deveshjawla commented 4 years ago

Also, I am testing MDP support right now so it should be available soon. :-)

Thanks for this implementation man)), and yes MDP support would be very nice too. With time I hope to also contribute to this implementation.

deveshjawla commented 4 years ago

So I tried to reduce the number of self play games, the batch size for learning and the number of workers in the params.jl of connect4. This seems to work on both CUDA 11.1 and CUDA 11.0. Now I am dealing with a variety of errors(see attached pics). Points to note:

Downgrading to CUDA11.0 on the master skips the ERROR code 8, but throws ERROR code 2 on all 128 workers.
Reducing the complexity as mentioned above, completes 1 iteration successfully.

Screenshot from 2020-10-21 17-28-21

deveshjawla commented 4 years ago

And TicTACTOE trains without any errors. This is enough motivation to basically play around with parameters of connect4. Getting to know the implementation better should help.

jonathan-laurent commented 4 years ago

Thanks for your help investigating this bug! It is very possible that the errors we have been seeing are legitimate "out-of-resources" error under disguise. That being said, the connect-four example runs on my machine, which has 16GB of RAM and a 8GB GTX 2070 GPU.

And TicTACTOE trains without any errors.

This is unsurprising as the tictactoe example is configured to run on CPU.

deveshjawla commented 4 years ago

This is unsurprising as the tictactoe example is configured to run on CPU.

Yes And I tried connect4 with cpu-only, and it works 100% fine. Then I tested to put use_gpu=true for all except LeraningParams and it works fine as well, so the problem comes down to the LearningParams part. Hopefully soon Flux.jl is updated and it's CUDA.jl dependency is elevated to the latest version. Otherwise, maybe you could hint as to where to look, but I am enjoying exploring the implementation by myself also ;)

deveshjawla commented 4 years ago

That being said, the connect-four example runs on my machine, which has 16GB of RAM and a 8GB GTX 2070 GPU.

Actually, I have the same config, RTX 2070 Super, and 125 GB RAM at the workplace, so I believe it is not the resources but really the compatibility of versions.

deveshjawla commented 4 years ago

It is very possible that the errors we have been seeing are legitimate "out-of-resources" error under disguise.

It appears that you were correct. I reduced the training complexity whilst keeping use_gpu=true for all, and the training completed successfully.

But this is very strange, I can't understand why when the resources are the same between our machine, why it fails on mine.

deveshjawla commented 4 years ago

Ok, I was using more filters than mentioned in the documentation. In the latest master branch you have increased the size of the ResNet, Setting num_filters=64 now runs without any problem. I guess we can close this now.

jonathan-laurent / AlphaZero.jl

CuDNN error 8 on Ubuntu 18.04, Julia 1.5.2 #24