jonathan-laurent / AlphaZero.jl

A generic, simple and fast implementation of DeepMind's AlphaZero algorithm.
https://jonathan-laurent.github.io/AlphaZero.jl/stable/
MIT License

Error using KNET #172

Closed · smart-fr closed this issue 1 year ago

smart-fr commented 1 year ago

In an attempt to resolve a Flux inference issue I reported in issue #171, I tried to train a new NN using the Knet implementation of AlphaZero.NetLib, and got the following error during the first self-play session.

Is there anything else I should be doing to properly use Knet, other than setting the environment variable ALPHAZERO_DEFAULT_DL_FRAMEWORK to "KNET" and precompiling again?
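For reference, this is roughly how I select the backend; a minimal sketch, assuming ALPHAZERO_DEFAULT_DL_FRAMEWORK is consulted when AlphaZero precompiles (the Pkg.precompile call and the final check are only illustrative):

    # Minimal sketch: select the Knet backend before AlphaZero is loaded.
    # Assumption: ALPHAZERO_DEFAULT_DL_FRAMEWORK is read at precompile/load time,
    # so it must be set in a fresh Julia session before `using AlphaZero`.
    ENV["ALPHAZERO_DEFAULT_DL_FRAMEWORK"] = "KNET"
    using Pkg
    Pkg.precompile()     # re-precompile with the new setting (illustrative)
    using AlphaZero      # startup log should report the Knet implementation of NetLib

The startup log below confirms that the Knet backend was indeed selected, so the failure happens later, during self-play.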

PS C:\Projets\BonbonRectangle\IA\dev> julia --threads=auto --project -e 'using AlphaZero; Scripts.train("bonbon-rectangle"; save_intermediate=true)'
[ Info: Using the Knet implementation of AlphaZero.NetLib.
[ Info: BonbonRectangle v20230205_KNET_16x16_32_16_no_benchmark
[ Info: params_05.jl

Initializing a new AlphaZero environment

  Initial report

    Number of network parameters: 19,320,065
    Number of regularized network parameters: 19,317,888
    Memory footprint per MCTS node: 58456 bytes

Starting iteration 1

  Starting self-play

MethodError: no method matching *(::Knet.KnetArrays.Bcasted{CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}, ::Knet.KnetArrays.Bcasted{Knet.KnetArrays.KnetArray{Float32, 4}})
Closest candidates are:
  *(::Knet.KnetArrays.Bcasted, ::Knet.KnetArrays.Bcasted) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:142
  *(::Any, ::Knet.KnetArrays.Bcasted) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:143
  *(::Knet.KnetArrays.Bcasted, ::Any) at C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:144
  ...
Stacktrace:
  [1] *(x::Knet.KnetArrays.Bcasted{CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}}, y::Knet.KnetArrays.Bcasted{Knet.KnetArrays.KnetArray{Float32, 4}})
    @ Knet.KnetArrays C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\binary.jl:142
  [2] broadcasted(::Base.Broadcast.Style{Knet.KnetArrays.KnetArray}, ::Function, ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Knet.KnetArrays.KnetArray{Float32, 4})
    @ Knet.KnetArrays C:\Users\smart\.julia\packages\Knet\YIFWC\src\knetarrays\broadcast.jl:10
  [3] broadcasted(::Function, ::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, ::Knet.KnetArrays.KnetArray{Float32, 4})
    @ Base.Broadcast .\broadcast.jl:1304
  [4] _batchnorm4_fused(g::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, b::CUDA.CuArray{Float32, 4, CUDA.Mem.DeviceBuffer}, x::Knet.KnetArrays.KnetArray{Float32, 4}; eps::Float64, training::Bool, cache::Knet.Ops20.BNCache, moments::Knet.Ops20.BNMoments, o::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Knet.Ops20 C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:184
  [5] #batchnorm4#180
    @ C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:149 [inlined]     
  [6] batchnorm(x::Knet.KnetArrays.KnetArray{Float32, 4}, moments::Knet.Ops20.BNMoments, params::AutoGrad.Param{CUDA.CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}; training::Bool, o::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Knet.Ops20 C:\Users\smart\.julia\packages\Knet\YIFWC\src\ops20\batchnorm.jl:70     
  [7] (::AlphaZero.KnetLib.BatchNorm)(x::Knet.KnetArrays.KnetArray{Float32, 4})
    @ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet\layers.jl:84
  [8] (::AlphaZero.KnetLib.Chain)(x::Knet.KnetArrays.KnetArray{Float32, 4})
    @ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet\layers.jl:19
  [9] forward(nn::ResNet, state::Knet.KnetArrays.KnetArray{Float32, 4})
    @ AlphaZero.KnetLib C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\knet.jl:147
 [10] forward_normalized(nn::ResNet, state::Knet.KnetArrays.KnetArray{Float32, 4}, actions_mask::Knet.KnetArrays.KnetMatrix{Float32})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:264
 [11] evaluate_batch(nn::ResNet, batch::Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}})
    @ AlphaZero.Network C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\networks\network.jl:312
 [12] fill_and_evaluate(net::ResNet, batch::Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}}; batch_size::Int64, fill_batches::Bool)
    @ AlphaZero C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\simulations.jl:32     
 [13] #36
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\simulations.jl:54 [inlined]     
 [14] #4
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\batchifier.jl:71 [inlined]      
 [15] log_event(f::AlphaZero.Batchifier.var"#4#7"{Vector{NamedTuple{(:board, :impact, :actions_hook, :curplayer), Tuple{StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, UInt8, 256}, StaticArraysCore.SMatrix{16, 16, Tuple{Int64, Int64}, 256}, UInt8}}}, AlphaZero.var"#36#37"{Int64, Bool, ResNet}}; name::String, cat::String, pid::Int64, tid::Int64)
    @ AlphaZero.ProfUtils C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\prof_utils.jl:40
 [16] macro expansion
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\batchifier.jl:68 [inlined]      
 [17] macro expansion
    @ C:\Projets\BonbonRectangle\IA\dev\AlphaZero.jl\src\util.jl:21 [inlined]
 [18] (::AlphaZero.Batchifier.var"#2#5"{Int64, AlphaZero.var"#36#37"{Int64, Bool, ResNet}, Channel{Any}})()
    @ AlphaZero.Batchifier C:\Users\smart\.julia\packages\ThreadPools\ANo2I\src\macros.jl:261
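From the stack trace, the failure appears to be a mixed array-type broadcast inside Knet's fused batchnorm: the scale and bias parameters live in CuArrays while the layer input is a KnetArray, and Knet's Bcasted wrapper has no * method for that combination. A hypothetical minimal reproduction, outside AlphaZero (assuming a working CUDA GPU with Knet and CUDA.jl installed, and that broadcasting a CuArray against a KnetArray dispatches the same way as in the trace):

    # Hypothetical reproduction of the same MethodError (not part of AlphaZero.jl).
    using CUDA, Knet
    g = CUDA.ones(Float32, 1, 1, 8, 1)               # batchnorm scale stored as a CuArray
    x = Knet.KnetArray(randn(Float32, 4, 4, 8, 2))   # layer input stored as a KnetArray
    g .* x   # expected: MethodError: no method matching *(::Bcasted{CuArray}, ::Bcasted{KnetArray})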
jonathan-laurent commented 1 year ago

Unfortunately, I haven't been using the Knet backend for a while and I wouldn't be surprised if it has broken. As I said in another issue (https://github.com/jonathan-laurent/AlphaZero.jl/issues/166#issuecomment-1431118385), keeping support for both Knet and Flux is a maintenance nightmare and probably not a responsibility AlphaZero.jl should have taken on. Knet support may be dropped at any time unless a community-wide API compatibility solution is found, so I would not rely on it if I were you.

smart-fr commented 1 year ago

OK, thank you for the advice!