sisl / BayesNets.jl

Bayesian Networks for Julia

Compilation error: infer_number_of_instantiations assumes values in 1:N, value 0 found! #140

Open SEICS opened 2 years ago

SEICS commented 2 years ago

Hi,

I am trying to learn a discrete Bayesian network (BN) from a dataset. During structure learning, I encountered the error "infer_number_of_instantiations assumes values in 1:N, value 0 found!" and I am not sure why it happened.

Code producing this error:

parameters = GreedyHillClimbing(ScoreComponentCache(df), max_n_parents=1, prior=UniformPrior())
bn = fit(DiscreteBayesNet, df, parameters)

My dataset (df) looks like this:

[image: Screenshot 2022-07-07 at 14 33 04]

and by running the following: eltype.(eachcol(df)) the corresponding output is:

12-element Vector{DataType}:
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64
 Int64

The entire error output:

infer_number_of_instantiations assumes values in 1:N, value 0 found!

Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] infer_number_of_instantiations(arr::Vector{Int64})
    @ BayesNets.CPDs ~/.julia/packages/BayesNets/yBu0u/src/CPDs/utils.jl:63
  [3] (::BayesNets.var"#63#66"{DataFrame})(i::Int64)
    @ BayesNets ~/.julia/packages/BayesNets/yBu0u/src/DiscreteBayesNet/greedy_hill_climbing.jl:63
  [4] map!(f::BayesNets.var"#63#66"{DataFrame}, dest::Vector{Int64}, A::UnitRange{Int64})
    @ Base ./abstractarray.jl:2860
  [5] fit(::Type{DiscreteBayesNet}, data::DataFrame, params::GreedyHillClimbing)
    @ BayesNets ~/.julia/packages/BayesNets/yBu0u/src/DiscreteBayesNet/greedy_hill_climbing.jl:66
  [6] top-level scope
    @ ~/Desktop/bayes-aqp/Julia/bayes-aqp.ipynb:1
  [7] eval
    @ ./boot.jl:373 [inlined]
  [8] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base ./loading.jl:1196
  [9] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [10] invokelatest
    @ ./essentials.jl:714 [inlined]
 [11] (::VSCodeServer.var"#150#151"{VSCodeServer.NotebookRunCellArguments, String})()
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/packages/VSCodeServer/src/serve_notebook.jl:18
 [12] withpath(f::VSCodeServer.var"#150#151"{VSCodeServer.NotebookRunCellArguments, String}, path::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/packages/VSCodeServer/src/repl.jl:185
 [13] notebook_runcell_request(conn::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, params::VSCodeServer.NotebookRunCellArguments)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/packages/VSCodeServer/src/serve_notebook.jl:14
 [14] dispatch_msg(x::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, dispatcher::VSCodeServer.JSONRPC.MsgDispatcher, msg::Dict{String, Any})
    @ VSCodeServer.JSONRPC ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/packages/JSONRPC/src/typed.jl:67
 [15] serve_notebook(pipename::String; crashreporting_pipename::String)
    @ VSCodeServer ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/packages/VSCodeServer/src/serve_notebook.jl:94
 [16] top-level scope
    @ ~/.vscode/extensions/julialang.language-julia-1.5.11/scripts/notebook/notebook.jl:12
 [17] include(mod::Module, _path::String)
    @ Base ./Base.jl:418
 [18] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:292
 [19] _start()
    @ Base ./client.jl:495

Can anyone help me with this? Thank you so much!

SEICS commented 2 years ago

Hi,

I just tested my code, and it seems that this is caused by the 0s in my dataset. Just out of curiosity, why are 0s not considered acceptable data values?
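For anyone checking their own data for the same problem, here is a minimal sketch of how one might find the offending columns. It uses a plain NamedTuple of vectors as a stand-in for the DataFrame, and the column names and values are made up for illustration.

```julia
# Stand-in for a dataset: a NamedTuple of integer columns (hypothetical data).
cols = (a = [1, 2, 3], b = [0, 1, 2], c = [2, 2, 1])

# Collect the names of columns containing any value below 1,
# i.e. values that fall outside the 1:N range the error message mentions.
bad_columns = [name for (name, col) in pairs(cols) if minimum(col) < 1]

println(bad_columns)
```

With a real DataFrame, `minimum.(eachcol(df))` should give the per-column minimums in the same spirit as the `eltype.(eachcol(df))` call above.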

tawheeler commented 2 years ago

Hello SEICS,

I took a look, and infer_number_of_instantiations has the following docstring:

"""
    infer_number_of_instantiations{I<:Int}(arr::AbstractVector{I})
Infer the number of instantiations, N, for a data type, assuming that it takes on the values 1:N
"""

As such, it assumes values between 1 and N for some N. Values of 0 would be out of bounds.
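To make the documented behavior concrete, here is a rough sketch of what such a check could look like. This is a hypothetical stand-in written from the docstring and the error message, not the actual BayesNets.jl source; the function name `sketch_infer_number_of_instantiations` is made up.

```julia
# Hypothetical stand-in, NOT the BayesNets.jl implementation.
# Returns N under the assumption that the data take values in 1:N,
# and raises an error if any value falls below 1.
function sketch_infer_number_of_instantiations(arr::AbstractVector{<:Integer})
    lo, hi = extrema(arr)
    lo >= 1 || error("assumes values in 1:N, value $lo found!")
    return hi
end

n = sketch_infer_number_of_instantiations([1, 3, 2, 3])
println(n)
```

Calling the sketch on a vector such as `[0, 1, 2]` raises an error similar to the one reported in this issue.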

This assumption basically allows us to use Julia's 1-based indices to index directly into count tables. The easiest way to convert a dataset to 1:N form is to use the categorical discretizer in Discretizers.jl.
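As a dependency-free illustration of the idea, the sketch below remaps a column to 1:N by hand and then uses the codes as indices into a count table. The helper `remap_to_one_based` is hypothetical and is not part of BayesNets.jl or Discretizers.jl; the categorical discretizer in Discretizers.jl is the recommended tool for real use.

```julia
# Hypothetical helper for illustration only.
# Maps each distinct value in `col` to an index in 1:N (sorted order)
# and returns the encoded column plus the original levels for decoding.
function remap_to_one_based(col::AbstractVector)
    levels = sort(unique(col))
    lookup = Dict(v => i for (i, v) in enumerate(levels))
    codes = [lookup[v] for v in col]
    return codes, levels
end

raw = [0, 2, 0, 1, 2]            # contains 0, so it is not in 1:N form
codes, levels = remap_to_one_based(raw)

# With codes in 1:N, each code can index a count table directly.
counts = zeros(Int, length(levels))
for c in codes
    counts[c] += 1
end

println(codes)
println(counts)
```

The original value for a code `c` can be recovered as `levels[c]`. For a DataFrame, one option would be to apply the same remapping column by column before calling `fit`.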

The documentation right now does not emphasize this assumption particularly well. We do have the following for categorical CPDs: [image], and our discrete Bayesian networks are composed of them.

I hope that helps!

SEICS commented 2 years ago

Ah! Thank you for the explanation! I am also new to Julia, so I didn't know that Julia uses 1-based indices. Really helpful advice! I will give Discretizers.jl a try.