theogf / AugmentedGaussianProcesses.jl

Gaussian Process package based on data augmentation, sparsity and natural gradients
https://theogf.github.io/AugmentedGaussianProcesses.jl/dev/

ERROR: LoadError: MethodError: no method matching AnalyticSVI() #81

martinjankowiak opened 3 years ago

martinjankowiak commented 3 years ago

i am able to train using

 model = SVGP(X_train, Y_train, kernel, LogisticLikelihood(), AnalyticVI(), num_inducing)

but if i instead do

 model = SVGP(X_train, Y_train, kernel, LogisticLikelihood(), AnalyticSVI(), num_inducing)

i get the error ERROR: LoadError: MethodError: no method matching AnalyticSVI()

is this expected behavior?

theogf commented 3 years ago

AnalyticSVI expects the number of samples per minibatch as its first argument, e.g. AnalyticSVI(10)
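For completeness, a minimal sketch of the corrected call (the minibatch size 10 is arbitrary; X_train, Y_train, kernel, and num_inducing are assumed from the snippets above, and train! is used per the package docs):

 using AugmentedGaussianProcesses

 # AnalyticSVI(b) performs stochastic variational inference on minibatches of size b
 svi = AnalyticSVI(10)
 model = SVGP(X_train, Y_train, kernel, LogisticLikelihood(), svi, num_inducing)
 train!(model, 100)  # run 100 training iterations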

martinjankowiak commented 3 years ago

great, thanks. that does the trick.

the only issue is that with default settings i get lots of cholesky errors. if, on the other hand, i change the optimiser arg of AnalyticSVI (to e.g. ADAM(0.001)), i instead get a broadcasting error:

ERROR: LoadError: DimensionMismatch("array could not be broadcast to match destination")
Stacktrace:
 [1] check_broadcast_shape at ./broadcast.jl:520 [inlined]
 [2] check_broadcast_axes at ./broadcast.jl:523 [inlined]
 [3] check_broadcast_axes at ./broadcast.jl:527 [inlined]
 [4] instantiate at ./broadcast.jl:269 [inlined]
 [5] materialize! at ./broadcast.jl:848 [inlined]
 [6] materialize!(::Array{Float64,1}, ::Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(+),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Float64,Array{Float64,1}}},Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{1},Nothing,typeof(*),Tuple{Base.Broadcast.Broadcasted{Base.Broadcast.DefaultArrayStyle{0},Nothing,typeof(-),Tuple{Int64,Float64}},Array{Float64,1}}}}}) at ./broadcast.jl:845
 [7] apply!(::ADAM, ::Array{Float64,1}, ::Array{Float64,1}) at /home/mjankowi/.julia/packages/Flux/sY3yx/src/optimise/optimisers.jl:175
 [8] global_update!(::AugmentedGaussianProcesses.SparseVarLatent{Float64,AugmentedGaussianProcesses.GPPrior{Float64,TransformedKernel{Matern52Kernel,ChainTransform{Array{Transform,1}}},ZeroMean{Float64}},AugmentedGaussianProcesses.VarPosterior{Float64},OptimIP{SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true},ColVecs{Float64,Array{Float64,2},SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}},KmeansIP{SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true},ColVecs{Float64,Array{Float64,2},SubArray{Float64,1,Array{Float64,2},Tuple{Base.Slice{Base.OneTo{Int64}},Int64},true}}},Nothing},ADAM}, ::AugmentedGaussianProcesses.AVIOptimizer{Float64,ADAM}, ::AnalyticVI{Float64,1,SubArray{SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true},1,RowVecs{Float64,Array{Float64,2},SubArray{Float64,1,Array{Float64,2},Tuple{Int64,Base.Slice{Base.OneTo{Int64}}},true}},Tuple{Array{Int64,1}},false},SubArray{Float64,1,Array{Float64,1},Tuple{Array{Int64,1}},false}}) at /home/mjankowi/.julia/packages/AugmentedGaussianProcesses/0DoYF/src/inference/analyticVI.jl:265

do you have any suggestions for reasonable settings if i want to do SVGP with LogisticLikelihood and AnalyticSVI?

theogf commented 3 years ago

Oh yeah, that's an ugly bug. I don't know how it got through the tests all this time. I'll make a quick fix.

martinjankowiak commented 3 years ago

thanks, i can confirm that #83 fixes the broadcasting error.

however, i'm still seeing lots of cholesky errors even with only 32 inducing points. can you suggest optimiser settings that are expected to be more numerically stable?

svi = AnalyticSVI(128, optimiser=ADAM(0.0001))
model = SVGP(X_train, Y_train, kernel, LogisticLikelihood(), svi, 32, verbose=3)

theogf commented 3 years ago

Cholesky errors may suggest that the lengthscale used for your kernel is too small compared to the typical distances between your training points.
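As an illustration (not from the original thread), one common way to pick a lengthscale on the scale of the data is the median heuristic, optionally combined with a small white-noise jitter term. A minimal sketch, assuming KernelFunctions.jl (which provides with_lengthscale and WhiteKernel) and Distances.jl, with X_train stored as an n×d matrix:

 using KernelFunctions, Distances, Statistics

 # median heuristic: set the lengthscale to the median pairwise
 # distance between training inputs (rows of X_train)
 D = pairwise(Euclidean(), X_train; dims=1)
 ℓ = median(D[D .> 0])
 kernel = with_lengthscale(Matern52Kernel(), ℓ)

 # optionally add a small white-noise term to help stabilise the Cholesky
 kernel = kernel + 1e-6 * WhiteKernel()

The resulting kernel can then be passed to SVGP as in the snippets above.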

martinjankowiak commented 3 years ago

thanks for the suggestion, but i'm afraid that's not the issue. the cholesky error appears after 1 or 2 iterations, and when i train other GP methods like vanilla SVGP on the same dataset with lengthscales initialized in the same way, i have no stability issues whatsoever.

how do i invoke the optimization settings used in e.g. "Multi-Class Gaussian Process Classification Made Conjugate: Efficient Inference via Data Augmentation"? presumably the experiments in this paper were done using some version of the code in this repository?