tpapp / DynamicHMCExamples.jl

Examples for Bayesian inference using DynamicHMC.jl and related packages.

Request for help #12

Open goedman opened 5 years ago

goedman commented 5 years ago

Tamas, I was wondering if you would mind taking a look at m10.04d.jl?

Until chapter 10 in StatisticalRethinking.jl I've had no trouble adding the DynamicHMC versions in addition to the Turing and CmdStan versions, e.g. m10.04t.jl and m10.04s.jl for model 10.4.

I followed the approach I'd used successfully for m10.02d1.jl, which works fine, but in this case I get

julia> include("/Users/rob/.julia/dev/StatisticalRethinking/scripts/10/m10.04d.jl")

MCMC, adapting ϵ (75 steps)
0.011 s/step ...done
MCMC, adapting ϵ (25 steps)
0.0088 s/step ...done
MCMC, adapting ϵ (50 steps)
ERROR: LoadError: ArgumentError: isfinite(value) && all(isfinite, gradient) || value == -Inf must hold.
Stacktrace:
 [1] LogDensityProblems.ValueGradient{Float64,Array{Float64,1}}(::Float64, ::Array{Float64,1}) at /Users/rob/.julia/packages/ArgCheck/BUMkA/src/checks.jl:165
 [2] Type at /Users/rob/.julia/packages/LogDensityProblems/F34aW/src/LogDensityProblems.jl:73 [inlined]
 [3] logdensity(::Type{LogDensityProblems.ValueGradient}, ::LogDensityProblems.ForwardDiffLogDensity{TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}},ForwardDiff.GradientConfig{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64},Float64,9,Array{ForwardDiff.Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64},Float64,9},1}}}, ::Array{Float64,1}) at /Users/rob/.julia/packages/LogDensityProblems/F34aW/src/AD_ForwardDiff.jl:45

<snipped>

Different starting values in θ = (β = [1.0, 0.0], α = [-1.0, 10.0, -1.0, -1.0, -1.0, 0.0, 2.0]) do not solve the issue, so I wonder whether this way of modeling such problems is correct.

Thanks for your help (if doable!), Rob

tpapp commented 5 years ago

Will look into it today.

tpapp commented 5 years ago

I think it is an issue with AD producing NaNs or similar. Ideally one would investigate, but I added a quick fix in the form of a wrapper and made a PR in the repo.
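For reference, the wrapper is applied around the AD gradient before sampling. This is only a sketch: `p` and `problem_transformation` come from the model script (e.g. m10.04d.jl) and are not defined here.

```julia
using LogDensityProblems, DynamicHMC

# Transformed log density for the model; p and problem_transformation
# are assumed to be defined by the model script.
P = TransformedLogDensity(problem_transformation(p), p)

# Wrap the AD gradient so evaluations that error out (e.g. NaNs from
# ForwardDiff) are treated as rejections rather than thrown errors.
∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P))

chain, tuned_sampler = NUTS_init_tune_mcmc(∇P, 1000)
```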

Please review and let me know if this works.

tpapp commented 5 years ago

Also, you can investigate corner cases of log densities with LogDensityProblems.stresstest.
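Roughly, `stresstest` evaluates the log density (or its gradient) at randomly chosen points and returns the locations where evaluation fails. A sketch, assuming `∇P` is the AD-wrapped log density from the model script; keyword names and defaults may differ across versions:

```julia
using LogDensityProblems

# Evaluate at N random points, scaled by `scale`; collect the points
# where the value or gradient is non-finite or an error is thrown.
failures = LogDensityProblems.stresstest(∇P; N = 1000, scale = 1.0)

# An empty result means no problematic points were found.
isempty(failures) || @warn "log density failed at $(length(failures)) points"
```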

goedman commented 5 years ago

As I mentioned in the SR pull request comment, will try several more models and compare the results. I wonder if rejecting the cases where AD produces NaNs might influence the resulting densities. Maybe stress testing will help out here.

If that's ok, I'll try some more models before closing this issue.

tpapp commented 5 years ago

That's fine, keep it open as long as you like and feel free to ask if there is anything I can help with.

goedman commented 5 years ago

Tamas, in trying to dig a bit deeper into the posterior draws from m10.4d.jl, I have tried to apply stresstest:

LogDensityProblems.stresstest(P, N=1000, scale=1.0)
0-element Array{Array{Float64,1},1}

P in this case uses:

P = TransformedLogDensity(problem_transformation(p), p)
#∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P));
∇P = ADgradient(:ForwardDiff, P);

Interestingly, m12.6d.jl is spot on with either ∇P option (unaffected by the choice), which I interpret as some support for my earlier hypothesis that rejecting errors might introduce a bias.

Am I correct to assume that stresstest, if applied correctly, would show parameter values where NaNs occur? I was hoping to be able to plot those regions as part of the debugging process.

tpapp commented 5 years ago

You should try stresstest with the ∇P (plain vanilla ADgradient, no LogDensityRejectErrors). It should indeed return the vector of locations which are not OK (e.g. evaluate to NaN). Plotting is a good strategy.
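One way to plot the failing locations might look like the following sketch. The returned vector contains parameter vectors in the transformed space; Plots.jl and the choice of the first two coordinates are assumptions for illustration, not part of the original scripts.

```julia
using LogDensityProblems

# Plain AD gradient, no error rejection, so stresstest can catch failures.
bad = LogDensityProblems.stresstest(∇P; N = 1000)

# bad is a Vector of parameter vectors; scatter the first two coordinates
# to see whether failures cluster in a region of the parameter space.
using Plots  # assumption: any plotting package would do
scatter(getindex.(bad, 1), getindex.(bad, 2);
        xlabel = "θ[1]", ylabel = "θ[2]", label = "NaN/error locations")
```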

goedman commented 5 years ago

As you suggest, I had commented out the LogDensityRejectErrors line and replaced it with the vanilla ADgradient:

P = TransformedLogDensity(problem_transformation(p), p)
#∇P = LogDensityRejectErrors(ADgradient(:ForwardDiff, P));
∇P = ADgradient(:ForwardDiff, P);

and it fails on the NUTS_init_tune_mcmc(∇P, 1000) line.

I'll experiment a bit more.

goedman commented 5 years ago

Tamas, having noticed (and tested) your updates to DynamicHMC and LogDensityProblems, may I ask for a little push to get stresstest() working? I'm trying to resolve the only model that continues to give different results in DynamicHMC vs. Turing & CmdStan.

In m10.4d1.jl (results illustrated in m10.4d1) I've tried to use stresstest() and different AD methods.

If I enable do_stresstest it will show:

julia> include("/Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d1.jl")
MCMC, adapting ϵ (75 steps)
ERROR: Error while loading expression starting at /Users/rob/.julia/dev/DynamicHMCModels/scripts/10/m10.04d1.jl:87
caused by [exception 1]
InvalidLogDensityException: value is Dual{ForwardDiff.Tag{getfield(LogDensityProblems, Symbol("##3#4")){TransformedLogDensity{TransformVariables.TransformNamedTuple{(:β, :α),Tuple{TransformVariables.ArrayTransform{TransformVariables.Identity,1},TransformVariables.ArrayTransform{TransformVariables.Identity,1}}},m_10_04d_model{Array{Int64,1},Array{Int64,2},Array{Int64,1}}}},Float64}}(NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN)

Using LogDensityRejectErrors provides very stable results and the plots of the chains look reasonable (I've also tried 3 and 4 chains). It is true that the pooled estimate of a[2] is a bit higher in scale. But it is surprising this is the only model where I've seen a problem.

In particular, the sd values are very high; in Turing (using NUTS) that is usually an indication that the adaptation draws are included, but I don't think DynamicHMC includes those in the posterior.

tpapp commented 5 years ago

If I understand correctly, there are two issues:

  1. the script errors,

  2. the results are different.

Did these happen after the recent (minor) updates to LogDensityProblems and DynamicHMC, or did you have them before?

goedman commented 5 years ago

Correct, w.r.t. issue 2), normally (for all other models) I use LogDensityRejectErrors(ADgradient(:ForwardDiff, P)) and the results are spot on.

W.r.t. issue 1), only for the m10.4d1.jl model have I experimented with stresstest, by setting do_stresstest = true. For the last few weeks it fails on the ADgradient() call with the script error above.

tpapp commented 5 years ago

Are you using the latest LogDensityProblems and DynamicHMC?

I could not reproduce the errors above.

Also, I am not sure how I would detect the discrepancy, should I look at the plots?

goedman commented 5 years ago

In the script the results from Turing and CmdStan are included:

rethinking = "
      mean   sd  5.5% 94.5% n_eff Rhat
a[1] -0.74 0.27 -1.19 -0.31  2899    1
a[2] 10.77 5.20  4.60 20.45  1916    1
a[3] -1.05 0.28 -1.50 -0.62  3146    1
a[4] -1.05 0.28 -1.50 -0.61  3525    1
a[5] -0.73 0.28 -1.17 -0.28  3637    1
a[6]  0.22 0.27 -0.21  0.67  3496    1
a[7]  1.82 0.41  1.21  2.50  3202    1
bp    0.83 0.27  0.42  1.27  2070    1
bpC  -0.13 0.31 -0.62  0.34  3430    1
";

I tried this morning with:

[[DynamicHMC]]
deps = ["ArgCheck", "DataStructures", "DocStringExtensions", "LinearAlgebra", "LogDensityProblems", "Parameters", "Random", "Statistics", "StatsFuns", "Test", "TransformVariables"]
git-tree-sha1 = "f9c9e6e703afccb02a1179e4886c97134ff20b55"
uuid = "bbc10e6e-7c05-544b-b16e-64fede858acb"
version = "1.0.4"

...

[[LogDensityProblems]]
deps = ["ArgCheck", "BenchmarkTools", "DiffResults", "DocStringExtensions", "Parameters", "Pkg", "Random", "Requires", "Test", "TransformVariables"]
git-tree-sha1 = "1ce5ebd16b5026a35cbf4daec66b2433b83d0d23"
uuid = "6fdf6af0-433a-55f7-b3ed-c6c6e0b8df7c"
version = "0.8.0"

in Manifest.toml.

goedman commented 5 years ago

The results I'm getting:

Log evidence      = 0.0
Iterations        = 1:3000
Thinning interval = 1
Chains            = 1
Samples per chain = 3000
parameters        = bp, bpC

Empirical Posterior Estimates
───────────────────────────────────────────
parameters
     Mean    SD   Naive SE  MCSE     ESS   
 bp 1.5337 3.6815   0.0672 0.0886 1728.4017
bpC 0.4149 0.2460   0.0045 0.0038 3000.0000

Quantiles
───────────────────────────────────────────
parameters
      2.5%    25.0%   50.0%  75.0%  97.5% 
 bp -10.7577 -1.0319 1.4902 4.0067 16.9960
bpC  -0.4675  0.2435 0.4137 0.5796  1.3324

Log evidence      = 0.0
Iterations        = 1:3000
Thinning interval = 1
Chains            = 1
Samples per chain = 3000
pooled            = a[1], a[2], a[3], a[4], a[5], a[6], a[7]

Empirical Posterior Estimates
─────────────────────────────────────────────
pooled
       Mean    SD   Naive SE  MCSE     ESS   
a[1] -1.9824 3.6843   0.0673 0.0887 1727.1929
a[2]  9.8785 6.0428   0.1103 0.1973  937.6438
a[3] -2.2879 3.6866   0.0673 0.0899 1681.8172
a[4] -2.2809 3.6940   0.0674 0.0893 1712.3698
a[5] -1.9804 3.6927   0.0674 0.0896 1698.3607
a[6] -1.0489 3.6885   0.0673 0.0885 1736.5395
a[7]  0.5142 3.6860   0.0673 0.0908 1647.0961

Quantiles
─────────────────────────────────────────────
pooled
       2.5%    25.0%   50.0%   75.0%   97.5% 
a[1] -17.1055 -4.4397 -1.9079  0.5685 10.6230
a[2]  -7.1898  5.7305  9.3678 13.3436 34.4317
a[3] -17.9862 -4.7485 -2.2723  0.2305 10.1737
a[4] -17.6815 -4.7411 -2.2326  0.2594 10.6519
a[5] -17.6976 -4.3925 -1.9362  0.5755  9.9957
a[6] -16.6376 -3.5379 -1.0127  1.5214 11.4556
a[7] -14.9577 -1.9840  0.5571  3.0386 12.6625

or, just printing the means of the posterior vector (from another run):

2-element Array{Array{Float64,1},1}:
 [1.3940051607581527, 0.4210166348245524]                                                                                                     
 [-1.844585401458457, 10.05388176750645, -2.1519882289247705, -2.14490281190503, -1.8458711437273487, -0.9082955910437809, 0.6615688096440819]
goedman commented 5 years ago

Tamas, when you say "I could not reproduce the errors above," does that path (with do_stresstest=true) work on your system? Or, if you use the LogDensityRejectErrors path, do you see estimates like those in my previous post?

tpapp commented 5 years ago

Apologies for not getting back to you about this yet, this is a busy time for me.

I really appreciate this test case. Various other users have reported subtle (or not-so-subtle) maladaptation and bias issues. I have a few guesses about this:

  1. bugs in AD, or just numerical corner cases, for which I created https://github.com/tpapp/LogDensityProblems.jl/issues/42,
  2. overly eager dense matrix adaptation, which can be spurious, to be addressed in https://github.com/tpapp/DynamicHMC.jl/issues/35,
  3. a possible bug in the adaptation code,
  4. a possible bug in the tree-building and/or acceptance rate calculation code.

To investigate 3 and 4, I will be working on tpapp/DynamicHMC.jl#30. I ask for your patience in the meantime. The test cases are greatly appreciated.

goedman commented 5 years ago

Thank you Tamas. Absolutely no apologies needed! I simply can't think of anyone contributing as much to the Julia community as you do!

Any time I notice an update to one of the various packages I will continue to try it. For now I will always compare the results to at least Turing and Stan. When I started the work on StatisticalRethinking I had a time frame of about 2 years in mind, so from that point of view I'm not even halfway.