ssmit1986 / BayesianInference

Wolfram Language application for Bayesian inference and Gaussian process regression
MIT License

stuck in the Cholesky Decomposition #8

Closed spasetto closed 3 years ago

spasetto commented 3 years ago

Hi,

I have a (hopefully) reproducible sequence that crashes/hangs nestedSampling:

The model (a time series plus an ODE system) is:

Ts = TimeSeries[{{33., 1.}, {75., 1.}, {106., 1.}, {165., 1.}, {194., 
     1.}, {221., 0.}, {256., 0.}, {279., 0.}, {312., 0.}, {341., 
     0.}, {379., 0.}, {428., 1.}, {456., 1.}, {484., 1.}, {526., 
     1.}, {561., 1.}, {603., 1.}, {638., 1.}, {666., 1.}, {694., 
     1.}, {722., 0.}, {750., 0.}, {792., 0.}, {834., 1.}, {876., 
     1.}, {914., 1.}, {932., 1.}, {960., 1.}, {1009., 1.}, {1051., 
     1.}, {1093., 1.}, {1135., 1.}, {1184., 1.}, {1226., 1.`}}, 
   ResamplingMethod -> {"Interpolation", "InterpolationOrder" -> 0}];

Model[p1_?NumberQ, p2_?NumberQ, p3_?NumberQ, p4_?NumberQ, p5_?NumberQ,
   p6_?NumberQ, p7_?NumberQ, p8_?NumberQ, p9_?NumberQ, p10_?NumberQ, 
  ic1_?NumberQ, ic2_?NumberQ, ic3_?NumberQ, tmin_?NumberQ, 
  tmax_?NumberQ 
  ] := (Model[p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, ic1, ic2, ic3, 
     tmin, tmax ] = 
    Block[{NAD, NAI, NAIrr, m11, m12, m13, m21, m22, m23, m31, m32, 
      m33, sol},
     m11[t_] := Ts[t] (p1 - p2) + p2; m12[t_] := (1. - Ts[t]) p3; 
     m13[t_] := 0.; m21[t_] := Ts[t] p4; 
     m22[t_] := Ts[t] (p5 - p6) + p6; m23[t_] := 0.; 
     m31[t_] := Ts[t] p7; m32[t_] := Ts[t] p8; 
     m33[t_] := Ts[t] (p9 - p10) + p10;
     sol = 
      First[NDSolve[{NAD'[\[Tau]] == 
          m11[\[Tau]] NAD[\[Tau]] + m12[\[Tau]] NAI[\[Tau]] + 
           m13[\[Tau]] NAIrr[\[Tau]], 
         NAI'[\[Tau]] == 
          m21[\[Tau]] NAD[\[Tau]] + m22[\[Tau]] NAI[\[Tau]] + 
           m23[\[Tau]] NAIrr[\[Tau]], 
         NAIrr'[\[Tau]] == 
          m31[\[Tau]] NAD[\[Tau]] + m32[\[Tau]] NAI[\[Tau]] + 
           m33[\[Tau]] NAIrr[\[Tau]], NAD[tmin] == ic1, 
         NAI[tmin] == ic2, NAIrr[tmin] == ic3}, {NAD, NAI, 
         NAIrr}, {\[Tau], tmin, tmax}]];
     Function[NAD[#] + NAI[#] + NAIrr[#]] /. sol]) // Quiet

I define the problem with your code:

SeedRandom[1234]

obj = defineInferenceProblem[
  "Data" -> data,
  "GeneratingDistribution" -> 
   NormalDistribution[
    Model[10^lg10p1, 10^lg10p2, 10^lg10p3, 10^lg10p4, 10^lg10p5, 10^
      lg10p6, 10^lg10p7, 10^lg10p8, 10^lg10p9, 10^lg10p10, 10^lg10ic1,
       10^lg10ic2, 10^lg10ic3, 33., 1230.][t], \[Sigma]],
  "Parameters" -> {{lg10p1, -3., 
     1.}, {lg10p2, -7., -1.}, {lg10p3, -2., 
     1.5}, {lg10p4, -3.5, -1.}, {lg10p5, -3., -.5}, {lg10p6, -7., \
-1.}, {lg10p7, -5., -1.}, {lg10p8, -4.5, -1.}, {lg10p9, -6., -1.}, \
{lg10p10, -4., -1.5}, {lg10ic1, -1., 1.}, {lg10ic2, -2., 
     1.}, {lg10ic3, -4., -1.}, {\[Sigma], 0.01, 10}},
  "IndependentVariables" -> {t},
  "PriorDistribution" -> {"LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "ScaleParameter"}
  ]

If I just run nestedSampling[obj] with the default option values, or:

result = 
 nestedSampling[obj, "SamplePoolSize" ->  100 , 
  "MaxIterations" ->  1000, "MinIterations" ->  1, 
  "MonteCarloMethod" ->  Automatic, "MonteCarloSteps" ->  100, 
  "TerminationFraction" ->  0.01, "Monitor" ->  True, 
  "LogLikelihoodMaximum" ->  Automatic, 
  "MinMaxAcceptanceRate" ->   {0.05, 0.95}]

...the code runs for about 23 iterations and then it spits out:

CholeskyDecomposition::herm: The matrix "bla bla bla..." is not Hermitian or real and symmetric.

The code does not proceed any further.

Any idea how to fix it? Thanks in advance for your kind help. Best

ssmit1986 commented 3 years ago

Hi, thanks for sticking with my code! I'm on holiday now, so I can't promise how soon I can investigate this thoroughly, but at least superficially it seems like something is going wrong outside of my own code, since I don't explicitly use CholeskyDecomposition anywhere. Is your model well-defined for all possible prior values? You should probably also consider making sure that Model always spits out something valid, even if NDSolve chokes. My first guess would be that that's the culprit.

Best, Sjoerd

spasetto commented 3 years ago

Hi, thanks for your always prompt reply! I see your point. My suspicion that this is a nestedSampling issue comes simply from the fact that NonlinearModelFit converges to a solution of the same problem while nestedSampling does not.

Regarding the Cholesky decomposition: it is commonly used in Monte Carlo simulations, or maybe in linear least squares(?). I would be surprised to find an application of the Cholesky decomposition to ODE systems (e.g., in some NDSolve method)... but I may be wrong. Even when I make Model more robust (e.g., taking ComplexExpand[Re[NAD[#] + NAI[#] + NAIrr[#]]], or similar), the Cholesky decomposition error appears with nestedSampling but never with NonlinearModelFit.

Anyway: please enjoy your vacation! It is absolutely not urgent! Thanks a lot for your time! (Belated) Merry Christmas and a happy (and safe) new 2021!

Best

Stefano

ssmit1986 commented 3 years ago

Hi Stefano,

I agree, it's very likely that the Cholesky decomposition happens somewhere in the internal MCMC code that I'm referencing (see also the MCMC section of my example notebook). I'm not surprised that it works better with NonlinearModelFit, since that tends to search out only a small region of a very big parameter space. Nested sampling starts by sampling the whole prior space, since it's a sampler and not an optimizer, so many more parameter values will be tried and some of those may upset NDSolve. I suspect the problem is that in the line

Function[NAD[#] + NAI[#] + NAIrr[#]] /. sol])

you assume that sol will be a list of rules. But what about when NDSolve fails completely and doesn't output a list of rules? I would suggest something like:

If[ MatchQ[sol, {__Rule}], 
    Function[NAD[#] + NAI[#] + NAIrr[#]] /. sol,
    Function[0]
]

or whatever default value is appropriate. Or perhaps it would be even better to define your own LogLikelihoodFunction that returns $MachineLogZero if NDSolve doesn't find a solution at all. I see you're also Quieting everything in Model, which is usually a bad sign. You may want to look into Check to catch situations where NDSolve complains, since those results are suspect as well.
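For illustration, here's a toy sketch of that idea (a throwaway ODE, not your model): Check hands back a fallback value as soon as NDSolve emits any message, and you can then return a constant function (or $MachineLogZero from a custom log-likelihood) instead of a suspect interpolation.

sol = Check[
   First @ NDSolve[{y'[x] == y[x]^2, y[0] == 1}, y, {x, 0, 2}], (* blows up near x == 1, so NDSolve complains *)
   $Failed (* returned whenever NDSolve generates a message *)
   ];
If[MatchQ[sol, {__Rule}],
 Function[y[#]] /. sol,
 Function[0]
 ]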

P.S. Your code in the top post has some syntax issues with the \[Tau] glyph that prevent me from copying the code directly.

spasetto commented 3 years ago

Hi, thanks for your suggestion. Please accept my apologies for the quality of the code I passed you (//Quiet everywhere, etc.). It is definitely "sloppy", because it was only meant to demonstrate the problem of the Cholesky decomposition crashing your Bayesian nestedSampling code. The original code that I tried to implement in Mathematica is, as you sensed, more robust (for many different reasons) and is attached below.

Note that it adds some extra features that are unnecessary for debugging your code:
- the code is "coated" in a Block, ModelCoated, that makes it robust over the whole spanned parameter space and adds some physical constraints of interest,
- the system is mathematically well-posed (no divergences, no fractions, no singularities, nothing),
- different implementations of the nested sampling algorithm (C++ & Python) do converge to the same correct result,
- in Model I added your suggestion and omitted the \[Tau] that was causing you problems.

Please let me know if you need further input.

All of this points to a bug in the implementation of the nested sampling algorithm (or in the Mathematica algorithms you used?). Nevertheless, your implementation remains one of the more efficient implementations of Bayesian inference in Mathematica, if not the only one implementing a nested sampling algorithm. Finally, my intention was just to point out a bug, not to bother you during your vacation! ;-)

ThX,

Best

Stefano

tp = {0.`, 33.`, 75.`, 106.`, 165.`, 194.`, 221.`, 256.`, 279.`, 
   312.`, 341.`, 379.`, 428.`, 456.`, 484.`, 526.`, 561.`, 603.`, 
   638.`, 666.`, 694.`, 722.`, 750.`, 792.`, 834.`, 876.`, 914.`, 
   932.`, 960.`, 1009.`, 1051.`, 1093.`, 1135.`, 1184.`, 1226.`};
TOnOffp = {1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 0.`, 0.`, 0.`, 0.`, 0.`, 0.`,
    1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 0.`, 0.`, 0.`, 1.`, 
   1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`, 1.`};

TSharp[t_] := 
  Boole[IntervalMemberQ[
    Chop[Apply[IntervalUnion, 
      Map[Interval, 
       Part[Map[MinMax, 
         TakeList[tp, Map[Length, Split[TOnOffp], {1}]]], 
        1 ;; All ;; 2]]]], t]];
Ts = TimeSeries[Table[{t, TSharp[t]}, {t, 1, Last[tp], 2}]];

tmin = Ts["FirstTime"];
tmax = Ts["LastTime"];

Model[p1_?NumberQ, p2_?NumberQ, p3_?NumberQ, p4_?NumberQ, p5_?NumberQ,
   p6_?NumberQ, p7_?NumberQ, p8_?NumberQ, p9_?NumberQ, p10_?NumberQ, 
  ic1_?NumberQ, ic2_?NumberQ, ic3_?NumberQ, tmin_?NumberQ, 
  tmax_?NumberQ 
  ] := (Model[p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, ic1, ic2, ic3, 
    tmin, tmax ] = 
   Block[{NAD, NAI, NAIrr, m11, m12, m13, m21, m22, m23, m31, m32, 
     m33, sol},
    m11[t_] := Ts[t] (p1 - p2) + p2; m12[t_] := (1. - Ts[t]) p3; 
    m13[t_] := 0.; m21[t_] := Ts[t] p4; 
    m22[t_] := Ts[t] (p5 - p6) + p6; m23[t_] := 0.; 
    m31[t_] := Ts[t] p7; m32[t_] := Ts[t] p8; 
    m33[t_] := Ts[t] (p9 - p10) + p10;
    sol = 
     First[NDSolve[{NAD'[t1] == 
         m11[t1] NAD[t1] + m12[t1] NAI[t1] + m13[t1] NAIrr[t1], 
        NAI'[t1] == 
         m21[t1] NAD[t1] + m22[t1] NAI[t1] + m23[t1] NAIrr[t1], 
        NAIrr'[t1] == 
         m31[t1] NAD[t1] + m32[t1] NAI[t1] + m33[t1] NAIrr[t1], 
        NAD[tmin] == ic1, NAI[tmin] == ic2, NAIrr[tmin] == ic3}, {NAD,
         NAI, NAIrr}, {t1, tmin, tmax}]];
    If[MatchQ[sol, {__Rule}],
     Function[NAD[#] + NAI[#] + NAIrr[#]] /. sol,
     Function[0]]
    ])

ModelCoated[
  p1_?NumberQ, p2_?NumberQ, p3_?NumberQ, p4_?NumberQ, p5_?NumberQ, 
  p6_?NumberQ, p7_?NumberQ, p8_?NumberQ, p9_?NumberQ, p10_?NumberQ, 
  ic1_?NumberQ, ic2_?NumberQ, ic3_?NumberQ,
  t_?NumberQ, tmin_?NumberQ, tmax_?NumberQ ] := Block[
  {val},
  val = If[
    p3 > 0.0 \[And] p4 > 0.0 \[And] p7 > 0.0 \[And] p8 > 0.0 \[And] 
     ic1 > 0.0 \[And] ic2 > 0.0 \[And] ic3 > 0.0 \[And] 
     p9 >= 0.0 \[And]
     tmin < t < tmax,
    ComplexExpand[Re[
      Model[
        -p1, p2, p3,
        p4, -p5, p6,
        p7, p8, -p9, p10,
        ic1, ic2, ic3,
        tmin, tmax
        ][t]]],
    0
    ];
  If[0 < val < 1000., val, 0]
  ]

data = {33. -> 11.83, 75. -> 5.02, 106. -> 4.87, 165. -> 5.1, 
   194. -> 2.86, 221. -> 2.04, 256. -> 2.29, 279. -> 3.44, 
   312. -> 3.47, 341. -> 5.52, 379. -> 6.83, 428. -> 10.62, 
   456. -> 21.2, 484. -> 9.75, 526. -> 9.64, 561. -> 8.52, 
   603. -> 7.61, 638. -> 7.43, 666. -> 5.95, 694. -> 6.8, 
   722. -> 5.64, 750. -> 7.06, 792. -> 7.55, 834. -> 12.89, 
   876. -> 12.6, 914. -> 11.32, 932. -> 11.45, 960. -> 13., 
   1009. -> 12.85, 1051. -> 14.32, 1093. -> 14.31, 1135. -> 12.65, 
   1184. -> 14.52, 1226. -> 15.78};

obj = defineInferenceProblem[
  "Data" -> data,
  "GeneratingDistribution" -> 
   NormalDistribution[
    ModelCoated[10^lg10p1, 10^lg10p2, 10^lg10p3, 10^lg10p4, 10^lg10p5,
      10^lg10p6, 10^lg10p7, 10^lg10p8, 10^lg10p9, 10^lg10p10, 10^
     lg10ic1, 10^lg10ic2, 10^lg10ic3, t, tmin, tmax], \[Sigma]],
  "Parameters" -> {{lg10p1, -3., 
     1.}, {lg10p2, -7., -1.}, {lg10p3, -2., 
     1.5}, {lg10p4, -3.5, -1.}, {lg10p5, -3., -.5}, {lg10p6, -7., \
-1.}, {lg10p7, -5., -1.}, {lg10p8, -4.5, -1.}, {lg10p9, -6., -1.}, \
{lg10p10, -4., -1.5}, {lg10ic1, -1., 1.}, {lg10ic2, -2., 
     1.}, {lg10ic3, -4., -1.}, {\[Sigma], 0.01, 10}},
  "IndependentVariables" -> {t},
  "PriorDistribution" -> {"LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "LocationParameter", 
    "LocationParameter", "LocationParameter", "ScaleParameter"}
  ]

result = 
 nestedSampling[obj, "SamplePoolSize" ->  100 , 
  "MaxIterations" ->  1000, "MinIterations" ->  100, 
  "MonteCarloMethod" ->  Automatic, "MonteCarloSteps" ->  100, 
  "TerminationFraction" ->  0.01, "Monitor" ->  True, 
  "LogLikelihoodMaximum" ->  Automatic, 
  "MinMaxAcceptanceRate" ->   {0.05, 0.95}]

ssmit1986 commented 3 years ago

Thank you for the elaboration. In that case: have you tried sampling the model using createMCMCChain and iterateMCMC, to see if that gives you the same Cholesky decomposition messages? Those functions are just wrappers for calling built-in functionality that I didn't write, so if the problem lies there I can see if I can pass it on to the developers.

spasetto commented 3 years ago

Hi, no real problem with createMCMCChain and iterateMCMC... actually, the convergence to reasonable values is pretty good and fast. Try, e.g.,

chain = createMCMCChain[
  obj, {-1.`, -4.`, -0.25`, -2.25`, -1.75`, -4.`, -3.`, -2.75`, \
-3.5`, -2.75`, 0.`, -0.5`, -2.5`, 5.005`}];
iterateMCMC[chain, 10000];
samples = iterateMCMC[chain, 1000];

...no problem. :-/

ssmit1986 commented 3 years ago

Ok, so that rules out a number of things, but it's good to hear that the MCMC sampler works at least. I guess I'll have to dive under the hood and see what I can come up with. I'll let you know when I've got something. In the meantime I can only suggest playing around with the options. Maybe the "SamplePoolSize" -> 100 setting is not enough? You're saying the convergence is fast, so it's possible the likelihood contours change so quickly that they confuse the sampler. And maybe the "MinIterations" -> 100 setting can be reduced as well in that case.

ssmit1986 commented 3 years ago

I just pushed a change that might be the solution to this problem. I couldn't actually reproduce your problem directly, but this seems like the most likely culprit to me. If I deliberately give a non-symmetric covariance estimate to the MCMC sampler, I do get the errors you describe, so hopefully this should help avoid them. If it's not good enough, I guess we need a better way to estimate the covariance for each new sample.
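Roughly speaking, the idea is along these lines (a toy illustration of the fix, not the literal code in the commit): when the covariance estimate picks up a small numerical asymmetry, averaging it with its transpose restores the symmetry that CholeskyDecomposition expects.

symmetrize[cov_?MatrixQ] := (cov + Transpose[cov])/2; (* remove round-off asymmetry *)

cov = {{1., 0.3 + 1.*^-12}, {0.3 - 1.*^-12, 2.}}; (* covariance estimate with a tiny asymmetry *)
CholeskyDecomposition[symmetrize[cov]] (* upper-triangular factor, no ::herm message *)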

spasetto commented 3 years ago

Hi & happy New Year. Let me share immediately: I ran a few tests over the new year and collected them tonight, and all tests with SamplePoolSize greater than 200 do converge without the Cholesky decomposition error! ...I'm a little puzzled (but happy to see that your code works). What is the connection between SamplePoolSize and the Cholesky decomposition? What, then, is the meaning of SamplePoolSize? In my understanding it was just related to the number of points with the highest probability?? Correct? I'm going to test your change with SamplePoolSize lower than 200 over the next couple of days or so... but I thought I'd share the info as soon as I saw it. Ste

ssmit1986 commented 3 years ago

Happy new year to you as well, thanks! Glad to hear it's working with a larger sample pool. The sample pool size is a parameter that controls the convergence speed vs. accuracy trade-off. With each iteration of the algorithm, the likelihood boundary of the sample space shrinks, and the fewer sample points you have, the more this boundary contracts per iteration. So if the sample pool size is too small, the region contracts a lot, making it difficult to estimate where the new boundary is. I think that's what's going on.
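To put a rough number on that (just a back-of-the-envelope illustration, not anything the package computes): in textbook nested sampling the prior volume inside the likelihood contour shrinks by roughly a factor Exp[-1/n] per iteration for n live points, so a small pool contracts much more aggressively.

remainingVolume[n_, k_] := Exp[-k/n]; (* expected prior volume fraction left after k iterations with n live points *)
remainingVolume[200, 100.] (* ~0.61 *)
remainingVolume[100, 100.] (* ~0.37 *)
remainingVolume[25, 100.]  (* ~0.02 *)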

BTW, I also suspect that you can make the algorithm quite a bit faster by using ParametricNDSolveValue, since that will pre-compile some of the work that NDSolve now needs to do for every single likelihood evaluation. In addition, if you only need NAD[t] + NAI[t] + NAIrr[t], you can tell NDSolve (and related functions) to directly return that to you. For example:

NDSolveValue[{x'[t] == -y[t] - x[t]^2, y'[t] == 2 x[t] - y[t]^3, x[0] == y[0] == 1}, 
  x[20] + y[20],
  {t, 20}
]

I find that much easier than mucking around with replacement rules and the interpolating functions that NDSolve normally gives.

spasetto commented 3 years ago

Hi, solved! I've tested the changes you pushed and the code now stably reaches the end of every simulation, even with the smaller SamplePoolSize values that previously got stuck in the Cholesky decomposition. Great! Thanks for fixing it!

With respect to ParametricNDSolveValue: it is 3 to 4 times slower than caching the solution of the ODE as I proposed (no matter which parametric caching/sensitivity options you use). But since you mentioned performance: I also didn't find parallelNestedSampling particularly beneficial. It scales poorly, with overhead already dominant at 8 cores for some codes... any advice on its use? Did you try using ParallelSubmit?

ssmit1986 commented 3 years ago

Glad to hear that that solved it. I hope you can now solve your actual problem :).

As for ParametricNDSolveValue: hmmm, that's odd. Maybe its caching mechanism isn't as efficient. You could try to combine it with ordinary caching with something like:

With[{
  sol = ParametricNDSolveValue[
    {x'[t] == -y[t] - x[t]^2,   y'[t] == a x[t] - y[t]^3, x[0] == y[0] == 1}, 
    x[20] + y[20], {t, 20}, {a}]
  },
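  (* memoize: each distinct parameter value is solved only once *)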
  model[args___] := model[args] = sol[args]
]

That way you're using the same caching mechanism as your current model, so the comparison is a bit easier.

As for parallelNestedSampling: the first thing to keep in mind is that if you want to speed up the convergence, you'll need to reduce the "SamplePoolSize" option by the same factor as the number of "ParallelRuns". So if you're distributing over 4 runs, you need to use a quarter of the number of live points (so 25 instead of 100, for example). That speeds up the convergence rate of each individual run, and the runs are then re-combined into a single run of 100 points.
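Schematically, something like this (assuming the option names as quoted above):

result = parallelNestedSampling[obj,
   "SamplePoolSize" -> 25, (* a quarter of the original 100 live points... *)
   "ParallelRuns" -> 4     (* ...because the 4 runs get re-combined into a pool of 100 *)
   ]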

Other than that, all the usual caveats of parallelization apply: you need to make sure that all of your function definitions get distributed to the parallel kernels correctly, for example. Otherwise the parallel kernels will just spit unevaluated code back out, which the master kernel then has to evaluate. This is the most frequent mistake people make with parallel functions in WL. Another one is over-reliance on shared variables. In particular, if you're using function caching, you shouldn't use SetSharedVariable on that function, since that will cause a lot of communication overhead.

So the first thing to try is to see if you can use your model efficiently with ParallelTable and work out all the kinks there. You can pass all the options of ParallelTable to parallelNestedSampling and they will be propagated automatically.
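A quick way to test that (with arbitrary parameter values somewhere inside your prior box) is to distribute the definitions and check that ParallelTable returns numbers rather than unevaluated ModelCoated calls:

DistributeDefinitions[Model, ModelCoated, Ts, tmin, tmax];
ParallelTable[
  ModelCoated[0.1, 10.^-4, 0.5, 10.^-2, 0.05, 10.^-4, 10.^-3, 10.^-3,
   10.^-4, 10.^-3, 1., 0.1, 10.^-3, t, tmin, tmax],
  {t, 100., 500., 100.}
  ]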

spasetto commented 3 years ago

Thanks for the interesting advice! I'll give the "double caching" approach a try, and now that your code works with small SamplePoolSize values, it should be easier to test parallelNestedSampling.

All the best!

Ste