grinsted / gwmcmc

An implementation of the Goodman & Weare MCMC sampler for MATLAB

Mathematical incorrectness of an algorithm #8

Closed nj-vs-vh closed 4 months ago

nj-vs-vh commented 3 years ago

Hi!

I used your code about a year ago and ran some tests to check whether it was right for my application -- and found disturbing evidence that it is incorrect. For example, using a 2D uncorrelated normal distribution as both the prior and the likelihood, with the likelihood shifted by around 1-2 sigma, I found that the sampled distribution has significantly heavier tails than the analytically computed posterior.

This stems directly from your cascaded evaluation strategy. Namely, when it is used in the intended way (logPfuns={logprior loglike}), you interfere with the acceptance algorithm by sometimes rejecting points based on the prior alone. What if that rejected point had a very high likelihood that would outweigh its low prior probability?

I believe you should at least change the documentation, if not remove this feature altogether. The feature can probably be used safely with uniform priors, which essentially set the log-posterior to -inf when a proposed point lies outside the prior bounds. But again, that kind of optimization can be done directly in the log-posterior function.
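A minimal sketch of the kind of test I mean (the 1.5-sigma shift and the 50-walker start ball are just illustrative choices on my part), comparing the cascaded call with a single combined log-posterior:

% prior N(0,I) and likelihood N(1.5,I) in 2D; the analytic posterior is N(0.75, 0.5*I)
logprior = @(m) -0.5*sum(m.^2);
loglike  = @(m) -0.5*sum((m-1.5).^2);

mball = randn(2,50)*0.1;   % walker start ball near the prior mode

m_casc = gwmcmc(mball,{logprior loglike},200000,'burnin',.3);            % cascaded evaluation
m_comb = gwmcmc(mball,{@(m) logprior(m)+loglike(m)},200000,'burnin',.3); % combined log-posterior

% both marginal standard deviations should match the analytic value sqrt(0.5)
fprintf('cascaded: %.3f  combined: %.3f  analytic: %.3f\n', ...
    std(m_casc(1,:)), std(m_comb(1,:)), sqrt(0.5))

If the cascade biases the accept/reject step, the first number should come out noticeably larger than sqrt(0.5), while the combined call should not.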

Finally, I attach my code, largely based on yours but with cascaded evaluation removed. It's not really ready to be PR'd (I wasn't using version control at the time of writing and just decided to create my own version on top of yours, renaming all the variables as I saw fit and making other adjustments not really related to the issue). But if you are interested in fixing this, I can contribute to your repo directly. It would be really nice to fix the most popular G&W-based MCMC sampler implementation for MATLAB.

nj-vs-vh commented 3 years ago

Seems like I cannot attach files here, so I created a temporary repo with the file.

https://github.com/nj-vs-vh/gwmcmc2

nj-vs-vh commented 3 years ago

At first glance it seems like this comment is referring to the same issue.

grinsted commented 3 years ago

Hi

I really don't have much time to work on this, so I would appreciate PRs with fixes. Of course it would be nice to keep the cascading if at all possible, as it can improve performance. (Priors plus cascading are also useful to ensure that you don't pass values to the likelihood function that would crash it due to division by zero or similar.)
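For instance, a guard prior along these lines (the likelihood here is a made-up stand-in that only makes sense for m > 0; with the cascade it is never called where the prior returns -Inf):

logprior = @(m) log(double(m(1) > 0));  % 0 inside the support, -Inf outside
loglike  = @(m) -0.5*log(m(1)).^2;      % log(m) is complex/-Inf for m <= 0, so it must only see m > 0
mball = bsxfun(@plus,1,randn(1,30)*0.1);
models = gwmcmc(mball,{logprior loglike},100000);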

Could you provide a minimal example demonstrating the issue, ideally 1D? I tried a simple 1D example mixing two Gaussians, but that seems to work fine:

lognormpdf = @(x,mu,sigma) -0.5*((x-mu)./sigma).^2 - log(sqrt(2*pi).*sigma);
p1 = @(m) lognormpdf(m,0,1);
p2 = @(m) lognormpdf(m,3,2);

% start all 30 walkers in a small ball around m0
m0 = 1;
ball = randn(length(m0),30)*0.1;
mball = bsxfun(@plus,m0,ball);

% cascaded call: {p1 p2} evaluated in sequence
m = gwmcmc(mball,{p1 p2},200000,'burnin',.3,'stepsize',2);

% kernel-density estimate of the sampled chain, scaled to a maximum of 1
[F,XI] = ksdensity(m(:,:));
F = F/max(F);

% analytic (unnormalized) posterior p1*p2 on the same grid, scaled to a maximum of 1
post = p1(XI) + p2(XI);
post = post - max(post);
post = exp(post);

close all
plot(XI,F,XI,post)
legend('gwmcmc','p1*p2')
ylabel('scaled pdf')

grinsted commented 3 years ago

I found that if I separate the two Gaussians by many standard deviations, then gwmcmc(mball,{p1 p2},...) has a very slow burn-in phase. Could that be the explanation for the heavier tails you find? In this case I find it more efficient to avoid the cascade by doing {@(m)p1(m)+p2(m)} instead. Hmm... I should probably do some testing on cascading and the effective degrees of freedom of the output.
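For reference, the comparison I have in mind would look something like this, reusing the definitions above with a (purely illustrative) wider separation:

p2far = @(m) lognormpdf(m,10,2);   % same p2, but centred far from p1
m_cascade  = gwmcmc(mball,{p1 p2far},200000,'burnin',.3,'stepsize',2);            % cascaded: slow burn-in
m_combined = gwmcmc(mball,{@(m) p1(m)+p2far(m)},200000,'burnin',.3,'stepsize',2); % single combined log-posterior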