Apparently, increasing the number of slices to 20 helps reduce the bias:
In [2]: rstate=np.random.RandomState(33);co=test_highdim.Config(rstate,32);X=test_highdim.do_gaussian(co,sample='rslice',bound='multi',nlive=500,rstate=rstate)
In [3]: X
Out[3]: (-232.03421701937455, 1.0847506259228943)
In [4]: co.logz
Out[4]: -243.22887870534663
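For scale, the residual offset in the session above can be computed directly. This small sketch assumes that the second element of `X` is the reported logZ uncertainty, which the session itself does not label:

```python
# Values copied from the IPython session above.
# Assumption: the second element of X is the reported logZ uncertainty.
logz_est, logz_err = -232.03421701937455, 1.0847506259228943
logz_true = -243.22887870534663  # co.logz, the analytic value

offset = logz_est - logz_true  # positive => logZ is overestimated
n_sigma = offset / logz_err    # offset in units of the reported error
print(f"offset = {offset:.2f} ({n_sigma:.1f} x the reported error)")
```

Even with 20 slices, the estimate is still well above the analytic value relative to its own error bar.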
So I guess the bias is caused by correlated samples. I'm a bit surprised, as I had thought slice sampling inside the ellipsoid should mix really well, but I guess that is no longer true in high-dimensional spaces.
I think I've mentioned this in a thread a while back, but this makes sense to me: the autocorrelation time for slice sampling scales roughly as O(d) (I think the PolyChord paper states this explicitly somewhere), and correlated samples lead to an overestimate of the prior volume compression and therefore an overestimate of logZ (i.e. the positive bias you've been seeing). In `slice`, because you iterate through all the dimensions each time, this is implicitly taken into account. But in `rslice`, as in `rwalk`, you generally should set the number of steps to scale at least somewhat with dimensionality.
There's the additional issue that more live points give better bounding distributions and therefore better proposals in the whitened space, but I think you've found that, at least for a very small number of `rslice` steps, this is a subdominant effect compared with increasing the number of steps.
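To illustrate the O(d) scaling, here is a minimal pure-Python sketch of random-direction slice sampling (stepping out, then shrinkage). It is similar in spirit to `rslice` but is not dynesty's implementation. The sampler runs inside a unit ball that stands in for the likelihood-constrained region in whitened coordinates; the ball and all function names are illustrative assumptions. Because the target is uniform inside the ball, each move lands uniformly on the chord through the current point. In that idealised case, the expected correlation of a coordinate after k moves is (1 - 1/d)^k: about 0.85 for d = 32 and k = 5, but only about 0.31 for k = 37.

```python
import math
import random


def inside(x):
    """Illustrative constraint: the unit ball, standing in for the
    likelihood-constrained region in whitened coordinates."""
    return sum(v * v for v in x) <= 1.0


def random_direction(d, rng):
    """Draw an isotropic unit vector by normalising a Gaussian vector."""
    u = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u]


def uniform_in_ball(d, rng):
    """Exact uniform draw from the unit d-ball (radius ~ U**(1/d))."""
    r = rng.random() ** (1.0 / d)
    return [r * v for v in random_direction(d, rng)]


def slice_move(x, rng, width=1.0):
    """One random-direction slice update: stepping out, then shrinkage."""
    u = random_direction(len(x), rng)

    def point(t):
        return [xi + t * ui for xi, ui in zip(x, u)]

    # Place a bracket of the given width at a random offset around t = 0,
    # then step each end outwards until it leaves the constrained region.
    left = -width * rng.random()
    right = left + width
    while inside(point(left)):
        left -= width
    while inside(point(right)):
        right += width
    # Shrinkage: propose uniformly in the bracket; on rejection, pull in
    # the bracket end on the proposal's side of t = 0.
    while True:
        t = rng.uniform(left, right)
        y = point(t)
        if inside(y):
            return y
        if t < 0.0:
            left = t
        else:
            right = t


def lag_correlation(d, nslice, nchains=1000, seed=0):
    """Correlation of the first coordinate before and after `nslice` moves,
    estimated over independent chains started from the stationary law."""
    rng = random.Random(seed)
    before, after = [], []
    for _ in range(nchains):
        x = uniform_in_ball(d, rng)
        before.append(x[0])
        for _ in range(nslice):
            x = slice_move(x, rng)
        after.append(x[0])
    mb = sum(before) / nchains
    ma = sum(after) / nchains
    cov = sum((b - mb) * (a - ma) for b, a in zip(before, after))
    var_b = sum((b - mb) ** 2 for b in before)
    var_a = sum((a - ma) ** 2 for a in after)
    return cov / math.sqrt(var_b * var_a)
```

For example, comparing `lag_correlation(32, 5)` with `lag_correlation(32, 37)` shows how much of the starting point survives when the number of slices does not grow with `ndim`.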
Tentatively closing this issue for now.
I think that calls for changing the defaults (as mentioned in another issue, in my opinion the defaults should ideally be as sensible as possible).
This is the test with nslice = 5 (old), nslice = 5 + ndim (new), nslice = 5 + 0.25*ndim (new0.25), and nslice = max(5, ndim) (newmax5).
I suggest changing the default number of slices to max(5, ndim).
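This is a small sketch that tabulates the four candidate rules from the test, alongside an idealised estimate of how correlated a coordinate remains after that many slices. The estimate assumes uniform sampling inside a ball, where each random-direction slice move shrinks the expected value of a coordinate by a factor (1 - 1/ndim). Real constrained regions will differ, so treat the numbers as a rough guide only. Rounding 0.25*ndim down is also an assumption.

```python
# Candidate rules for the default number of rslice slices, as listed above.
# Assumption: the 0.25 * ndim variant is rounded down to an integer.
CANDIDATES = {
    "old": lambda ndim: 5,
    "new": lambda ndim: 5 + ndim,
    "new0.25": lambda ndim: 5 + int(0.25 * ndim),
    "newmax5": lambda ndim: max(5, ndim),
}


def idealised_correlation(ndim, nslice):
    """Expected lag correlation of one coordinate after `nslice` random-
    direction slice moves, uniform inside a ball: (1 - 1/ndim)**nslice."""
    return (1.0 - 1.0 / ndim) ** nslice


for ndim in (2, 8, 32, 128):
    row = ", ".join(
        f"{name}={rule(ndim)} (corr~{idealised_correlation(ndim, rule(ndim)):.2f})"
        for name, rule in CANDIDATES.items()
    )
    print(f"ndim={ndim:>3}: {row}")
```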
This is fine with me. If you want to submit a PR, changing the default options to something like:
- `3 + ndim` for `rslice`
- `3` for `slice` (since this already includes the ndim scaling; you can also keep this at 5, or whatever appears to work best from tests)
- `20 + ndim` for `rwalk` and `rstagger`

should probably get most of the defaults in the right place.
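The proposed defaults can be summarised as a simple lookup. `proposed_default_steps` is a hypothetical name used only for illustration; dynesty's actual option handling may be structured differently.

```python
def proposed_default_steps(sample, ndim):
    """Hypothetical helper: default number of steps/slices per sampler,
    following the rules suggested in this thread (not dynesty's code)."""
    rules = {
        "rslice": 3 + ndim,
        "slice": 3,  # each slice already cycles through all ndim axes
        "rwalk": 20 + ndim,
        "rstagger": 20 + ndim,
    }
    try:
        return rules[sample]
    except KeyError:
        raise ValueError(f"no proposed default for sample={sample!r}") from None
```

For example, `proposed_default_steps("rslice", 32)` gives 35 slices, while `slice` stays at 3 regardless of dimension.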
OK, I'll do it. (I used 5 in my test because I thought that was the default, but it seems it is 3.)
No, you're right: it has been 5. But if we're adding `ndim`, I think for most problems we can reduce it to 3 without any real problems.
As seen in #285, there is a bias in logz at high dimensionality.
I've briefly checked, and the bias does not seem to depend on the number of live points (see below). This is kind of suspicious, as you'd expect any numerical/precision issues to shrink when you increase nlive. The bias is pretty substantial, at 50 dex. I'm wondering whether this is a manifestation of a bug somewhere rather than just inaccuracy.