Use the new autocorr() function in emcee?

Gabriel-p commented 6 years ago

I've been doing some tests, and the new autocorr() function in emcee returns times that are much shorter than the ones estimated by the autocorr_integrated_time() function currently in this package.

I have to use a window size a lot smaller (around 18-19) to get similar results to those given by the new autocorr(), which performs an automatic window search.

Here's an example of how I test both these estimates.

from .emcee3rc1 import autocorr
for i, result in enumerate(ptsampler.sample(p, iterations=nsteps, adapt=True)):

    # Autocorr for the cold chain
    tau_old = ptsampler.get_autocorr_time()[0]

    # Pass only the number of processed steps, and reshape the cold chain
    # to match the shape required by `integrated_time()`
    tau_new = autocorr.integrated_time(ptsampler.chain[0].transpose(1, 0, 2), tol=0)
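For reference, the difference between a fixed window and an automatic window search can be sketched with a small NumPy-only example. This is an illustration under stated assumptions, not the code of either package: all helper names are hypothetical, the test series is a synthetic AR(1) process whose true integrated time is (1 + phi) / (1 - phi), and the automatic rule is the common Sokal-style criterion (smallest window M with M >= c * tau(M), here with c = 5), which as far as I understand is the kind of rule the new emcee estimator applies.

```python
import numpy as np


def ar1(n, phi, seed=0):
    """Synthetic AR(1) series; its true integrated time is (1 + phi) / (1 - phi)."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.empty(n)
    x[0] = noise[0]
    for i in range(1, n):
        x[i] = phi * x[i - 1] + noise[i]
    return x


def autocorr_func_1d(x):
    """Normalized autocorrelation function of a 1-D series, computed via FFT."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Zero-pad to a power of two >= 2n to avoid circular wrap-around.
    nfft = 1 << (2 * n - 1).bit_length()
    f = np.fft.rfft(x - x.mean(), n=nfft)
    acf = np.fft.irfft(f * np.conjugate(f), n=nfft)[:n]
    return acf / acf[0]


def tau_fixed_window(acf, window):
    """tau = 1 + 2 * sum_{t=1}^{window} rho(t), with a user-chosen window."""
    return 1.0 + 2.0 * np.sum(acf[1:window + 1])


def tau_auto_window(acf, c=5.0):
    """Sokal-style automatic window: smallest M such that M >= c * tau(M)."""
    taus = 2.0 * np.cumsum(acf) - 1.0  # taus[M] = 1 + 2 * sum_{t=1}^{M} rho(t)
    m = np.arange(len(taus))
    ok = m >= c * taus
    # Fall back to the largest window if the criterion is never met.
    window = int(np.argmax(ok)) if ok.any() else len(taus) - 1
    return taus[window], window


x = ar1(200_000, phi=0.9)
acf = autocorr_func_1d(x)
tau_auto, window = tau_auto_window(acf)
print("automatic window:", window, "tau:", tau_auto)
print("fixed window of 5:", tau_fixed_window(acf, 5))
```

With a fixed window the estimate depends directly on the window chosen, so two estimators can only be compared meaningfully if their windows are comparable.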

cdcapano commented 6 years ago

@Gabriel-p I did some tests a while back with emcee_pt using emcee's new autocorr function, and I also got much shorter autocorrelation times (ACLs). However, the times I got back were too short, at least if you have a bimodal distribution. I judged that they were too short for two reasons. First, when using the new ACL, I found that many walkers had repeated values in their thinned samples. Second, even after running for over 50 ACLs (as reported by the new method), the sampler was not burned in. With the old method, ensuring that the second half of the chains was longer than 5 ACLs was sufficient.
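The two checks described above can be sketched roughly as follows. This is a minimal illustration, not code from emcee_pt or ptemcee: the function names are hypothetical, and the chain is assumed to be a NumPy array of shape (n_walkers, n_steps) for a single parameter at the cold temperature.

```python
import numpy as np


def enough_samples(n_steps, tau, n_acl=5):
    """True if the second half of the chain is longer than n_acl ACLs."""
    return n_steps / 2 > n_acl * tau


def thin_second_half(chain, tau):
    """Keep every ceil(tau)-th step from the second half of the chain."""
    n_steps = chain.shape[1]
    step = max(1, int(np.ceil(tau)))
    return chain[:, n_steps // 2::step]


def frac_repeated(thinned):
    """Fraction of consecutive thinned samples (per walker) that are identical.

    Needs at least two thinned samples per walker to be meaningful.
    """
    return float(np.mean(np.diff(thinned, axis=1) == 0))


# Toy chain: one walker that only moves every 4th step.
chain = np.repeat(np.arange(10.0), 4)[None, :]
print(frac_repeated(thin_second_half(chain, tau=2)))  # thinning too aggressively short
print(frac_repeated(thin_second_half(chain, tau=4)))  # thinning matches the step size
```

A large fraction of repeated values after thinning suggests that the ACL used for thinning is shorter than the real correlation length of the walkers.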

These tests were done with emcee_pt, not ptemcee, but I'm not sure the changes that have been introduced since would make a difference. I'm not sure why the new autocorrelation method didn't work, but I suspect it has something to do with the temperature swapping.

Anyway, the developers should comment, but my experience is no, don't switch to the new autocorrelation method.

Gabriel-p commented 5 years ago

Thanks for your comment @cdcapano. I've actually been doing some tests and I've found that the behaviour of the autocorrelation time is very strange with both methods. While emcee's method returns smaller values (not always; see the 1000 runs plot below), both show the same behaviour no matter how many runs I use.

Below is the output for a single parameter model, for 100, 500, and 1000 runs. The top plot is the autocorr time obtained with emcee and ptemcee (obtained as shown in the first comment of the thread) and below is the trace for the 12 chains I used. In both cases I only plot results for the cold (zero index) temperature.

See what the autocorr times do? They show almost exactly the same shape no matter how many runs I use: the time grows, then stabilizes, then drops near the end, and finally spikes upwards.

This is very strange and I can't seem to find anything wrong with my code.

100 runs

tau_pt_vs_em_100

500 runs

tau_pt_vs_em_500

1000 runs

tau_pt_vs_em_1000
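One way to judge curves like these is to evaluate the estimator on growing prefixes of a finished chain and flag the points where the chain is still short compared to the estimated time. The sketch below assumes a hypothetical `estimator` callable that takes a 1-D series and returns a single tau; the threshold of 50 times tau mirrors the default `tol=50` of emcee's `integrated_time()`, a check that `tol=0` switches off.

```python
import numpy as np


def tau_vs_length(x, estimator, n_points=10, tol=50):
    """Evaluate `estimator` on growing prefixes of the 1-D series `x`.

    Returns the prefix lengths, the tau estimate for each prefix, and a
    boolean mask that is True where the prefix is longer than tol * tau.
    """
    x = np.asarray(x)
    start = min(100, len(x))
    # Logarithmically spaced prefix lengths, duplicates removed.
    lengths = np.unique(np.geomspace(start, len(x), n_points).astype(int))
    taus = np.array([estimator(x[:n]) for n in lengths])
    reliable = lengths > tol * taus
    return lengths, taus, reliable


# Stand-in estimator that always reports tau = 10, just to show the mask.
lengths, taus, reliable = tau_vs_length(np.zeros(2000), lambda series: 10.0)
for n, tau, ok in zip(lengths, taus, reliable):
    print(n, tau, "ok" if ok else "chain too short")
```

Estimates in the flagged region are not expected to be stable, so only the unflagged part of the curve is informative when comparing the two methods.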

willvousden / ptemcee