Diagnostics module design

MichaelClerx commented 3 years ago

We have a private module pints._diagnostics that declares public and private methods, some of the public methods are exposed to the user as part of the public API.

However, the test for diagnostics imports the hidden module directly and tests methods not visible to the user.

Maybe this is fine, but usually it's a sign that we need to redesign a bit?

I'd propose making the diagnostics module public, so that pints.effective_sample_size becomes e.g. pints.diagnostics.effective_sample_size. Then we can make other methods, e.g. autocorrelation public too, without cluttering the pints namespace.

We could also imagine having them under pints.mcmc, but at the moment that only contains MCMCSamplers (even the MCMCController is in pints)

Thoughts @ben18785 @DavAug ?

ben18785 commented 3 years ago

Sounds like a plan. I'd keep the diagnostics outside of pints.mcmc as can imagine having diagnostics for other methods later on (and it's annoying-er to do import pints.mcmc.diagnostics).

On Tue, Feb 9, 2021 at 2:55 PM Michael Clerx notifications@github.com wrote:

We have a private module pints._diagnostics that declares public and private methods, some of the public methods are exposed to the user as part of the public API.

However, the test for diagnostics imports the hidden module directly and tests methods not visible to the user.

Maybe this is fine, but usually it's a sign that we need to redesign a bit?

I'd propose making the diagnostics module public, so that pints.effective_sample_size becomes e.g. pints.diagnostics.effective_sample_size. Then we can make other methods, e.g. autocorrelation public too, without cluttering the pints namespace.

We could also imagine having them under pints.mcmc, but at the moment that only contains MCMCSamplers (even the MCMCController is in pints)

Thoughts @ben18785 https://github.com/ben18785 @DavAug https://github.com/DavAug ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKABDCDRRHHPL2JU2JDS6FEGDANCNFSM4XLERGWA .

MichaelClerx commented 3 years ago

@DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

DavAug commented 3 years ago

Yep no problem. But before I do this @ben18785 are we still planning on outsourcing all of this to arviz? Because then it may not be necessary to implement a diagnostics module?

Get Outlook for iOShttps://aka.ms/o0ukef

From: Michael Clerx notifications@github.com Sent: Tuesday, February 9, 2021 4:30:38 PM To: pints-team/pints pints@noreply.github.com Cc: David Augustin david.augustin@cs.ox.ac.uk; Mention mention@noreply.github.com Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

@DavAughttps://github.com/DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pints-team/pints/issues/1283#issuecomment-776024919, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEY2T3XVRGUPKKROPSG6SLTS6FIJ5ANCNFSM4XLERGWA.

ben18785 commented 3 years ago

So, I think we still want our existing diagnostics methods but that any new ones come from Arviz...

On Tue, Feb 9, 2021 at 3:47 PM David Augustin notifications@github.com wrote:

Yep no problem. But before I do this @ben18785 are we still planning on outsourcing all of this to arviz? Because then it may not be necessary to implement a diagnostics module?

Get Outlook for iOShttps://aka.ms/o0ukef

From: Michael Clerx notifications@github.com Sent: Tuesday, February 9, 2021 4:30:38 PM To: pints-team/pints pints@noreply.github.com Cc: David Augustin david.augustin@cs.ox.ac.uk; Mention < mention@noreply.github.com> Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

@DavAughttps://github.com/DavAug are you happy to pick this up? You seem to be into diagnostics at the moment 😄

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/pints-team/pints/issues/1283#issuecomment-776024919>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AEY2T3XVRGUPKKROPSG6SLTS6FIJ5ANCNFSM4XLERGWA>.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-776037278, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKHHT5HJ726RJFQFRWDS6FKJNANCNFSM4XLERGWA .

DavAug commented 3 years ago

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation computation. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 @MichaelClerx @chonlei @rcw5890 ?

DavAug commented 3 years ago

@MichaelClerx , I have started to move diagnostics functions into a separate diagnostics module. How does the deprecation work for those functions? Or is it ok to just delete them from the pints namespace?

MichaelClerx commented 3 years ago

@DavAug I usually do something like this:

import warnings

def method_that_I_want_to_remove_now(self):
    # Deprecated on 2021-02-14
    warnings.warn('pints.method_that_I_want_to_remove_now is deprecated and will be removed in future versions of Pints')
    <original code here>

def method_with_new_name(self):
    <method code here>

def method_with_old_name(self, x):
    # Deprecated on ...
    warnings.warn('pints.method_with_old name is deprecated and will be removed in future versions of Pints, please use method_with_new_name() instead.')
    return method_with_new_name(x)

ben18785 commented 3 years ago

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin notifications@github.com wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778762964, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA .

DavAug commented 3 years ago

I agree hundred percent that arviz does everything we want (except maybe the multivariate rhat). I thought however that the current metrics were supposed to be moved? If that’s the case we might want to fix the bug in ESS because the current truncation seems to be wrong / arbitrary.

Get Outlook for iOShttps://aka.ms/o0ukef

From: ben18785 notifications@github.com Sent: Sunday, February 14, 2021 6:36:58 PM To: pints-team/pints pints@noreply.github.com Cc: David Augustin david.augustin@cs.ox.ac.uk; Mention mention@noreply.github.com Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin notifications@github.com wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778762964, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA .

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/pints-team/pints/issues/1283#issuecomment-778811288, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AEY2T3UVNWICCI2T5XLJHJLS7AC3VANCNFSM4XLERGWA.

ben18785 commented 3 years ago

Hi David: it's not actually a bug, just the old definition of how this was handled for ESS. So, think it's ok to leave as is rather than update to a newer method for calculating it.

On Sun, Feb 14, 2021 at 5:41 PM David Augustin notifications@github.com wrote:

I agree hundred percent that arviz does everything we want (except maybe the multivariate rhat). I thought however that the current metrics were supposed to be moved? If that’s the case we might want to fix the bug in ESS because the current truncation seems to be wrong / arbitrary.

Get Outlook for iOShttps://aka.ms/o0ukef

From: ben18785 notifications@github.com Sent: Sunday, February 14, 2021 6:36:58 PM To: pints-team/pints pints@noreply.github.com Cc: David Augustin david.augustin@cs.ox.ac.uk; Mention < mention@noreply.github.com> Subject: Re: [pints-team/pints] Diagnostics module design (#1283)

Hi David, happy for you to make those changes if you like. Perhaps it'd just be better though to move wholesale to Arviz? There are a host of changes to ESS that are possible (your proposed is one of them), so rather than reinvent the wheel, I think just moving to Arviz makes sense.

On Sun, Feb 14, 2021 at 11:14 AM David Augustin notifications@github.com wrote:

I am modifying the docstrings a little bit to let the users know in more detail which assumptions are made for the ESS and autocorrelation assumptions. Following the stan discussion we are currently actually using a somewhat less motivated truncation criterion for the ESS computation, see https://discourse.mc-stan.org/t/n-eff-bda3-vs-stan/2608/20.

In brief, currently we only keep the correlations up to the first negative correlation. Apparently there is no theoretic justification for this, instead one should consider pairs of correlations, i.e. (sum of correlation at lag 0 and 1, at lag 2 and 3, at lag 4 and 5 etc.). The first negative pair should be used as truncation criterion.

I would like to make changes according to stan and just wanted to check with you whether there was a theoretical justification for the current implementation, @ben18785 https://github.com/ben18785 @MichaelClerx https://github.com/MichaelClerx @chonlei https://github.com/chonlei @rcw5890 https://github.com/rcw5890 ?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <https://github.com/pints-team/pints/issues/1283#issuecomment-778762964 , or unsubscribe < https://github.com/notifications/unsubscribe-auth/ABCILKGNAOUKZNFYM3SMHS3S66WBRANCNFSM4XLERGWA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub< https://github.com/pints-team/pints/issues/1283#issuecomment-778811288>, or unsubscribe< https://github.com/notifications/unsubscribe-auth/AEY2T3UVNWICCI2T5XLJHJLS7AC3VANCNFSM4XLERGWA

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/pints-team/pints/issues/1283#issuecomment-778812055, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABCILKAUIAHZ4GOQAXOD7ZDS7ADN7ANCNFSM4XLERGWA .

DavAug commented 3 years ago

Hi David: it's not actually a bug, just the old definition of how this was handled for ESS. So, think it's ok to leave as is rather than update to a newer method for calculating it.

Ah ok. Would you mind if I changed it already?

According to the discussion in the Stan forum above, it seems that Stan's "old" version was not very well justified. They argue specifically for the NUTS sampler.

"Exact autocorrelations can happen only on odd lags (Geyer, 2011). By summing over pairs, the paired autocorrelation is guaranteed to be positive modulo estimator noise. This is the motivation behind the many termination criterion of Geyer (2011). Stan does not (yet) do the paired expectations because NUTS almost by construction avoids the negative autocorrelation regime. Thus terminating at the first negative autocorre- lation is a reasonable approximation for stopping when the noise in the autocorrela- tion estimator dominates."

According to Vehtari that doesn't seem to have a sound theoretical justification, but Geyer's original truncation criterion does.

"I’m trying to compare neff computation described in BDA3 and the implemented code, and I think there is a small difference. The comment in the beginning says the implementation matches BDA3. BDA3 (at least version 20150505) describes truncation of the autocorrelation series using Geyer’s initial positive sequence (Geyer, 1992) where truncation “T is the first odd positive integer for which $\widehat{\rho}{T+1}+\widehat{\rho}{T+2}$ is negative.”. If I understand the code (stan/src/stan/mcmc/chains.hpp) it’s using “T is the last positive integer for which \widehat{\rho}{T} is positive.”, that is, it’s not using the sum of odd and even lags for which Geyer had a theoretical argument."

So I would argue to adapt to Stan (also given that I've implemented it already) :D

DavAug commented 3 years ago

The soon to be deprecated pints.effective_sample_size will still be computing the effective sample size as we had it before, so it wouldn't change anything for other scripts that have used this estimate.

ben18785 commented 3 years ago

Yep, fine if you’ve already done it — thanks!

On 14 Feb 2021, at 18:46, David Augustin notifications@github.com wrote:

The soon to be deprecated pints.effective_sample_size will still be computing the effective sample size as we had it before, so it wouldn't change anything for other scripts that have used this estimate.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

DavAug commented 3 years ago

I have created a diagnostics module with the following functions

[x] pints.diagnostics.autocorrelation
[x] pints.diagnostics.effective_sample_size
[x] pints.diagnostics.multivariate_rhat
[x] pints.diagnostics.rhat

All functions can handle multiple chains with multiple parameters.

[x] The effective sample size now uses the Geyer criterion as discussed above.
[x] The effective sample size can now also deal with multiple chains and approximate ESS according to Vehtari et al (goes beyond "simple" addition of ESS from chains)

DavAug commented 3 years ago

Open questions:

Do we want to move the MCMCSummary as well into the diagnostics? I.e. deprecate it in the pints namespace and make it available in pints.diagnostics
Do we want to change the MCMCSummary's ESS estimation to the now implemented ESS estimation across multiple chains following Vehtari et al (both is possible with the new ESS implementation)
Does it make sense to also move plots that are specifically for diagnosting into the diagnostics namespace?

ben18785 commented 3 years ago

Thanks @DavAug

In response to your questions:

I don't mind. It probably makes most sense for it to go in the diagnostics module.
Yes, if it's already implemented then let's switch.
I'd keep plots where they are for now because I can see arguments either way (and it's less work to keep them where they are).

MichaelClerx commented 2 years ago

@DavAug this one looks quite finished to me?

DavAug commented 2 years ago

@DavAug this one looks quite finished to me?

Oh boy, I haven't looked at this for a long time. If I remember correctly, I didn't quite finish the testing, because it wasn't quite obvious to me at the time how to do it in a principled way. The diagnostics module itself should be ready though.

MichaelClerx commented 2 years ago

Probably best to skip the principled bit, and just compare the output to a hardcoded result?

pints-team / pints

Diagnostics module design #1283