walkeranthonyp / MAAT

MAAT
GNU General Public License v3.0
12 stars 17 forks source link

MCMC outlier handling #25

Closed abbeyjohnson1 closed 3 years ago

abbeyjohnson1 commented 4 years ago

In a MCMC simulation, when an outlier chain is detected, it is replaced with a randomly chosen chain that isn't an outlier. (And the sample history of the outlier chain is also replaced with the sample history of the chosen chain in .$dataf$pars_array.) When this happens, the statistical balance of the MCMC sampling is perturbed... So after you replace the outlier, you're supposed to "reapply burn-in," which just means that you're supposed to throw out all of the samples before the iteration where the outlier was detected. (You would also need to throw out all the other corresponding stored data -- likelihood, model evals, convergence diagnostic, etc.)

Currently, this is being done by hand in post-processing, but it would be helpful if this could be done automatically to avoid what I call "false convergence." The convergence diagnostic code is called every few iterations and works by taking the last 50% of samples from .$dataf$pars_array to do its computation. So "false convergence" sometimes happens when the convergence diagnostic calculation uses a statistically unbalanced sample history. This is corrected in post-processing by throwing out all history before the outlier replacement and then re-calculating the diagnostic to confirm true convergence, which may or may not have occured.

What is currently happening in the outlier handling code in wrapper_functions_mcmc.R:

mcmc_outlier_iqr <- function(., j) {

    ... skipping a bunch of code ...

    # replace outlier chain(s)
    .$dataf$pars_array[1:.$mcmc$d, outliers, 1:j] <- .$dataf$pars_array[1:.$mcmc$d, replace_idx, 1:j]

    # replace likelihood history for next iqr calculation
    .$dataf$pars_lklihood[outliers, 1:j] <- .$dataf$pars_lklihood[replace_idx, 1:j]
  }

}

Pseudocode of what would ideally be happening:

mcmc_outlier_iqr <- function(., j) {

    ... skipping a bunch of code that does not need to change ...

    # replace outlier chain(s)
    # change 1:j to j
    .$dataf$pars_array[1:.$mcmc$d, outliers, j] <- .$dataf$pars_array[1:.$mcmc$d, replace_idx, j]

    # somehow throw out all previous sample history in pars_array

    # replace likelihood history for next iqr calculation
    # change 1:j to j
    .$dataf$pars_lklihood[outliers, j] <- .$dataf$pars_lklihood[replace_idx, j]

    # somehow throw out all previous likelihood storage

    # somehow throw out all corresponding stored information in .$dataf$obs, .$dataf$out_mcmc, .$dataf$prop_storage, and .$dataf$conv_check
  }
walkeranthonyp commented 3 years ago

Thanks, all handled now.