joshspeagle / dynesty

Dynamic Nested Sampling package for computing Bayesian posteriors and evidences
https://dynesty.readthedocs.io/
MIT License

Posterior Predictive Check #104

Closed viraj96 closed 6 years ago

viraj96 commented 6 years ago

Once the posterior parameters have been estimated, is there a way to sample from the posterior distribution in order to compare predictions against the actual data, much like the posterior_predictive_check function in pyMC3? To elaborate: I have a sampler that takes a loglikelihood function and a prior_transform function and estimates the parameters. I can then obtain the posterior samples for each parameter using a weighted resampling technique, and I use a Gaussian KDE to build distributions from those samples. Given all this information, how do I use it to generate a prediction? Do I have to define a new sampler with the same likelihood and prior transform functions?

Also, while estimating those parameters the sampler raises a runtime warning: RuntimeWarning: invalid value encountered in sqrt ('logzerr', np.sqrt(np.array(self.saved_logzvar))). This generally happens when the estimated logz variance goes negative, making logzerr nan. Is there a workaround for this? Reducing the tolerance from 0.1 to 0.005 removed that warning, but now the sampler warns at each iteration: Volume is sparsely sampled. MLE covariance. In both cases the logzerr is nan.

joshspeagle commented 6 years ago

Any posterior predictive check requires a data-generating process that turns parameter values into observations. This is not implemented natively in dynesty, but can easily be done afterwards by, e.g., drawing samples proportional to their weights and generating observations for them. I don't see why you need to use a KDE.
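For example, something along these lines (a minimal sketch, assuming results is a completed sampler's results object and generate_observation is a hypothetical user-supplied data-generating function):

import numpy as np
from dynesty import utils as dyfunc

# Convert the ln-importance weights into normalized sample weights.
weights = np.exp(results['logwt'] - results['logz'][-1])

# Resample to a set of equally weighted posterior draws.
samples = dyfunc.resample_equal(results['samples'], weights)

# Push each posterior draw through the data-generating process
# (hypothetical user function) to build the posterior predictive set.
predictions = [generate_observation(theta) for theta in samples]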

As for logzerr being nan: I haven't added this issue to the docs yet, but it is nothing to be concerned about. The approximation used during runtime sometimes gives variance estimates that are negative; proper errors can instead be simulated using the simulate_run function in the provided utilities. See the demos on errors in the github repo for an example.
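A minimal sketch of that approach (assuming results is the sampler's results object):

import numpy as np
from dynesty import utils as dyfunc

# Simulate many realizations of the run and read off the scatter in logz.
logzs = [dyfunc.simulate_run(results)['logz'][-1] for _ in range(100)]
logz_err = np.std(logzs)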

The final issue is a bug in the version up on pip; if you pull to the most recent github version it should go away. I want to upload the changes to pip soon, but I won’t be able to get things squared away for a bit due to some other projects.

viraj96 commented 6 years ago

I will update to the most recent version of this library from github. Thanks for pointing that out.

The reason for using the KDE is that I am doing an incremental update of the priors. Here is my likelihood function:

import math

def loglikelihood(theta):
    global dwell_time, on_data, off_data
    beta_0, beta_1_ON, beta_1_OFF, beta_2_ON, beta_2_OFF = theta
    # Location and scale of a logistic density over log dwell times.
    alpha = beta_1_ON*on_data + beta_1_OFF*off_data + beta_0
    beta = beta_2_ON*on_data + beta_2_OFF*off_data
    mu = math.log(alpha)
    s = 1.0 / beta
    exponent = math.exp(-1.0 * (dwell_time - mu) / s)
    model = exponent / (s * (1 + exponent)**2)
    # Return the log of the density, as the sampler expects a log-likelihood.
    return math.log(model)

The initial prior_transform function that I pass to my sampler is this:

from scipy.special import ndtri

def prior_transform(theta):
    beta_0, beta_1_ON, beta_1_OFF, beta_2_ON, beta_2_OFF = theta
    # Map each unit-cube coordinate through the Gaussian inverse CDF
    # (ndtri) to get independent Gaussian priors with the given mean/std.
    beta_0 = 1 + 0.5*ndtri(beta_0)
    beta_1_ON = 3 + ndtri(beta_1_ON)
    beta_1_OFF = 3 + ndtri(beta_1_OFF)
    beta_2_ON = 1 + 0.5*ndtri(beta_2_ON)
    beta_2_OFF = 1 + 0.5*ndtri(beta_2_OFF)
    return (beta_0, beta_1_ON, beta_1_OFF, beta_2_ON, beta_2_OFF)
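For reference, a minimal sketch of how the sampler is then set up and resampled (assuming the standard dynesty API; the nlive and dlogz values are the ones quoted later in this thread):

import dynesty
import numpy as np
from dynesty import utils as dyfunc

# Five free parameters; nlive and dlogz as quoted below.
sampler = dynesty.NestedSampler(loglikelihood, prior_transform, 5, nlive=1024)
sampler.run_nested(dlogz=0.1)
results = sampler.results

# Equally weighted posterior draws for the next stage.
weights = np.exp(results['logwt'] - results['logz'][-1])
posterior_samples = dyfunc.resample_equal(results['samples'], weights)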

Now I traverse my data and set the global variables required in the likelihood function. I run the nested sampling (as in the sketch above) and then obtain the posterior_samples using resample_equal. I then define another prior_transform function based on these posterior_samples, as follows:

import numpy as np
from scipy.stats import gaussian_kde

def prior_transform_update(theta):
    # NOTE: this ignores theta entirely. For each parameter it rebuilds a
    # KDE from the previous run's posterior samples and returns the KDE
    # density evaluated at the sample median (the commented-out line is an
    # interpolation-based variant of the same idea).
    global posterior_samples
    params = []
    for i in range(posterior_samples.shape[1]):
        samples = posterior_samples[:, i]
        samples = samples[samples > 0]
        smin, smax = np.min(samples), np.max(samples)
        grid_X = np.linspace(smin, smax, 100)
        grid_Y = gaussian_kde(samples)(grid_X)
        y0 = np.min(grid_Y) / 100.0  # floor used by the commented-out variant
        # param = np.interp(np.median(samples), grid_X, grid_Y, left=y0, right=y0)
        param = gaussian_kde(samples).evaluate(np.median(samples))[0]
        params.append(param)
    return tuple(params)

I define another sampler based on this new prior_transform function. This runs the first time around, but the next time gaussian_kde fails with a singular matrix error. I chose this approach after seeing the from_posterior function written for the pyMC3 library, which is defined as follows:

import numpy as np
from scipy import stats
import theano.tensor as tt
from theano.compile.ops import as_op
from pymc3 import Continuous

def from_posterior(param, samples):

    class FromPosterior(Continuous):

        def __init__(self, *args, **kwargs):
            self.logp = logp
            super(FromPosterior, self).__init__(*args, **kwargs)

    smin, smax = np.min(samples), np.max(samples)
    x = np.linspace(smin, smax, 100)
    y = stats.gaussian_kde(samples)(x)
    y0 = np.min(y) / 10  # what was never sampled should have a small probability but not 0

    @as_op(itypes=[tt.dscalar], otypes=[tt.dscalar])
    def logp(value):
        # Interpolates the KDE density at the proposed value
        return np.array(np.log(np.interp(value, x, y, left=y0, right=y0)))

    return FromPosterior(param, testval=np.median(samples))

Do you believe there is some other neat way to do this incremental update of the priors? Also note that the second sampler (the one with the updated prior_transform definition) runs very slowly, taking about 2 minutes to complete with nlive = 1024, a dlogz tolerance of 0.1, and the MultiNest-style 'multi' bound with 'rwalk' sampling.

joshspeagle commented 6 years ago

I will update to the most recent version of this library from github. Thanks for pointing that out.

Sure. Let me know if you run into any additional bugs, since the most recent version hasn't been tested as extensively as some previous versions.

The reason for using the KDE is that I am doing an incremental update of the priors.

As I mentioned in #101, this problem is very ill-matched for nested sampling. If you're set on using nested sampling over a better-suited method like Sequential Monte Carlo, though, updating the distribution to a new likelihood based on importance reweighting is a far better option than trying anything to do with KDE if your likelihoods aren't changing too much.
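The idea, in a minimal sketch (assuming logl_old and logl_new are arrays holding the old and new log-likelihoods evaluated at the same samples):

import numpy as np
from scipy.special import logsumexp

# Scale each sample's ln-weight by the likelihood ratio, then renormalize.
logwt_new = results['logwt'] + (logl_new - logl_old)
weights = np.exp(logwt_new - logsumexp(logwt_new))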

note that the second sampler (the one with the updated prior_transform definition) runs very slowly

As far as I can tell, your version is so slow because KDE evaluation is generally slow. From what I can see above, I also notice that prior_transform_update (unnecessarily) recomputes the KDE every time it is called, and furthermore doesn't actually return values that depend on theta at all. Is that the intended behavior?
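For contrast, a theta-dependent KDE-based transform would invert each marginal's CDF; a sketch of that idea (not necessarily recommended here, given the reweighting suggestion above):

import numpy as np
from scipy.stats import gaussian_kde

# Precompute each marginal's empirical CDF on a grid once, outside the
# transform, assuming posterior_samples holds the previous run's draws.
grids, cdfs = [], []
for i in range(posterior_samples.shape[1]):
    s = posterior_samples[:, i]
    x = np.linspace(s.min(), s.max(), 1000)
    pdf = gaussian_kde(s)(x)
    cdf = np.cumsum(pdf)
    cdf /= cdf[-1]
    grids.append(x)
    cdfs.append(cdf)

def prior_transform_kde(theta):
    # Invert each 1-D CDF: map u in [0, 1] to the corresponding quantile.
    return tuple(np.interp(u, cdf, x)
                 for u, cdf, x in zip(theta, cdfs, grids))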

viraj96 commented 6 years ago

I also notice that prior_transform_update (unnecessarily) recomputes the KDE every time it is called

The first call to gaussian_kde was intended for the commented version, which was also giving the singular matrix error. So I tried a different method, but it did not change the result. Even if I remove the redundant call to gaussian_kde, my version still runs pretty slowly.

furthermore doesn't actually return values that depend on theta at all. Is that the intended behavior?

I did this because I believe the parameters don't depend on theta but on the posterior_samples generated by the previous run of the sampler in an incremental prior-updating setting.

updating the distribution to a new likelihood based on importance reweighting is a far better option than trying anything to do with KDE if your likelihoods aren't changing too much.

From my current understanding, do you mean that I should generate the posterior_samples after reweighting the results using reweight_run, something like this:

# Evaluate the (new) log-likelihood at the existing samples...
logl = np.array([loglikelihood(theta) for theta in results.samples])
# ...reweight the run accordingly...
results_reweighted = dynesty.utils.reweight_run(results, logp_new=logl)
# ...and draw equally weighted posterior samples from the reweighted run.
weights = np.exp(results_reweighted['logwt'] - results_reweighted['logz'][-1])
posterior_samples = dynesty.utils.resample_equal(results_reweighted.samples, weights)

If yes, then why would there be no need to use the KDE, since the samples are still discrete and I need to estimate the pdf of each parameter? Once that is done I can evaluate the pdf at any point, which I am currently taking to be the median of the samples. Also, the example that you linked exchanges the loglikelihood between two samplers, but I have only one loglikelihood function. So would it make sense to reweight the samples based on the same likelihood that was used to generate them?

Update: I tried another version of prior_transform without using the KDE, as follows:

def prior_transform_test(theta):
    # Map each unit-cube coordinate through a Gaussian whose mean and
    # standard deviation are estimated from the (positive) posterior
    # samples of the previous run.
    params = []
    for i, u in enumerate(theta):
        samples = posterior_samples[:, i]
        samples = samples[samples > 0]
        params.append(np.mean(samples) + np.std(samples)*ndtri(u))
    return tuple(params)

This seems to work well for the few iterations that I ran.

drawing samples proportional to their weights and generating observations for them

Can you elaborate more on this? How do I go about it? Is there a small demo that I can have a look at?

joshspeagle commented 6 years ago

If yes, then why would there be no need to use the KDE, since the samples are still discrete and I need to estimate the pdf of each parameter?

Any sampling-based approach (MCMC, nested sampling, etc.) approximates the posterior as a discrete set of samples from the underlying PDF. Any smooth behavior you see in, e.g., trace plots or corner plots is just a KDE applied for ease of visualization. There is never any fundamental need to do KDE.
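For instance, posterior summaries can be computed directly from the weighted samples (a sketch using dynesty's utilities, assuming results is the sampler's results object):

import numpy as np
from dynesty import utils as dyfunc

weights = np.exp(results['logwt'] - results['logz'][-1])

# Posterior mean and covariance straight from the weighted samples.
mean, cov = dyfunc.mean_and_cov(results['samples'], weights)

# Weighted quantiles for the first parameter -- again, no KDE required.
quantiles = dyfunc.quantile(results['samples'][:, 0],
                            [0.16, 0.5, 0.84], weights=weights)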

Also, the example that you linked exchanges the loglikelihood between two samplers, but I have only one loglikelihood function. So would it make sense to reweight the samples based on the same likelihood that was used to generate them?

Ok, so now I'm confused/worried. Why are you adjusting the priors at all? dynesty (like any other sampling method) estimates the posterior given your current set of priors; changing those based on the current run (i.e., feeding your posterior back in as a prior on the exact same data) without any principled argument (e.g., changing time-series data, swapping likelihoods, etc.) is just the Bayesian version of over-fitting.

drawing samples proportional to their weights and generating observations for them

Can you elaborate more on this?

This thread is a reasonable place to start.

I'm going to close this issue for now, since posterior predictive checks should be done by the user (dynesty only ever sees a black-box likelihood), but feel free to continue posting if you have further updates/questions.