rzellem / EXOTIC

EXOplanet Transit Interpretation Code

MCMC Runtime #18

Closed rzellem closed 4 years ago

rzellem commented 4 years ago

Currently the MCMC is hard-coded to run for 100,000 iterations and performs no "real-time" convergence tests. We could explore stopping the MCMC once it has achieved convergence according to the Gelman-Rubin statistic, or try one of the other built-in samplers.
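
For reference, a minimal sketch of the Gelman-Rubin check (my sketch, not code from EXOTIC; assumes several chains of equal length for a single parameter). Values near 1.0, e.g. below ~1.1, are usually taken as converged:

    import numpy as np

    def gelman_rubin(chains):
        # chains: (m, n) array holding m chains of n samples for one parameter
        n = chains.shape[1]
        W = chains.var(axis=1, ddof=1).mean()    # mean within-chain variance
        B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
        var_hat = (n - 1) / n * W + B / n        # pooled variance estimate
        return np.sqrt(var_hat / W)              # R-hat; ~1.0 at convergence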

rzellem commented 4 years ago

https://github.com/pymc-devs/pymc3/issues/3546

pearsonkyle commented 4 years ago

It's possible to add a list inside the minimization function to record parameters and chi² values. We can then build our own trace and perform our own burn-in afterward based on the chi² values instead of on a fixed number of samples.

    # assumed imports for this snippet (pymc3/theano-era API)
    import numpy as np
    import theano.tensor as tt
    import theano.compile.ops as tco

    # record every model evaluation so we can build our own trace later
    chi2 = []
    allpars = []

    @tco.as_op(itypes=[tt.dscalar, tt.dscalar, tt.dscalar, tt.dscalar, tt.dscalar],
               otypes=[tt.dvector])
    def transit2min(*pars):
        # tpars, time, flux, gw, nearest, transit, and weightedflux are
        # assumed to be defined in the enclosing scope
        rprs, tmid, ar, a1, a2 = pars
        tpars['rp'] = float(rprs)
        tpars['tm'] = float(tmid)
        tpars['ar'] = float(ar)
        stime = time - min(time)
        ramp = 1 - 10**a1 * np.exp(a2 * stime)     # exponential ramp model
        lcmode = transit(time=time, values=tpars)  # transit light curve
        detrended = flux / lcmode
        wf = weightedflux(detrended, gw, nearest)  # weighted systematics model
        model = lcmode * ramp * wf
        res = flux - model
        # record chi^2 and parameters for filtering later
        chi2.append(np.sum(res**2))
        allpars.append([float(rprs), float(tmid), float(ar), float(a1), float(a2)])
        return model
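
A hedged sketch of the post-hoc burn-in described above (the cutoff factor is a hypothetical choice; assumes `chi2` and `allpars` were populated while the sampler ran):

    chi2_arr = np.array(chi2)
    allpars_arr = np.array(allpars)

    # keep only samples whose chi^2 is close to the best fit
    mask = chi2_arr < chi2_arr.min() * 1.05   # hypothetical threshold
    trace = allpars_arr[mask]

    best = allpars_arr[np.argmin(chi2_arr)]   # best-fit parameters
    stds = trace.std(axis=0)                  # rough parameter uncertainties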

I'm working on some optimizations for Excalibur; I can transfer some of those solutions to EXOTIC afterward.

rzellem commented 4 years ago

👍


rzellem commented 4 years ago

@pearsonkyle please work with @mjs2369 - they want to learn more about MCMCs and work on this task. Maybe they could implement your code for you?

pearsonkyle commented 4 years ago

I prototyped a new solution last night using nested sampling in a pure-Python implementation. It runs in roughly 14 minutes for a 2000-data-point light curve, albeit on my fancy MacBook. I'm still playing with the sampler parameters, in particular the stopping criterion and sampling density; it should be possible to get that time down even more. A sample trace from their code looks like this:

[trace plot: parameter chains, with the true value marked by a red line]

It is nice because it returns your chi² values. The red line above is the truth.

Some work still needs to be done on the plots, but all the necessary information is there.

Example code: https://gist.github.com/pearsonkyle/b3c7dfdfad7cc985e964767f84c03f90

There's still lots to fix up before it's added into EXOTIC. I'm also working on this for Excalibur. It will be quite modular: you'll have to define your own likelihood function, bounds, and initial parameters, and the rest will be taken care of by a class I'm writing. I could use @mjs2369's help with converting some of the code in EXOTIC and with creating a custom likelihood function that includes airmass and potentially more light-curve parameters than just Rp/Rs and Tmid (see the airmass sketch after the example below). The code will have a rather simple setup, like so:

    init = { 
        'rprs':0.06,        # Rp/Rs
        'ars':14.07,        # a/Rs
        'per':3.336817,     # Period [day]
        'inc':88.75,        # Inclination [deg]
        'u1': 0.3, 'u2': 0.01, # limb darkening (linear, quadratic)
        'ecc':0,            # Eccentricity
        'omega':0,          # Arg of periastron
        'tmid':0.75         # time of mid transit [day]
    } 

    # only choose bounds for the free parameters
    mybounds = {
        'rprs':[0,1],
        'tmid':[min(t),max(t)],
        #'a0':[-np.inf,np.inf],
        #'a1':[-np.inf,np.inf]
    }
    freekeys = list(mybounds.keys())

    # log-likelihood; can handle an arbitrary number of free parameters
    def loglike(pars):
        for i in range(len(pars)):
            init[freekeys[i]] = pars[i]
        model = transit(t, init)
        return -0.5 * np.sum( ((data-model)/dataerr)**2 )

    # this is a class which contains all of the fitting algorithms
    myfit = lc_fitter(
        t, data,
        dataerr=dataerr,
        init=init,
        bounds=mybounds,
    )
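
As a starting point for the airmass work mentioned above, a hedged sketch of a likelihood with a multiplicative exponential airmass term (the `a1`/`a2` parameterization echoes the ramp model earlier in this thread and is only illustrative; `airmass` is assumed to be an array aligned with `t`, and the bounds are hypothetical):

    mybounds = {
        'rprs': [0, 1],
        'tmid': [min(t), max(t)],
        'a1': [0, 2],      # hypothetical airmass coefficient bounds
        'a2': [-1, 1],
    }
    freekeys = list(mybounds.keys())

    def loglike(pars):
        for i in range(len(pars)):
            init[freekeys[i]] = pars[i]
        airmass_trend = init['a1'] * np.exp(init['a2'] * airmass)
        model = transit(t, init) * airmass_trend
        return -0.5 * np.sum(((data - model) / dataerr)**2)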

The docs for the sampler are quite thorough: https://dynesty.readthedocs.io/en/latest/

There is also a paper with a more technical description of the algorithms and math behind Bayesian inference: https://github.com/joshspeagle/dynesty/blob/master/paper/dynesty.pdf

A nice starting example: https://github.com/joshspeagle/dynesty/blob/master/demos/Demo%202%20-%20Dynamic%20Nested%20Sampling.ipynb

pearsonkyle commented 4 years ago

@rzellem Results from testing nested sampling. It requires a single Python library: no pymc3, no theano, and it's Windows-compatible.

Noisy data is generated at a low SNR and 3 transit parameters are optimized using nested sampling. A 200-data-point light curve takes ~3 minutes to process and generates roughly 30k samples before reaching its convergence criterion (https://dynesty.readthedocs.io/en/latest/overview.html#). The samples are weighted by their chi-squared to find the best-fit solution and to estimate the uncertainty on each parameter.

[corner plot: posterior samples color-coded by likelihood]

Data points are color-coded by likelihood value. The contours define the 1- and 2-sigma levels, and the dotted lines represent 2-sigma uncertainties, which encompass ~95% of the possible solutions. When the data are particularly noisy and the transit parameters cannot be well constrained, the posteriors will diverge from being Gaussian.
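
A hedged sketch of how the weighted posterior samples might be pulled out of dynesty (names like `loglike`, `mybounds`, and `freekeys` follow the earlier example and are assumptions, not EXOTIC's actual code):

    import numpy as np
    from dynesty import NestedSampler
    from dynesty.utils import resample_equal

    # map the unit cube onto the bounded parameter space
    def prior_transform(u):
        lo = np.array([mybounds[k][0] for k in freekeys])
        hi = np.array([mybounds[k][1] for k in freekeys])
        return lo + u * (hi - lo)

    sampler = NestedSampler(loglike, prior_transform, len(freekeys))
    sampler.run_nested()  # stops at its own convergence criterion
    results = sampler.results

    # convert nested-sampling weights into equally weighted posterior samples
    weights = np.exp(results.logwt - results.logz[-1])
    posterior = resample_equal(results.samples, weights)
    means = posterior.mean(axis=0)
    stds = posterior.std(axis=0)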

Higher-SNR measurements usually reveal just a single mode in the posterior:

[corner plot: single-mode posterior]

This dataset had 1000 data points and took ~8 minutes.

rzellem commented 4 years ago

Very nice, Kyle! Can you run the sample HAT-P-32 b data? It would be nice to compare head-to-head. Also, does the nested sampler automatically stop when it converges, or is the runtime hard-coded like we have it for pymc3?


pearsonkyle commented 4 years ago

The sampler automatically stops when it converges, usually when the estimated remaining Bayesian evidence falls below a particular tolerance and there's no more information to be extracted from the parameter space. In all my tests it builds posteriors from ~30-50k samples. I'll test it on the HAT-P-32 data after I add in the airmass function.
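
For reference, that evidence tolerance can be set explicitly when running the sampler from the earlier sketch (the value here is illustrative, not what EXOTIC will necessarily use):

    # stop once the estimated remaining log-evidence contribution < dlogz
    sampler.run_nested(dlogz=0.5)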

tamimfatahi commented 4 years ago

Changed to the nested sampler in pull request #170.