vlas-sokolov / pyspecnest

Nested sampling of pyspeckit models with PyMultiNest.
MIT License
2 stars 3 forks source link

Calculate uncertainties #10

Open jpinedaf opened 4 years ago

jpinedaf commented 4 years ago

@vlas-sokolov I would like to calculate the uncertainties. The code to calculate the properties is here. Would it be sufficient to load the uncertainties in this https://github.com/vlas-sokolov/pyspecnest/blob/master/pyspecnest/chaincrunch.py#L421 line following the example from here https://github.com/JohannesBuchner/PyMultiNest/blob/master/pymultinest/analyse.py#L212?

    stat_result = []

    stats_file = open(a.stats_file)
    for l in stats_file.readlines():
        if trigger[stat] in l:
            sweet_spot = True
            continue
        if sweet_spot:
            if l.startswith('Dim No.'):
                continue
            line_arr = np.fromstring(l.strip(), sep=' ')
            if line_arr.size == 0:
                sweet_spot = False
            else:
                stat_result.append(line_arr)

    return np.squeeze(np.array(stat_result)[:, 1:].T),  np.squeeze(np.array(stat_result)[:, 2:].T)

and then modify this function as

def pars_xy(x, y, mode='fast', stat='mle', **kwargs):
    a = analyzer_xy(x=x, y=y, **kwargs)
    try:
        if mode == 'slow':
            pars = a.get_best_fit()['parameters']
            errs = pars * np.nan # what should be the proper here?
        elif mode == "fast":
            pars, errs = get_stats_fast(a, stat=stat)
        else:
            raise ValueError("mode should be one of the ['slow', 'fast']")
    except IOError:
        return np.nan
    return pars, errs

and then here https://github.com/vlas-sokolov/pyspecnest/blob/master/pyspecnest/chaincrunch.py#L450

        pars, errs = pars_xy(x=x, y=y, npars=npars, npeaks=npeaks, **kwargs)
        parcube[:, y, x] = pars
        print(parcube[:, y, x])
        errcube[:, y, x] = errs

If this makes sense then I can implement :-)... but I am not certain of what is in the file, but I am following the example from the PyMultiNest code.

vlas-sokolov commented 4 years ago

@jpinedaf thanks for taking a look into this! Just wanted to chine in about some tricky details we have to be cautious about here, because what you're trying to get is a one-number approximation of uncertainties (already given by the computed posterior).

PyMultiNest does interpolation of the posteriors and calculates the 1-sigma values that you take in the PR over the whole prior range. Now if you would then compare these to uncertainties from your run-of-the-mill pyspeckit uncertainties, I expect to see major differences, with pyspeckit error being smaller in cases I could think of below:

  1. Wide priors - everything related to line centroid would probably get sigma values way over the tightly constrained ones in pyspeckit
  2. Bimodalities and correlated parameters - both cases have extra features in the posteriors that will overestimate (in a pyspeckit sense) the uncertainties
vlas-sokolov commented 4 years ago

My preferred way of estimating uncertainties for spectral fitting was to get the MLE and then quote the asymmetrical confidence intervals around it (check Figure 6 in https://arxiv.org/abs/1009.2755). So then what your PR would achieve would be something like Fig. 6.1, while I would argue that 6.2 is more informative (and you won't end up putting errorbars outside the prior window that way).

But I guess anything is better than not having anything to show for uncertainties at all :)

jpinedaf commented 4 years ago

Thanks for the comments!

Indeed, 1-value for sigma is wrong and there are better ways (as the asymmetrical errors). My main concern is that since we are generating maps of every parameter, then I don't know how to show that properly. If we have a single spectrum analysis, then I would wholeheartedly agree with the asymmetrical uncertainty (as 6.2).

I think that the issues regarding to possible differences on how to calculate the uncertainty and pyspeckit should be discussed in the paper... what do you think?

vlas-sokolov commented 4 years ago

@jpinedaf absolutely, again, it makes total sense to add at least the simplistic approach because the current alternative we offer is either of:

  1. Inspect the posterior of any given spectrum yourself
  2. Inspect the probability cube generation, which is highly-experimental and I myself sometimes struggled to interpolate the probability cubes correctly.

Also, describing the complexity of uncertainties in the paper is a good preemptive way to pacify potential questions about it.

vlas-sokolov commented 4 years ago

I am going to keep this open for any possible discussion that might arise in the future.