spacetelescope / specviz

An interactive astronomical 1D spectra analysis tool.
http://specviz.readthedocs.io
BSD 3-Clause "New" or "Revised" License
43 stars 31 forks source link

Re-implement statistic functions in specutils. #425

Closed nmearl closed 6 years ago

nmearl commented 6 years ago
hcferguson commented 6 years ago

Comments from the old Trello Card https://trello.com/c/cPWIpQ2D/7-measure-equivalent-widths-centroids-fluxes

Quantities to be measured

Use Cases

Algorithms

The Splot manual has a long description:

FLUX AND EQUIVALENT WIDTH DETERMINATIONS There are currently five techniques in SPLOT to measure equivalent widths of lines. The simplest (conceptually) is by integration of the pixel intensities between two marked pixels. This is invoked with the 'e' keystroke. The user marks the two edges of the line at the continuum. The line center is a weighted centroid and the continuum is a linear function between the two points marked.

The most complex method is performed by the Gaussian fitting and deblending commands which compute a non-linear least-squares Gaussian fit to the line(s). These are invoked with the 'd' or 'k' keystroke. These were described in detail previously.

The fourth and fifth methods, selected with the 'h' key, determine the equivalent width from a Gaussian profile defined by a constant continuum level "cont", a core depth "core", and the width of the line "dw" at some intermediate level "Iw".

    I(w) = cont + core * exp (-0.5*((w-center)/sigma)**2)
    sigma = dw / 2 / sqrt (2 * ln (core/Iw))
    fwhm = 2.355 * sigma
    flux = core * sigma * sqrt (2*pi)
    eq. width = abs (flux) / cont

where w is wavelength.

For ease of use with a large number of lines only one cursor position is used to mark the center of the line and one flux level. Note that both the x any y cursor positions are read simultaneously. From the x cursor position the line center and core intensity are determined. The region around the specified line position is searched for a minimum or maximum and a parabola is fit to better define the extremum.

The two methods based on the simple Gaussian profile model differ in how they use the y cursor position and what part of the line is used. After typing 'h' one selects the method and whether to use the left, right, or both sides of the line by a second keystroke. The 'l', 'r', and 'k' keys require a continuum level of one. The y cursor position defines where the width of the line is determined. The 'a', 'b', and 'c' keys use the y cursor position to define the continuum and the line width is determined at the point half way between the line core and the continuum. In both cases the width at the appropriate level is determined by the interception of the y level with the data using linear interpolation between pixels. The one-sided measurements use the half-width on the appropriate side and the two-sided measurments use the full-width.

The adopted Gaussian line profile is drawn over the spectrum and the horizontal and vertical lines show the measured line width and the depth of the line center from the continuum. This model may also be subtracted from the spectrum using the '-' key.

The major advantages of these methods are that only a single cursor setting (both the x and y positions are used) is required and they are fast. The 'l', 'r', and 'k' keys give more flexibility in adjusting the width of the Gaussian line at the expense or requiring that the spectrum be normalized to a unit continuum. The 'a', 'b', and 'c' keys allow measurements at any continuum level at the expense of only using the half flux level to determine the Gaussian line width.

All these methods print and record in the log file the line center, continuum intensity at the line center, the flux, and the equivalent width. For the 'e' key the flux is directly integrated while for the other methods the fitted Gaussian is integrated. In addition, for the Gaussian profile methods the core intensity above or below the continuum, the sigma, and the FWHM (=2.355 sigma) are also printed. A brief line of data for each measurement is printed on the graphics status line. To get the full output and the output from previous measurments use the command ":show". This pages the output on the text output which may involve erasing the graphics.

The integrated fluxes for all the methods are in the same units as the intensities and the integration is done in the same units as the plotted scale. It is the user's responsiblity to keep track of the flux units. As a caution, if the data is in flux per unit frequency, say ergs/cm2/sec/hz, and the dispersion in Angstroms then the integrated flux will not be in the usual units but will be A-ergs/cm2/sec/hz. For flux in wavelength units, ergs/cm2/sec/A the integrated flux will be correct; i.e. ergs/cm2/sec.

ERROR ESTIMATES The deblending ('d'), Gaussian fitting ('k'), and profile integration and equivalent width ('e') functions provide error estimates for the measured parameters. This requires a model for the pixel sigmas. Currently this model is based on a Poisson statistics model of the data. The model parameters are a constant Gaussian sigma and an "inverse gain" as specified by the parameters SIGMA0 and INVGAIN. These parameters are used to compute the pixel value sigma from the following formula:

   sigma**2 = sigma0**2 + invgain * I

where I is the pixel value and "**2" means the square of the quantity.

If either the constant sigma or the inverse gain are specified as INDEF or with values less than zero then no noise model is applied and no error estimates are computed. Note that for processed spectra this noise model will not generally be the same as the detector readout noise and gain. These parameters would need to be estimated in some way using the statistics of the spectrum. The use of an inverse gain rather than a direct gain was choosed to allow a value of zero for this parameters. This provides a model with constant uncertainties.

The direct profile integration error estimates are computed by error propagation assuming independent pixel sigmas. Also it is assumed that the marked linear background has no errors. The error estimates are one sigma estimates. They are given in the log output (which may also be view without exiting the program using the :show command) below the value to which they apply and in parenthesis.

The deblending and Gaussian fit error estimates are computed by Monte-Carlo simulation. The model is fit to the data (using the sigmas) and this model is used to describe the noise-free spectrum. One hundred simulations are created in which random Gaussian noise is added to the noise-free spectrum based on the pixel sigmas from the noise model. The model fitting is done for each simulation and the absolute deviation of each fitted parameter to model parameter is recorded. The error estimate for the each parameter is then the absolute deviation containing 68.3% of the parameter estimates. This corresponds to one sigma if the distribution of parameter estimates is Gaussian though this method does not assume this.

The Monte-Carlo technique automatically includes all effects of parameter correlations and does not depend on any approximations. However the computation of the errors does take a significant amount of time.

eteq commented 6 years ago

Agreed that a lot of this is very important, particularly the two nick mentioned at the start need to be in specutils. but they should follow a flow that's reasonably consistent with how the line-fitting machinery works. See https://github.com/spacetelescope/dat_pyinthesky/blob/master/pyinthesky_specutils_fitting.ipynb for a bit of the context

eteq commented 6 years ago

Also, re the items from the trello card, here are my thoughts of critical-ness. Part of the point of this is that we don't want to overwhelm users with lots of things they won't use.

  • Flux: the integral of the pixel intensities within a given region

Critical

  • Center: an estimate of the wavelength position of the center of an emission or absorption feature. See splot description of algorithms.

Critical. however, this should not be just a one-off function, it might be better as a class heirarchy (perhaps with a helper function), because there are many cases where one wants to use centering in the middle of some other function. See the centering machinery in photutils to see how this might work.

  • Centroid: An estimate of the center using the first moment of the pixel intensities within the selected wavelength region

I think this is the same as "center", just a specific kind of "Center"?

  • Equivalent width: W = Sum( (Fc-Fl)/Fc * dw, where Fc is the continuum flux per pixel, Fl is the line flux per pixel, and dw is the wavelength interval per pixel.

Critical

  • FWHM: Full width at half maximum

Critical

  • Gaussian width (sigma): 1-sigma full width of an equivalent Gaussian (Can be determined from moments or fitting; see splot description of algorithms).

Useful

  • FWZI: Full width at zero intensity

Marginally useful

  • Mean: The mean intensity in the selected region, with optional sigma clipping

Useful, but maybe shouldn't get a special function. Instead should tell user "cut out the ROI and do np.mean")

  • Standard Deviation: The standard deviation of the pixel intensities in the region, with optional sigma clipping

Same as "mean"

stscicrawford commented 6 years ago

@eteq I just had a request to measure FWZI for some stellar spectra for a project, so what would be marginally useful to you is extremely useful to someone else.

stscicrawford commented 6 years ago

For mean and standard deviation, they may need sigma clipping should just use the astropy.stats.sigma_clip_stats functionality.

brechmos-stsci commented 6 years ago

I talked with @stscicrawford about these statistics/measurements and he was good with this list (and with the above noted comment from him).

Also, in discussion with @stscicrawford it was astropy.stats he mentioned to me several weeks ago (not measurements within nddata, my mistake), but given the list above is more spectra related these will not likely be directly available from astropy.stats (though we might be able to use some parts of astropy.stats).

There is also a note that we might want the splot implementation and another implementation if there are more common implementations.

@stscicrawford @eteq Many of these methods above could be applied to a region. Generally, for these measurement methods should the interface be measure(spectrum) assuming spectrum is the sub-region or would you prefer measure(spectrum, region=None) where one can specify the region and then the specutils measure method does the extraction and then measurement. I already have an extraction method as part of the fitting. Either is fine with me, your preference.

If we are good with this, then I'll create a ticket for each of these in specutils (and match to the project board if it is in there).

eteq commented 6 years ago

@eteq I just had a request to measure FWZI for some stellar spectra for a project, so what would be marginally useful to you is extremely useful to someone else.

Absolutely - I didn't mean my comment above to be the "right answer", just a first shot at prioritizing so we can pick an order to work on them.

re: @brechmos-stsci :

@stscicrawford @eteq Many of these methods above could be applied to a region. Generally, for these measurement methods should the interface be measure(spectrum) assuming spectrum is the sub-region or would you prefer measure(spectrum, region=None) where one can specify the region and then the specutils measure method does the extraction and then measurement. I already have an extraction method as part of the fitting. Either is fine with me, your preference.

Hmm... I'm not convinced that's the way to go - in the line-fitting stuff I was advocating for you cut out the spectrum and pass that. E.g. measure(region_cut(spectrum, ...)). I can see pros and cons of both approaches, though. My strongest opinion is that they both be the same (i.e., not fit_lines(region_cut(spectrum))andmeasure(spectrum, region=...)``).

Note my points above though about how I think some of these items might be best implemented as just a cut-out and then call existing numpy functions. Exactly which are in the category are probably best discussed on individual cases when we create separate specutils issues. That could work in either case though (e.g. np.std(region_cut(spectrum, ...)) or np.std(region.cut(spectrum)), as presumably the thing that gets passed in for the measure(spectrum, region=...) case could be a full-fledged object that can cut a spectrum if need be.

brechmos-stsci commented 6 years ago

I created specutil tickets for each of the items above. So, for specific implementation details let's discuss on those tickets:

S/N: signal to noise within a region of the spectrum https://github.com/astropy/specutils/issues/239

Flux: the integral of the pixel intensities within a given region https://github.com/astropy/specutils/issues/235

Center: an estimate of the wavelength position of the center of an emission or absorption feature. See splot description of algorithms. https://github.com/astropy/specutils/issues/240

Centroid: An estimate of the center using the first moment of the pixel intensities within the selected wavelength region https://github.com/astropy/specutils/issues/236

Equivalent width: W = Sum( (Fc-Fl)/Fc * dw, where Fc is the continuum flux per pixel, Fl is the line flux per pixel, and dw is the wavelength interval per pixel. https://github.com/astropy/specutils/issues/237

FWHM: Full width at half maximum https://github.com/astropy/specutils/issues/238

Gaussian width (sigma): 1-sigma full width of an equivalent Gaussian (Can be determined from moments or fitting; see splot description of algorithms). https://github.com/astropy/specutils/issues/241

FWZI: Full width at zero intensity https://github.com/astropy/specutils/issues/242

Mean: The mean intensity in the selected region, with optional sigma clipping https://github.com/astropy/specutils/issues/243

Standard Deviation: The standard deviation of the pixel intensities in the region, with optional sigma clipping (to use astropy sigma clipping) https://github.com/astropy/specutils/issues/244

I'll begin working on / looking at those tickets next.

eteq commented 6 years ago

Thanks for setting those up @brechmos-stsci - will move specific discussion there, but note that I created astropy/specutils#234 for discussion of things that apply to all of these. I suggest we move the ROI/region discussion there?

This issue, then, can be left as a place to discuss the specviz-specific bits that need updating after these parts are in specutils

javerbukh commented 6 years ago

Continuing discussion in new issue.