lightkurve / lightkurve

A friendly package for Kepler & TESS time series analysis in Python.
https://docs.lightkurve.org
MIT License

CBV corrector not working for 10 min TessCut data #1005

Open rebekah9969 opened 3 years ago

rebekah9969 commented 3 years ago

I have been trying to apply CBVCorrector to a TESSCut FFI cutout, but I am running into some issues.

import lightkurve as lk
search_lc = lk.search_tesscut("KT Eri")
S32 = search_lc[1].download(cutout_size=10)
lc_S32 = S32.to_lightcurve(aperture_mask='threshold')

from lightkurve.correctors import CBVCorrector
cbvCorrector = CBVCorrector(lc_S32)
cbvCorrector.cbvs

This returns

[TESS CBVs, Sector.Camera.CCD : 32.2.4, CBVType : SingleScale, nCBVS : 16,
 TESS CBVs, Sector.Camera.CCD : 32.2.4, CBVType.Band: MultiScale.1, nCBVs : 8,
 TESS CBVs, Sector.Camera.CCD : 32.2.4, CBVType.Band: MultiScale.2, nCBVs : 8,
 TESS CBVs, Sector.Camera.CCD : 32.2.4, CBVType.Band: MultiScale.3, nCBVs : 5,
 TESS CBVs, Sector.Camera.CCD : 32.2.4, CBVType : Spike, nCBVS : 7]

I then try to plot these,

cbvCorrector.cbvs[0].plot();

This produces the following plot, which is empty.

[Screenshot: empty CBV plot]

I then try

import numpy as np
cbv_type = ['SingleScale', 'Spike']
cbv_indices = [np.arange(1,9), 'ALL']
cbvCorrector.correct_gaussian_prior(cbv_type=cbv_type, cbv_indices=cbv_indices, alpha=1e-4)
cbvCorrector.diagnose();

which outputs:

/usr/local/lib/python3.7/dist-packages/lightkurve/correctors/designmatrix.py:314: LightkurveWarning: The design matrix has low rank (0) compared to the number of columns (8), which suggests that the matrix contains duplicate or correlated columns. This may prevent the regression from succeeding. Consider reducing the dimensionality by calling the `pca()` method.
  LightkurveWarning,
/usr/local/lib/python3.7/dist-packages/lightkurve/correctors/designmatrix.py:314: LightkurveWarning: The design matrix has low rank (0) compared to the number of columns (7), which suggests that the matrix contains duplicate or correlated columns. This may prevent the regression from succeeding. Consider reducing the dimensionality by calling the `pca()` method.
  LightkurveWarning,
WARNING: Input data contains invalid values (NaNs or infs), which were automatically clipped. [astropy.stats.sigma_clipping]
WARNING: Input data contains invalid values (NaNs or infs), which were automatically clipped. [astropy.stats.sigma_clipping]
WARNING: Input data contains invalid values (NaNs or infs), which were automatically clipped. [astropy.stats.sigma_clipping]
WARNING: Input data contains invalid values (NaNs or infs), which were automatically clipped. [astropy.stats.sigma_clipping]
WARNING: Input data contains invalid values (NaNs or infs), which were automatically clipped. [astropy.stats.sigma_clipping]

[Screenshot: diagnostic plots from cbvCorrector.diagnose()]

I note that this method cannot be applied to Kepler 10-min FFI data; is the same also true for TESS? @jcsmithhere, do you have any advice?

Thanks

jcsmithhere commented 3 years ago

Hi @rebekah9969,

The CBVs are loaded from the 2-minute cadence data. When you call CBVCorrector, by default it "aligns" the CBVs to the cadences of the light curve (lc_S32), but none of the cadences line up. Instead, try using interpolate_cbvs=True. See the CBVCorrector tutorial section on "Aligning versus Interpolating CBVs."

cbvCorrector = CBVCorrector(lc_S32, interpolate_cbvs=True)
cbvCorrector.cbvs[0].plot();

[Image: interpolated CBV plot, no longer empty]
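To see why alignment produces an empty CBV here, a minimal numpy sketch with made-up timestamps (not real TESS cadences, and not lightkurve's actual implementation): exact-match alignment between a 2-minute grid and an offset 10-minute grid keeps no cadences at all, while interpolation simply evaluates the CBV at the light curve's timestamps.

```python
import numpy as np

# Made-up timestamps in days: CBVs on a 2-min grid, the TESSCut light
# curve on a 10-min grid with a small offset (real cadences rarely
# coincide exactly).
cbv_time = np.arange(0.0, 1.0, 2.0 / 1440.0)
lc_time = np.arange(0.0005, 1.0, 10.0 / 1440.0)
cbv_flux = np.sin(2 * np.pi * cbv_time)      # stand-in CBV signal

# "Aligning" keeps only CBV cadences whose timestamps match exactly;
# with mismatched grids nothing survives, hence the blank plot.
aligned_mask = np.isin(cbv_time, lc_time)
print(aligned_mask.sum())                    # 0

# "Interpolating" evaluates the CBV at the light curve's timestamps.
cbv_interp = np.interp(lc_time, cbv_time, cbv_flux)
print(cbv_interp.size == lc_time.size)       # True
```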

Note that with alpha = 1e-4 you are severely overfitting:

cbvCorrector.goodness_metric_scan_plot(cbv_type=cbv_type, cbv_indices=cbv_indices);

[Image: goodness metric scan over alpha]

This might work better for you:

cbv_type = ['MultiScale.1', 'MultiScale.2', 'MultiScale.3','Spike']
cbv_indices = [np.arange(1,9), np.arange(1,9), np.arange(1,9), 'ALL']
cbvCorrector.goodness_metric_scan_plot(cbv_type=cbv_type, cbv_indices=cbv_indices);

[Image: goodness metric scan with multi-scale CBVs]

cbvCorrector.correct_gaussian_prior(cbv_type=cbv_type, cbv_indices=cbv_indices, alpha=1e-1)
pltAxis = cbvCorrector.diagnose()
pltAxis[0].set_ylim(650, 800);
pltAxis[1].set_ylim(650, 800);

[Image: diagnostic plots after correction]

Of course, even more tweaking might be needed...

barentsen commented 3 years ago

Thinking out loud: applying CBV correction to TESSCut data is likely one of the principal use cases for CBVCorrector, so I wonder if we should have interpolate_cbvs=True by default to make the API more friendly.

Perhaps we can think of more user-friendly mechanisms to alert users of the dangers of interpolation, e.g. something folded into one of the diagnose() plots?

rebekah9969 commented 3 years ago

I agree with what @barentsen stated above. Thank you, @jcsmithhere!

jcsmithhere commented 3 years ago

Yes, I agree. We need to make this more user-friendly, and I very much welcome the feedback to understand where users are hitting confusing aspects or bugs. Perhaps the default between interpolating and aligning could depend on which data type the class is being applied to. Aligning really is more reliable: interpolation can introduce artifacts, and which interpolation method to use depends on which types of signals one is interested in.
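The interpolation-artifact point can be illustrated with a toy example (hypothetical data, not real CBVs): a cubic spline through a step-like systematic rings and overshoots near the jump, while linear interpolation stays within the data range.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Toy systematic with a sharp step, sampled coarsely.
t = np.arange(0.0, 10.0, 0.5)
cbv = np.where(t < 5.0, 0.0, 1.0)
t_fine = np.arange(0.0, 9.5, 0.05)           # stays inside the knot range

linear = np.interp(t_fine, t, cbv)
spline = CubicSpline(t, cbv)(t_fine)

# Linear interpolation is bounded by the data; the spline overshoots the
# jump, an artifact a corrector would then inject into the light curve.
print(linear.min() >= 0.0 and linear.max() <= 1.0)   # True
print(spline.min() < 0.0 or spline.max() > 1.0)      # True
```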

I also think more integration between the CBVCorrector and PLDCorrector classes would be good. They really are two complementary methods, and we should set up tools to apply both to the same data set while using the same performance metrics.
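A rough sketch of that integration idea, with stand-in correctors and synthetic data (none of this is existing lightkurve API): run two independent detrending steps on the same flux and score both with one shared metric, so the results are directly comparable.

```python
import numpy as np
from scipy.ndimage import median_filter

rng = np.random.default_rng(0)
x = np.linspace(0.0, 20.0, 500)
# Synthetic light curve: a smooth "systematic" plus white noise.
flux = 1.0 + 0.01 * np.sin(x) + 0.001 * rng.normal(size=500)

def cbv_like(f):
    # Stand-in for a basis-vector fit: least squares against sin/cos/const.
    basis = np.column_stack([np.sin(x), np.cos(x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(basis, f, rcond=None)
    return f - basis @ coef + 1.0

def pld_like(f):
    # Stand-in for a pixel-based fit: subtract a running median trend.
    return f - median_filter(f, size=51) + 1.0

def scatter_metric(f):
    # One shared performance metric: point-to-point scatter.
    return np.std(np.diff(f))

for name, corrector in [("CBV-like", cbv_like), ("PLD-like", pld_like)]:
    print(name, scatter_metric(corrector(flux)))
```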