Variance calculation in KTGainVariationProcessor is not done correctly

nsoblath commented 6 years ago

This only applies to the situation where a spectrum has been supplied without a variance data object.

The variance calculation should take into account that the mean changes as a function of frequency. Currently that is not done (well, it's sort of done for real values; the mean for a given window is the running sum in that window).

For real values, what we need to do is calculate the gain variation mean first, then supply that to the calculation of the variance, and have it subtracted off for each bin. Here, for instance, is the variance calculation from Katydid v2.9.2's KTVariableSpectrumDiscriminator::DiscriminateSpectrum(KTPowerSpectrum...) (line 570ff):

            for (unsigned iBin=fMinBin; iBin<=fMaxBin; ++iBin)
            {
                diff = (*spectrum)(iBin) - (*splineImp)(iBin - fMinBin);
                sigma += diff * diff;
            }

For complex values, I'm not sure what the right procedure is. In Katydid v2.9.2's KTVariableSpectrumDiscriminator::DiscriminateSpectrum(KTFrequencySpectrumFFTW...) (lines 489ff) it uses the magnitudes of the values. However, one can find the variance of a complex number: Var[Z] = E[|Z|^2] - |E[Z]|^2. It's just that I'm not sure what to do when E[Z(f)] changes as a function of f. Here's the relevant code from v2.9.2, though, should it prove useful:

            for (unsigned iBin=fMinBin; iBin<=fMaxBin; ++iBin)
            {
                fMagnitudeCache[iBin] = sqrt((*spectrum)(iBin)[0] * (*spectrum)(iBin)[0] + (*spectrum)(iBin)[1] * (*spectrum)(iBin)[1]);
                diff = fMagnitudeCache[iBin] - (*splineImp)(iBin - fMinBin);
                sigma += diff * diff;
            }

evzayas commented 6 years ago

I'm a little confused, was this not resolved with the inclusion of variance splines? v2.9.2 predates that, and the current (develop) version utilizes it this way:

                variance = (*varSplineImp)(iBin - fMinBin);
                threshold = mean + fSigmaThreshold * sqrt( variance );
                value = (*spectrum)(iBin);

Before this it returns an exception if either spline or varSpline are null. Is there a different problem I'm not understanding correctly?

guiguem commented 6 years ago

Good catch! If Spline does calculate the variance out of the box, we should trust it. If one wants to calculate the variance manually, the E[Z(f)] would be a local average over N points (N being the order of the spline or order of the spline -1, depends on the convention), whihc is not the case right now...

evzayas commented 6 years ago

the variance spline is not calculated out of the box, it's done manually in the data accumulator along with the mean spline. E[Z] and E[Z^2] are calculated iteratively as the spectra come in, and then before the signal is released the E[Z^2] spectrum is replaced by E[Z^2] - E[Z]^2

nsoblath commented 6 years ago

It's not exactly related to this issue, but since it was brought up, for complex-valued spectra, the splines that come out of the data accumulator are the magnitudes of the accumulated spectra.

This issue is related to the variance calculation that's done when splines are not used.

project8 / katydid

Variance calculation in KTGainVariationProcessor is not done correctly #159