gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

filter.matchedfilter.match is not accurate for close waveforms, and subsample_interpolation does not fix the issue #3817

Closed: jacopok closed this issue 2 years ago

jacopok commented 3 years ago

When computing mismatches between waveforms that are very similar but shifted in time by an amount that is not an integer number of samples, the pycbc.filter.matchedfilter.match function is unreliable.
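
For reference, match maximizes the noise-weighted overlap of the two waveforms over a relative time and phase shift, and the mismatch is its complement:

\[
\langle a \,|\, b \rangle = 4\,\mathrm{Re} \int_0^{\infty} \frac{\tilde a(f)\, \tilde b^{*}(f)}{S_n(f)}\, \mathrm{d}f,
\qquad
\mathcal{M}(a, b) = \max_{t_0,\, \phi_0} \frac{\langle a \,|\, b \rangle}{\sqrt{\langle a \,|\, a \rangle\, \langle b \,|\, b \rangle}},
\qquad
\text{mismatch} = 1 - \mathcal{M}.
\]

The problem reported here is that the maximization over t_0 is effectively carried out on the discrete grid of sample times, so a shift that falls between samples is not fully recovered.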

I have an MWE (well, as minimal as I could make it) for the issue, with some mock data. I create waveforms whose phases differ by small random fluctuations, so that their unfaithfulness is ~1e-8. If these are then shifted by a sub-sample amount, by adding a frequency-linear term to the Fourier phase, the computed values of one minus their match fluctuate by up to two orders of magnitude.

After showing the problem, I give a "toy" implementation of a mismatch-computing function that does not suffer from this issue.

I do not know whether this is worth implementing in pycbc, but I propose to at least include a warning in the documentation to discourage people from trying to use match to recover mismatches smaller than, say, 1e-4.
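
Purely as an illustration of the kind of caveat I have in mind, here is a hypothetical user-side guard (not part of pycbc, and not the proposed documentation change itself), assuming the standard match call that returns the match value and the peak index:

import warnings
from pycbc.filter.matchedfilter import match

def cautious_mismatch(htilde, stilde, psd, floor=1e-4):
    # hypothetical helper: warn when the result is in the regime where the
    # sample-spaced time maximization inside match can dominate the answer
    m, _ = match(htilde, stilde, psd=psd)
    mm = 1 - m
    if mm < floor:
        warnings.warn("mismatch %.2e is below %.0e and may be limited by "
                      "the discrete time maximization in match" % (mm, floor))
    return mm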

import numpy as np
from pycbc.filter.matchedfilter import match
import matplotlib.pyplot as plt
from pycbc.types import TimeSeries, FrequencySeries

N = 1000
tmax = 100

ts = np.linspace(0, tmax, num=N)
delta_t = tmax / N
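# two phase evolutions that differ only by tiny random perturbations,
# so the corresponding waveforms have an unfaithfulness of order 1e-8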
rng = np.random.default_rng(seed=1)
phase1 = ts + rng.normal(loc=0, scale=0.0001, size=N)
phase2 = ts + rng.normal(loc=0, scale=0.0001, size=N)

waveform1 = np.cos(phase1)
waveform2 = np.cos(phase2)

series1 = TimeSeries(waveform1, delta_t=delta_t)
series2 = TimeSeries(waveform2, delta_t=delta_t)

series1_fd = series1.to_frequencyseries()
series2_fd = series2.to_frequencyseries()

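# flat unit PSD, so the match is a plain overlap with equal weight at all frequencies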
psd = FrequencySeries(np.ones(N//2+1), delta_f=series1_fd.delta_f)

m, i = match(series1_fd, series2_fd, psd=psd)  # match of the unshifted pair (not used below)

f = series1_fd.sample_frequencies

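# a time shift by dt corresponds, in the frequency domain, to multiplying by
# exp(2j * pi * f * dt); this allows shifts that are not an integer number of samples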
def shift(frequencyseries, dt):
    return frequencyseries * np.exp(2j * np.pi * f * dt)

mismatches = []
mismatches_ssi = []

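# scan continuous time shifts within one sample spacing on either side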
dts = np.linspace(-delta_t, delta_t, num=100)

for dt in dts:
    m, _ = match(series1_fd, shift(series2_fd, dt), psd=psd)
    m_ssi, _ = match(series1_fd, shift(series2_fd, dt), psd=psd, subsample_interpolation=True)
    mismatches.append(1-m)
    mismatches_ssi.append(1-m_ssi)

plt.semilogy(dts, mismatches, label='w/o SSI')
plt.semilogy(dts, mismatches_ssi, label='with SSI')

plt.legend()

plt.xlabel('time shift')
plt.ylabel('Unfaithfulness')

[Figure unfaithfulness_mwe: unfaithfulness (one minus match) versus applied time shift, for match without and with subsample interpolation, and for the optimized procedure below.]

The "optimized" curve at the bottom is what I get when computing the same mismatches with an optimization procedure, written as follows:

from scipy import integrate
from scipy.optimize import minimize_scalar

def mismatch(wf1, wf2, psd, f):
    """Own implementation for a mismatch-computer, 
    optimizing the Wiener product.
    """

    wf1 = wf1.numpy()
    wf2 = wf2.numpy()
    psd = psd.numpy()

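    # frequency-domain inner product: integral of conj(a) * b / psd over frequency;
    # taking the absolute value maximizes over a constant relative phase
    # (the conventional factor of 4 cancels in the normalized match)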
    def product(a, b):
        integral = integrate.trapezoid(np.conj(a) * b / psd, x=f)
        return abs(integral)

    norm = np.sqrt(product(wf1, wf1) * product(wf2, wf2))

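    # minus the phase-maximized overlap at a continuous time shift dt:
    # minimizing this over dt maximizes the match over the time offset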
    def to_minimize(dt):
        offset = np.exp(2j * np.pi * f * dt)
        return - product(wf1, wf2 * offset)

    res = minimize_scalar(to_minimize, method='brent', bracket=(-.1, .1))
    return 1 - (-res.fun) / norm

mismatches_optimized = []

for dt in dts:
    mismatches_optimized.append(mismatch(series1_fd, shift(series2_fd, dt), psd, f))

plt.semilogy(dts, mismatches_optimized, label='optimized')

ahnitz commented 3 years ago

@jacopok I think either approach would be useful. Having a version which does the time optimization for you is perfectly sensible, and a documentation improvement for the cases where that is not used would also be good. These are both things we'd love to accept contributions for.

jacopok commented 3 years ago

Thanks! As soon as I can, I will try to put together a pull request for both, starting with the documentation update.

jacopok commented 2 years ago

The feature was implemented as the optimized_match function in PR https://github.com/gwastro/pycbc/pull/3939.
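
A minimal usage sketch follows (hypothetical: the exact signature and return values should be checked against the pycbc documentation for optimized_match; here it is assumed to sit alongside match and accept the same core arguments):

from pycbc.filter.matchedfilter import optimized_match

# assumed calling convention, mirroring match (check the documentation);
# the returned values are assumed to include the match maximized over a
# continuous time shift rather than over integer sample offsets
result = optimized_match(series1_fd, series2_fd, psd=psd)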