gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0
307 stars 344 forks source link

Bug in PyCBC reproducible noise: Discontinuities present over frame boundaries #4772

Open spxiwh opened 1 month ago

spxiwh commented 1 month ago

Details from @ArthurTolley:

We are trying to use pycbc_condition_strain to make a long stretch of simulated data with short frame files (to test PyCBC Live like operation). This saw a lot of glitches, Arthur produced the following minimal example:

fake_strain=aLIGODesignSensitivityT1800044
ifo=H1
duration=32
out_path="./${ifo}-SIMULATED_STRAIN-{start}-{duration}.gwf"

for chunk in {1..2}; do
    gps_start_time=$((1200000000+(${chunk}-1)*64))
    gps_end_time=$((1200000000+${chunk}*64))

    pycbc_condition_strain \
        --fake-strain ${fake_strain} \
        --fake-strain-seed 1234 \
        --output-strain-file ${out_path} \
        --gps-start-time ${gps_start_time} \
        --gps-end-time ${gps_end_time} \
        --sample-rate 2048 \
        --frame-duration ${duration} \
        --channel-name $ifo:SIMULATED_STRAIN \

done

which produces two data chunks with a discontinuous boundary (Q-scan over boundary):

image

I can't see anything wrong here. Arthur says if the PSD is changed to aLIGOMidLowSensitivityP1200087 the data is good, but we shouldn't have "bad" and "good" PSDs in this context.

spxiwh commented 1 month ago

Plotting code to make omega scan:

import matplotlib.pyplot as plt
import h5py
import numpy
from pycbc.frame import read_frame
import glob
from pycbc.filter import resample_to_delta_t

# Change data dir, change figure save location

# Find the data of the time you want to plot
H1_data_dir = './'

time = int(1200000064)
previous_frame = time - 32
next_frame = time + 32

# Read in the frames at the boundary
h1_frames = [H1_data_dir + f'H1-SIMULATED_STRAIN-{previous_frame}-32.gwf',
             H1_data_dir + f'H1-SIMULATED_STRAIN-{time}-32.gwf',
             H1_data_dir + f'H1-SIMULATED_STRAIN-{next_frame}-32.gwf']

print(f" Reading in frames")
h1_data = read_frame(h1_frames, 'H1:SIMULATED_STRAIN', start_time=time-16, end_time = time+16)
# h1_data = resample_to_delta_t(h1_data, 1/512)

fig, ax = plt.subplots(figsize=(12,10))
ax.set_rasterized(True)

# Plot the data
print(" Whitening and qtransforming data")
t, f, p = h1_data.whiten(4, 4).qtransform(.001,
                                        logfsteps=100,
                                        qrange = [40.0, 45.0],
                                        frange = [20, 100],)
print(" Plotting qscan underneath")
ax.pcolormesh(t, f, p**0.5, vmin=1.0, vmax=5.0, shading = 'auto')
ax.set_yscale('log')

ymin, ymax = 20, 100
xmin, xmax = ax.get_xlim()
ax.set_ylim(ymin, ymax)
ax.set_xlim(xmin, xmax)

ax.set_ylabel('Frequency [Hz]')
ax.set_xlabel('Time [s]')
plt.show()
plt.savefig(f'{time}.png')
plt.close()
print(" Done")
ahnitz commented 1 month ago

@spxiwh The hint from @ArthurTolley I think explains the issue potentially. Try the following and I think it will look "ok".

fake_strain=aLIGODesignSensitivityT1800044
ifo=H1
duration=32
out_path="./${ifo}-SIMULATED_STRAIN-{start}-{duration}.gwf"

for chunk in {1..2}; do
    gps_start_time=$((1200000000+(${chunk}-1)*64))
    gps_end_time=$((1200000000+${chunk}*64))

    pycbc_condition_strain \
        --fake-strain ${fake_strain} \
        --fake-strain-seed 1234 \
        --fake-strain-flow  10 \
        --output-strain-file ${out_path} \
        --gps-start-time ${gps_start_time} \
        --gps-end-time ${gps_end_time} \
        --sample-rate 2048 \
        --frame-duration ${duration} \
        --channel-name $ifo:SIMULATED_STRAIN

done

I suspect that the PSD you are using goes to somewhat interesting behavior (e.g. which result in numerical instability near DC) if you allow it to go to zero frequency, so you need the option to set the flow for the noise generation.

ahnitz commented 1 month ago

Figure_1

ArthurTolley commented 1 month ago

Thanks @ahnitz, annoyingly I used --low-frequency-cutoff 10 which we initially thought was the cause of the problem but this argument seems to do nothing for our use case of pycbc_condition_strain. If I used the correct argument to begin with I wouldn't have seen this issue at all!

titodalcanton commented 1 month ago

Phew, the documentation page for condition_strain does mention --fake-strain-flow 10 (though it does not explain why it is important, nor what is really happening behind the scenes).

ArthurTolley commented 1 month ago

Serves me right for using an old example (from pycbc/examples/live/run.sh) instead of going directly to the documentation 😅

GarethCabournDavies commented 1 month ago

Unless there are objections, I will re-name the issue to be that the low-frequency-cutoff options in pycbc_condition_strain need explaining better in the documentation

titodalcanton commented 1 month ago

We should probably add --fake-strain-flow to the PyCBC Live example as well.

spxiwh commented 1 month ago

I still believe this is a bug, and not a documentation problem. It may be an unavoidable and understandable bug, but it is dangerous, and we probably need a sanity check in the reproducible noise module to check for things like the PSD covering too large a dynamic range and/or certain values going out of a valid range.