Closed deansher closed 2 years ago
@deansher Thanks for reporting this. We haven't seen segfaults on our end with this function, and it is one that is covered in a few ways by our test suite. Can you provide a minimal, reproducible example that demonstrates this problem so we can properly follow it up?
... Not related to the problem itself (which we do want to understand and fix): I don't think that Welch's method is going to be the best thing to use for the Kaggle dataset. The data snippets are very short (not representing real data) and this will make it hard for Welch's method to reliably estimate the PSD, while still capturing narrow line features in the PSD. A couple of observations from the Kaggle dataset can make estimating a PSD easier:
All the data snippets use the same PSD (again not realistic, but it helps simplify this task). -> I should clarify that each data snippet contains data from 3 instruments; each instrument has its own PSD, but that PSD is the same across all samples.
Therefore you can simply take a large set of the "no-signal present" data snippets, apply some form of windowing, Fourier transform them, take the absolute value at each frequency point, and then take the mean. Welch's method is used in real data because signals might be present in the data and would mess up this process. Here's a code snippet (using the Kaggle structure) illustrating this:
from pycbc.types import TimeSeries
from scipy.signal import tukey
import numpy as np

labels = np.loadtxt('training_labels.csv', delimiter=',', dtype=str)
label_dict = {}
for idd, label in labels[1:]:
    label_dict[idd] = int(label)

inputs = ! ls train/0/0/*/*.npy

det1_psd = None
det2_psd = None
det3_psd = None

count = 0
for idx, filenam in enumerate(inputs):
    fil_id = filenam.split('/')[4].split('.')[0]
    # Must skip the file if it contains a signal.
    if label_dict[fil_id] == 1:
        continue
    count += 1
    example_arr = np.load(filenam)
    det1 = TimeSeries(example_arr[0], delta_t=1./2048.)
    det2 = TimeSeries(example_arr[1], delta_t=1./2048.)
    det3 = TimeSeries(example_arr[2], delta_t=1./2048.)
    # Apply a windowing. Nothing fancy here, I just took an example
    # from the notebooks on the Kaggle page.
    window = tukey(4096, alpha=0.2)
    det1 = det1 * window
    det2 = det2 * window
    det3 = det3 * window
    # This Fourier transforms the data.
    det1f = det1.to_frequencyseries()
    det2f = det2.to_frequencyseries()
    det3f = det3.to_frequencyseries()
    # Then take the abs value and compute the rolling mean. It would perhaps be
    # more CPU efficient, and simpler to read, to store all the `abs(detXf)`
    # and then average at the end, but that requires a lot of RAM.
    if count == 1:
        det1_psd = abs(det1f)
        det2_psd = abs(det2f)
        det3_psd = abs(det3f)
    else:
        det1_psd = (abs(det1f) + (count - 1) * det1_psd) / count
        det2_psd = (abs(det2f) + (count - 1) * det2_psd) / count
        det3_psd = (abs(det3f) + (count - 1) * det3_psd) / count
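As a sanity check on the rolling-mean update used in the loop, the incremental formula is algebraically identical to storing every spectrum and averaging at the end. A minimal sketch with numpy, using random noise as a stand-in for the Kaggle snippets (not the real data files):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for abs(detXf) from several noise-only snippets (4096 samples each).
spectra = [np.abs(np.fft.rfft(rng.standard_normal(4096))) for _ in range(10)]

# Incremental running mean, as in the loop above.
running = None
count = 0
for s in spectra:
    count += 1
    if count == 1:
        running = s.copy()
    else:
        running = (s + (count - 1) * running) / count

# It matches the plain batch mean to floating-point precision,
# while only ever holding one extra spectrum in memory.
assert np.allclose(running, np.mean(spectra, axis=0))
```

The only trade-off is a little extra arithmetic per iteration in exchange for O(1) memory instead of O(N).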
If you then plot these PSDs you can see that det1 and det2 are both using the Advanced LIGO projected [basic] design sensitivity curve (I think they would have used this one https://pycbc.org/pycbc/latest/html/pycbc.psd.html#pycbc.psd.analytical.aLIGOZeroDetHighPower) and det3 uses the Advanced Virgo projected design sensitivity curve (probably this one https://pycbc.org/pycbc/latest/html/pycbc.psd.html#pycbc.psd.analytical.AdvVirgo, but maybe one of the other AdvVirgo ones).
Wow, best issue response ever! Thank you. I'll try for a minimal, reproducible example, and otherwise proceed as you suggest.
Here is a minimal, reproducible example. I'm using Deep Learning AMI GPU PyTorch 1.9.0 (Amazon Linux 2). I included a conda explicit specification file. repro-pycbc-fftw.zip
@deansher Thanks for sharing this! @spxiwh I can reproduce the issue with the conda environment provided, but notably not with code installed locally on my machine using a standard virtualenv setup instead of conda. I haven't tracked down what the difference is yet, but perhaps you can confirm whether you see the same difference, as it may help track down the root cause here.
@deansher Thanks for providing this, as @ahnitz also noted, I was able to reproduce the issue using the inputs provided!
@ahnitz I am also seeing the same behaviour as you, starting with the custom env sourced I can do:
(testenv) [ian.harry@cl8 repro-pycbc-fftw]$ python repro.py
Segmentation fault
(testenv) [ian.harry@cl8 repro-pycbc-fftw]$ conda activate igwn-py37
(igwn-py37) [ian.harry@cl8 repro-pycbc-fftw]$ python repro.py
Downloading https://hpiers.obspm.fr/iers/bul/bulc/Leap_Second.dat
|==========================================| 1.3k/1.3k (100.00%) 0s
(igwn-py37) [ian.harry@cl8 repro-pycbc-fftw]$
I'll dig around a bit more and see if I can figure out what difference is causing this.
@ahnitz I haven't been able to get any idea of why this case is segfaulting, but it doesn't segfault with our own environments (The "igwn" environment has the same FFTW as this one!)
@deansher I do have a workaround though, which should at least help you make progress if this is blocking you. If you do:
import numpy as np
import pycbc.filter
from pycbc.types import TimeSeries
from pycbc.psd import welch
from pycbc.scheme import MKLScheme

SIGNAL_LEN = 4096
SIGNAL_SECONDS = 2.0
DELTA_T = SIGNAL_SECONDS / SIGNAL_LEN

if __name__ == "__main__":
    with MKLScheme() as ctx:
        sig = np.load('00000e74ad.npy')[0]
        ts = TimeSeries(sig, epoch=0, delta_t=DELTA_T)
        high = pycbc.filter.highpass(ts, 15, 8)
        welch(high, seg_len=512, seg_stride=256)
The code will force all FFTs through MKL's interface. This does not segfault with the settings you have.
Note that I added some kwargs to welch. These will help it get a better PSD than with the default settings (which are not designed with 2s inputs in mind!). That said, I think the approach above will still be better.
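For intuition on why those kwargs matter for 2 s inputs, here is a sketch using scipy.signal.welch as a stand-in (pycbc's seg_len/seg_stride of 512/256 should correspond roughly to scipy's nperseg=512/noverlap=256, if I have the mapping right; the data here is just random noise, not a Kaggle file):

```python
import numpy as np
from scipy.signal import welch

fs = 2048.0  # Kaggle sample rate
x = np.random.default_rng(1).standard_normal(4096)  # 2 s of stand-in noise

# 512-sample segments with 256-sample overlap, as in the workaround above.
f, pxx = welch(x, fs=fs, nperseg=512, noverlap=256)

# 512-sample segments at 2048 Hz give 4 Hz frequency resolution
# and 257 one-sided frequency bins (512 // 2 + 1).
assert len(f) == 257
assert np.isclose(f[1] - f[0], fs / 512)
```

Shorter segments would give more averages (lower variance) but coarser frequency resolution, which is the tension that makes Welch's method awkward on 2 s snippets with narrow line features.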
:-) And thank you again! We haven't been blocked on this yet -- lots else to do.
Confirmed: if I install my dependencies other than PyCBC with conda, but then install PyCBC and its dependencies with pip, I do not get the seg fault.
I think this may have been related to #4052, which we recently tracked down to an issue with MKL-built libraries being loaded before FFTW and confusing the symbol table. I'm closing this for now, but please re-open if similar issues are found in the current version.
Hello, I'm participating in the Kaggle contest. I'm getting a seg fault on Linux that I don't get on macOS. Unfortunately, this happens in my effort to preprocess the Kaggle dataset, and I don't have enough disk space on my MacBook to do the whole dataset in one go, so it would really help for this to work on Linux.
Here's the line of code I am trying to execute:
I have reproduced this on several versions of pycbc, including the most recent. The following Python stack trace is from
pycbc 1.16.12 py37hc615d7d_2 conda-forge/linux-64
:
And from
gdb
: