gwastro / pycbc

Core package to analyze gravitational-wave data, find signals, and study their parameters. This package was used in the first direct detection of gravitational waves (GW150914), and is used in the ongoing analysis of LIGO/Virgo data.
http://pycbc.org
GNU General Public License v3.0

seg fault in fftw when using welch on Linux #3761

Closed deansher closed 2 years ago

deansher commented 3 years ago

Hello, I'm participating in the Kaggle contest. I'm getting a seg fault on Linux that I don't get on macOS. Unfortunately, this happens while preprocessing the Kaggle dataset, and I don't have enough disk space on my MacBook to do the whole dataset in one go, so it would really help for this to work on Linux.

Here's the line of code I am trying to execute:

    psd = interpolate(welch(high), 1.0 / high.duration)
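
For readers unfamiliar with these functions: `welch` estimates the power spectral density and `interpolate` resamples it to the frequency resolution `1 / duration` of the input. A rough standalone analogue of that step, sketched with scipy/numpy on synthetic data (the sample rate, duration, and segment settings here are illustrative assumptions, not the Kaggle values):

```python
# Rough scipy/numpy analogue of pycbc's welch + interpolate step, for illustration only.
import numpy as np
from scipy.signal import welch as scipy_welch

fs = 2048.0                      # sample rate, assumed for illustration
duration = 2.0                   # seconds of data
x = np.random.default_rng(0).normal(size=int(fs * duration))

# Welch PSD estimate averaged over short overlapping segments
freqs, psd = scipy_welch(x, fs=fs, nperseg=512, noverlap=256)

# Interpolate the PSD onto the full-resolution grid delta_f = 1 / duration
delta_f = 1.0 / duration
full_freqs = np.arange(0, fs / 2 + delta_f, delta_f)
psd_interp = np.interp(full_freqs, freqs, psd)

print(psd_interp.shape)          # one bin per delta_f, from 0 up to Nyquist
```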

I have reproduced this on several versions of pycbc, including the most recent. The following Python stacktrace is from pycbc 1.16.12 py37hc615d7d_2 conda-forge/linux-64:

  File "/opt/conda/envs/kaggle-gw/lib/python3.7/site-packages/pycbc/fft/fftw.py", line 319 in execute
  File "/opt/conda/envs/kaggle-gw/lib/python3.7/site-packages/pycbc/fft/fftw.py", line 325 in fft
  File "/opt/conda/envs/kaggle-gw/lib/python3.7/site-packages/pycbc/fft/func_api.py", line 52 in fft
  File "/opt/conda/envs/kaggle-gw/lib/python3.7/site-packages/pycbc/psd/estimate.py", line 158 in welch
  File "/home/ec2-user/kaggle-gw/preprocess/filter_sig.py", line 43 in process_sig
  File "/home/ec2-user/kaggle-gw/preprocess/filter_sig.py", line 67 in <listcomp>
  File "/home/ec2-user/kaggle-gw/preprocess/filter_sig.py", line 67 in process
  File "/home/ec2-user/kaggle-gw/preprocess/__main__.py", line 53 in preprocess_train_or_test
  File "/home/ec2-user/kaggle-gw/preprocess/__main__.py", line 84 in preprocess
  File "/home/ec2-user/kaggle-gw/preprocess/__main__.py", line 111 in <module>
  File "/opt/conda/envs/kaggle-gw/lib/python3.7/runpy.py", line 85 in _run_code
  File "/opt/conda/envs/kaggle-gw/lib/python3.7/runpy.py", line 193 in _run_module_as_main
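
(As an aside, a Python-level trace in this format, with no source lines and `line N in func`, is what the standard-library `faulthandler` module prints when the interpreter receives a fatal signal such as SIGSEGV. A generic sketch of enabling it, useful when reproducing crashes like this:)

```python
# faulthandler dumps a Python traceback on fatal signals (SIGSEGV, SIGFPE, ...),
# which is how a pure-Python stack can be recovered from a native crash.
import faulthandler

faulthandler.enable()
print(faulthandler.is_enabled())
```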

And from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x0000001e00544644 in ?? ()
(gdb) bt 10
#0  0x0000001e00544644 in ?? ()
#1  0x00007fffef3029ed in ffi_call_unix64 () from /opt/conda/envs/kaggle-gw/lib/python3.7/lib-dynload/../../libffi.so.7
#2  0x00007fffef302077 in ffi_call_int () from /opt/conda/envs/kaggle-gw/lib/python3.7/lib-dynload/../../libffi.so.7
#3  0x00007fffef318794 in _call_function_pointer (argcount=3, resmem=0x7fffffffbaf0, restype=<optimized out>, atypes=<optimized out>,
    avalues=0x7fffffffbad0, pProc=0x7fffdedaf030 <fftw_execute_dft_r2c>, flags=<optimized out>)
    at /usr/local/src/conda/python-3.7.10/Modules/_ctypes/callproc.c:816
#4  _ctypes_callproc () at /usr/local/src/conda/python-3.7.10/Modules/_ctypes/callproc.c:1188
#5  0x00007fffef318f20 in PyCFuncPtr_call () at /usr/local/src/conda/python-3.7.10/Modules/_ctypes/_ctypes.c:4025
#6  0x00005555556e2a5b in _PyObject_FastCallKeywords ()
    at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/errors.c:176
#7  0x00005555556e3b59 in call_function () at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:4619
#8  0x000055555570a4ac in _PyEval_EvalFrameDefault ()
    at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:3124
#9  0x000055555567fe94 in PyEval_EvalFrameEx (throwflag=0, f=0x7fffdc22f620)
    at /home/conda/feedstock_root/build_artifacts/python_1613748395163/work/Python/ceval.c:544
spxiwh commented 3 years ago

@deansher Thanks for reporting this. We haven't seen segfaults on our end with this function, and it is one that is covered in several ways by our test suite. Can you provide a minimal, reproducible example that demonstrates the problem so we can properly follow it up?

... Not related to the problem itself (which we do want to understand and fix): I don't think Welch's method is going to be the best thing to use for the Kaggle dataset. The data snippets are very short (and don't represent real data), which will make it hard for Welch's method to reliably estimate the PSD while still capturing its narrow line features. A couple of observations about the Kaggle dataset can make estimating a PSD easier:

Identify which datasets have signals, and which do not:

    labels = np.loadtxt('training_labels.csv', delimiter=',', dtype=str)
    label_dict = {}
    for idd, label in labels[1:]:
        label_dict[idd] = int(label)

Identify a bunch of inputs. Could use more, but it takes longer and this is already good:

    inputs = ! ls train/0/0/*/*.npy

    det1_psd = None
    det2_psd = None
    det3_psd = None

    count = 0
    for idx, filenam in enumerate(inputs):
        fil_id = filenam.split('/')[4].split('.')[0]

        # Must skip the file if it contains a signal.
        if label_dict[fil_id] == 1:
            continue

        count += 1
        example_arr = np.load(filenam)
        det1 = TimeSeries(example_arr[0], delta_t=1./2048.)
        det2 = TimeSeries(example_arr[1], delta_t=1./2048.)
        det3 = TimeSeries(example_arr[2], delta_t=1./2048.)

        # Apply a windowing. Nothing fancy here, I just took an example
        # from the notebooks on the Kaggle page.
        window = tukey(4096, alpha=0.2)
        det1 = det1 * window
        det2 = det2 * window
        det3 = det3 * window

        # This Fourier transforms the data
        det1f = det1.to_frequencyseries()
        det2f = det2.to_frequencyseries()
        det3f = det3.to_frequencyseries()

        # Then I take the abs value and compute the rolling mean. It would
        # perhaps be more CPU efficient, and simpler to read, to store all
        # the abs(detXf) and then average at the end, but that requires a
        # lot of RAM.
        if count == 1:
            det1_psd = abs(det1f)
            det2_psd = abs(det2f)
            det3_psd = abs(det3f)
        else:
            det1_psd = (abs(det1f) + (count - 1) * det1_psd) / count
            det2_psd = (abs(det2f) + (count - 1) * det2_psd) / count
            det3_psd = (abs(det3f) + (count - 1) * det3_psd) / count
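
A quick aside on the running-mean update at the end of that loop: the expression `(x + (count - 1) * avg) / count` is the standard incremental mean, and is mathematically equivalent to storing all spectra and averaging at the end. A small numpy check of that identity (the array sizes here are arbitrary stand-ins for `abs(detXf)`):

```python
import numpy as np

rng = np.random.default_rng(42)
spectra = [rng.random(8) for _ in range(5)]  # stand-ins for abs(detXf)

# Incremental update, as in the loop above
avg = None
for count, x in enumerate(spectra, start=1):
    avg = x if count == 1 else (x + (count - 1) * avg) / count

# Matches the batch mean to floating-point precision
print(np.allclose(avg, np.mean(spectra, axis=0)))
```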


If you then plot these PSDs you can see that det1 and det2 are both using the Advanced LIGO projected [basic] design sensitivity curve (I think they would have used this one https://pycbc.org/pycbc/latest/html/pycbc.psd.html#pycbc.psd.analytical.aLIGOZeroDetHighPower) and det3 uses the Advanced Virgo projected design sensitivity curve (probably this one https://pycbc.org/pycbc/latest/html/pycbc.psd.html#pycbc.psd.analytical.AdvVirgo, but maybe one of the other AdvVirgo ones).
deansher commented 3 years ago

Wow, best issue response ever! Thank you. I'll try for a minimal, reproducible example, and otherwise proceed as you suggest.

deansher commented 3 years ago

Here is a minimal, reproducible example. I'm using Deep Learning AMI GPU PyTorch 1.9.0 (Amazon Linux 2). I included a conda explicit specification file. repro-pycbc-fftw.zip

ahnitz commented 3 years ago

@deansher Thanks for sharing this! @spxiwh I can reproduce the issue with the conda environment provided, but notably not with code installed locally on my machine in a standard virtualenv setup, as opposed to conda. I haven't tracked down what the difference is yet, but perhaps you can confirm whether you see the same difference, as it may help track down the root cause here.

spxiwh commented 3 years ago

@deansher Thanks for providing this, as @ahnitz also noted, I was able to reproduce the issue using the inputs provided!

@ahnitz I am also seeing the same behaviour as you, starting with the custom env sourced I can do:

(testenv) [ian.harry@cl8 repro-pycbc-fftw]$ python repro.py 
Segmentation fault
(testenv) [ian.harry@cl8 repro-pycbc-fftw]$ conda activate igwn-py37
(igwn-py37) [ian.harry@cl8 repro-pycbc-fftw]$ python repro.py 
Downloading https://hpiers.obspm.fr/iers/bul/bulc/Leap_Second.dat
|==========================================| 1.3k/1.3k (100.00%)         0s
(igwn-py37) [ian.harry@cl8 repro-pycbc-fftw]$ 

I'll dig around a bit more and see if I can figure out what difference is causing this.

spxiwh commented 3 years ago

@ahnitz I haven't been able to get any idea of why this case is segfaulting, but it doesn't segfault with our own environments (The "igwn" environment has the same FFTW as this one!)

@deansher I do have a workaround though, which should at least help you make progress if this is blocking you. If you do:

    import numpy as np
    import pycbc.filter
    from pycbc.types import TimeSeries
    from pycbc.psd import welch
    from pycbc.scheme import MKLScheme

    SIGNAL_LEN = 4096
    SIGNAL_SECONDS = 2.0
    DELTA_T = SIGNAL_SECONDS / SIGNAL_LEN

    if __name__ == "__main__":
        with MKLScheme() as ctx:
            sig = np.load('00000e74ad.npy')[0]
            ts = TimeSeries(sig, epoch=0, delta_t=DELTA_T)
            high = pycbc.filter.highpass(ts, 15, 8)
            welch(high, seg_len=512, seg_stride=256)

The code will force all FFTs through MKL's interface. This does not segfault with the settings you have.

Note that I added some kwargs to welch. These will help it get a better PSD than with the default settings (which are not designed with 2s inputs in mind!). That said, I think the approach above will still be better.
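
To see why those kwargs matter for 2 s inputs: with `seg_len=512` and `seg_stride=256` at a 2048 Hz sample rate, each Welch segment is 0.25 s and a 4096-sample snippet yields 15 overlapping segments to average. A quick illustrative calculation (the parameter semantics, `seg_len` in samples and `seg_stride` as the hop between segment starts, are assumptions from the usual Welch's-method convention, not authoritative pycbc documentation):

```python
# Illustrative segment-count arithmetic for Welch's method on a 2 s snippet.
n = 4096          # samples in a 2 s snippet at 2048 Hz
seg_len = 512     # samples per Welch segment (0.25 s)
seg_stride = 256  # hop between segment starts (50% overlap)

n_segments = (n - seg_len) // seg_stride + 1
print(n_segments)  # number of overlapping segments averaged

# Frequency resolution of each segment's periodogram
delta_f = 2048.0 / seg_len
print(delta_f)     # Hz per frequency bin
```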

deansher commented 3 years ago

:-) And thank you again! We haven't been blocked on this yet -- lots else to do.

deansher commented 3 years ago

Confirmed: if I install my dependencies other than PyCBC with conda, but then install PyCBC and its dependencies with pip, I do not get the seg fault.

ahnitz commented 2 years ago

I think this may have been related to #4052, which we recently tracked down to an issue with MKL-built libraries being loaded before FFTW and confusing the symbol table. I'm closing this for now, but please re-open if similar issues are found in the current version.