nanograv / enterprise

ENTERPRISE (Enhanced Numerical Toolbox Enabling a Robust PulsaR Inference SuitE) is a pulsar timing analysis code, aimed at noise analysis, gravitational-wave searches, and timing model analysis.
https://enterprise.readthedocs.io
MIT License

Segmentation fault in test suite #339

Open aarchiba opened 1 year ago

aarchiba commented 1 year ago

I have a fresh install of the development version, and when I run the test suite I get a segmentation fault. The fault occurs at a different place in the test suite each time (one run reached 92% before dying), so the test suite might occasionally pass even though the underlying problem has not been fixed.

I installed the development version following these instructions:

conda create -n ent_dev -y -c conda-forge python=3.9
conda activate ent_dev
conda install -y -c conda-forge enterprise-pulsar
conda remove enterprise-pulsar --force
conda install -y -c conda-forge black==22.3.0 flake8 sphinx_rtd_theme pytest-cov
pip install coverage_conditional_plugin
pip install -e .

Here is a partial list of the tests where the segfault has occurred:

tests/test_white_signals.py::TestWhiteSignalsPint::test_add_efac_tnequad Fatal Python error: make: *** [Makefile:69: test] Segmentation fault (core dumped)
tests/test_pta.py::TestPTASignals::test_parameterized_orf make: *** [Makefile:69: test] Segmentation fault (core dumped)
tests/test_pta.py::TestPTASignals::test_parameterized_orf Fatal Python error: make: *** [Makefile:69: test] Segmentation fault (core dumped)
tests/test_pulsar.py::TestPulsar::test_deflate_inflate Fatal Python error: Fatal Python error: Segmentation fault
tests/test_gp_signals.py::TestGPSignals::test_combine_signals Fatal Python error: Fatal Python error: Fatal Python error: Segmentation faultSegmentation faultSegmentation fault
paulthebaker commented 1 year ago

Can you give us some system and OS info to see if we can get someone to reproduce it?
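The Python side of that information can be gathered in one go. The sketch below uses only the standard library (platform, importlib.metadata). environment_report is a hypothetical helper, not part of Enterprise. The list of distribution names is an assumption, chosen to cover the compiled dependencies discussed in this thread.

```python
import platform
import sys
from importlib import metadata

# Assumed distribution names for the compiled dependencies discussed here.
PACKAGES = ["numpy", "scipy", "libstempo", "pint-pulsar", "scikit-sparse", "enterprise-pulsar"]


def environment_report(packages=PACKAGES):
    """Return a short text report of the interpreter, OS, and package versions."""
    lines = [
        f"Python {platform.python_version()} ({sys.executable})",
        f"Platform: {platform.platform()} [{platform.machine()}]",
    ]
    for name in packages:
        try:
            version = metadata.version(name)
        except metadata.PackageNotFoundError:
            version = "not installed"
        lines.append(f"{name}: {version}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(environment_report())
```

Pasting its output alongside uname -a and conda list makes two environments easier to diff.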

aarchiba commented 1 year ago

I'm not quite sure what you would like, but here:

$ uname -a
Linux rhino 5.19.0-29-generic #30-Ubuntu SMP PREEMPT_DYNAMIC Wed Jan 4 12:14:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

CPU:

model name  : Intel(R) Xeon(R) CPU E3-1220 v3 @ 3.10GHz
$ conda list
# packages in environment at /home/peridot/software/miniconda3/envs/ent_dev:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
_openmp_mutex             4.5                       2_gnu    conda-forge
alabaster                 0.7.13             pyhd8ed1ab_0    conda-forge
appdirs                   1.4.4              pyh9f0ad1d_0    conda-forge
arviz                     0.14.0             pyhd8ed1ab_0    conda-forge
astropy                   5.2.1            py39h389d5f1_0    conda-forge
attrs                     22.2.0             pyh71513ae_0    conda-forge
babel                     2.11.0             pyhd8ed1ab_0    conda-forge
black                     22.3.0             pyhd8ed1ab_0    conda-forge
brotli                    1.0.9                h166bdaf_8    conda-forge
brotli-bin                1.0.9                h166bdaf_8    conda-forge
brotlipy                  0.7.0           py39hb9d737c_1005    conda-forge
bzip2                     1.0.8                h7f98852_4    conda-forge
c-ares                    1.18.1               h7f98852_0    conda-forge
ca-certificates           2022.12.7            ha878542_0    conda-forge
certifi                   2022.12.7          pyhd8ed1ab_0    conda-forge
cffi                      1.15.1           py39he91dace_3    conda-forge
cfitsio                   4.2.0                hd9d235c_0    conda-forge
cftime                    1.6.2            py39h2ae25f5_1    conda-forge
charset-normalizer        2.1.1              pyhd8ed1ab_0    conda-forge
click                     8.1.3           unix_pyhd8ed1ab_2    conda-forge
colorama                  0.4.6              pyhd8ed1ab_0    conda-forge
contourpy                 1.0.7            py39h4b4f3f3_0    conda-forge
corner                    2.2.1              pyhd8ed1ab_0    conda-forge
coverage                  7.1.0            py39h72bdee0_0    conda-forge
coverage-conditional-plugin 0.8.0                    pypi_0    pypi
cryptography              39.0.0           py39h079d5ae_0    conda-forge
curl                      7.87.0               hdc1c0ab_0    conda-forge
cycler                    0.11.0             pyhd8ed1ab_0    conda-forge
dataclasses               0.8                pyhc8e2a94_3    conda-forge
docutils                  0.17.1           py39hf3d152e_3    conda-forge
emcee                     3.1.4              pyhd8ed1ab_0    conda-forge
enterprise-pulsar         3.3.4.dev2+g1ca6d38.d20230203          pypi_0    pypi
ephem                     4.1.4            py39h72bdee0_0    conda-forge
exceptiongroup            1.1.0              pyhd8ed1ab_0    conda-forge
fftw                      3.3.10          nompi_hf0379b8_106    conda-forge
flake8                    6.0.0              pyhd8ed1ab_0    conda-forge
fonttools                 4.38.0           py39hb9d737c_1    conda-forge
freetype                  2.12.1               hca18f0e_1    conda-forge
future                    0.18.3             pyhd8ed1ab_0    conda-forge
gmp                       6.2.1                h58526e2_0    conda-forge
gsl                       2.7                  he838d99_0    conda-forge
hdf4                      4.2.15               h9772cbc_5    conda-forge
hdf5                      1.12.2          nompi_h4df4325_101    conda-forge
healpy                    1.16.2           py39ha8baebe_0    conda-forge
icu                       70.1                 h27087fc_0    conda-forge
idna                      3.4                pyhd8ed1ab_0    conda-forge
imagesize                 1.4.1              pyhd8ed1ab_0    conda-forge
importlib-metadata        6.0.0              pyha770c72_0    conda-forge
iniconfig                 2.0.0              pyhd8ed1ab_0    conda-forge
jinja2                    3.1.2              pyhd8ed1ab_1    conda-forge
jpeg                      9e                   h166bdaf_2    conda-forge
jplephem                  2.18               pyh78acc04_0    conda-forge
keyutils                  1.6.1                h166bdaf_0    conda-forge
kiwisolver                1.4.4            py39hf939315_1    conda-forge
krb5                      1.20.1               h81ceb04_0    conda-forge
lcms2                     2.14                 hfd0df8a_1    conda-forge
ld_impl_linux-64          2.40                 h41732ed_0    conda-forge
lerc                      4.0.0                h27087fc_0    conda-forge
libaec                    1.0.6                hcb278e6_1    conda-forge
libblas                   3.9.0           16_linux64_openblas    conda-forge
libbrotlicommon           1.0.9                h166bdaf_8    conda-forge
libbrotlidec              1.0.9                h166bdaf_8    conda-forge
libbrotlienc              1.0.9                h166bdaf_8    conda-forge
libcblas                  3.9.0           16_linux64_openblas    conda-forge
libcurl                   7.87.0               hdc1c0ab_0    conda-forge
libdeflate                1.17                 h0b41bf4_0    conda-forge
libedit                   3.1.20191231         he28a2e2_2    conda-forge
libev                     4.33                 h516909a_1    conda-forge
libffi                    3.4.2                h7f98852_5    conda-forge
libgcc-ng                 12.2.0              h65d4601_19    conda-forge
libgfortran-ng            12.2.0              h69a702a_19    conda-forge
libgfortran5              12.2.0              h337968e_19    conda-forge
libgomp                   12.2.0              h65d4601_19    conda-forge
libhwloc                  2.8.0                h32351e8_1    conda-forge
libiconv                  1.17                 h166bdaf_0    conda-forge
libjpeg-turbo             2.1.4                h166bdaf_0    conda-forge
liblapack                 3.9.0           16_linux64_openblas    conda-forge
libnetcdf                 4.8.1           nompi_h261ec11_106    conda-forge
libnghttp2                1.51.0               hff17c54_0    conda-forge
libnsl                    2.0.0                h7f98852_0    conda-forge
libopenblas               0.3.21          pthreads_h78a6416_3    conda-forge
libpng                    1.6.39               h753d276_0    conda-forge
libsqlite                 3.40.0               h753d276_0    conda-forge
libssh2                   1.10.0               hf14f497_3    conda-forge
libstdcxx-ng              12.2.0              h46fd767_19    conda-forge
libstempo                 2.4.5            py39h4661b88_1    conda-forge
libtiff                   4.5.0                h6adf6a1_2    conda-forge
libuuid                   2.32.1            h7f98852_1000    conda-forge
libwebp-base              1.2.4                h166bdaf_0    conda-forge
libxcb                    1.13              h7f98852_1004    conda-forge
libxml2                   2.10.3               h7463322_0    conda-forge
libzip                    1.9.2                hc929e4a_1    conda-forge
libzlib                   1.2.13               h166bdaf_4    conda-forge
loguru                    0.6.0            py39hf3d152e_2    conda-forge
markupsafe                2.1.2            py39h72bdee0_0    conda-forge
matplotlib-base           3.6.3            py39he190548_0    conda-forge
mccabe                    0.7.0              pyhd8ed1ab_0    conda-forge
metis                     5.1.0             h58526e2_1006    conda-forge
mpfr                      4.1.0                h9202a9a_1    conda-forge
munkres                   1.1.4              pyh9f0ad1d_0    conda-forge
mypy_extensions           0.4.3            py39hf3d152e_6    conda-forge
ncurses                   6.3                  h27087fc_1    conda-forge
nestle                    0.2.0                      py_0    conda-forge
netcdf4                   1.6.2           nompi_py39hfaa66c4_100    conda-forge
numpy                     1.24.1           py39h7360e5f_0    conda-forge
openjpeg                  2.5.0                hfec8fc6_2    conda-forge
openssl                   3.0.7                h0b41bf4_2    conda-forge
packaging                 23.0               pyhd8ed1ab_0    conda-forge
pandas                    1.5.3            py39h2ad29b5_0    conda-forge
pathspec                  0.11.0             pyhd8ed1ab_0    conda-forge
pgplot                    5.2.2             h68245ad_1008    conda-forge
pillow                    9.4.0            py39ha08a7e4_0    conda-forge
pint-pulsar               0.9.3            py39hf3d152e_0    conda-forge
pip                       23.0               pyhd8ed1ab_0    conda-forge
platformdirs              2.6.2              pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0              pyhd8ed1ab_5    conda-forge
pooch                     1.6.0              pyhd8ed1ab_0    conda-forge
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
pycodestyle               2.10.0             pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyerfa                    2.0.0.1          py39h2ae25f5_3    conda-forge
pyflakes                  3.0.1              pyhd8ed1ab_0    conda-forge
pygments                  2.14.0             pyhd8ed1ab_0    conda-forge
pyopenssl                 23.0.0             pyhd8ed1ab_0    conda-forge
pyparsing                 3.0.9              pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1              pyha2e5f31_6    conda-forge
pytest                    7.2.1              pyhd8ed1ab_0    conda-forge
pytest-cov                4.0.0              pyhd8ed1ab_0    conda-forge
pytest-runner             6.0.0              pyhd8ed1ab_0    conda-forge
python                    3.9.16          h2782a2a_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.9                      3_cp39    conda-forge
pytz                      2022.7.1           pyhd8ed1ab_0    conda-forge
pyyaml                    6.0              py39hb9d737c_5    conda-forge
readline                  8.1.2                h0f457ee_0    conda-forge
requests                  2.28.2             pyhd8ed1ab_0    conda-forge
scikit-sparse             0.4.8            py39h093ab06_1    conda-forge
scipy                     1.10.0           py39h7360e5f_0    conda-forge
setuptools                67.1.0             pyhd8ed1ab_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
snowballstemmer           2.2.0              pyhd8ed1ab_0    conda-forge
sphinx                    5.3.0              pyhd8ed1ab_0    conda-forge
sphinx_rtd_theme          1.1.1              pyha770c72_1    conda-forge
sphinxcontrib-applehelp   1.0.4              pyhd8ed1ab_0    conda-forge
sphinxcontrib-devhelp     1.0.2                      py_0    conda-forge
sphinxcontrib-htmlhelp    2.0.1              pyhd8ed1ab_0    conda-forge
sphinxcontrib-jsmath      1.0.1                      py_0    conda-forge
sphinxcontrib-qthelp      1.0.3                      py_0    conda-forge
sphinxcontrib-serializinghtml 1.1.5              pyhd8ed1ab_2    conda-forge
suitesparse               5.10.1               h9e50725_1    conda-forge
tbb                       2021.7.0             h924138e_1    conda-forge
tempo2                    2022.05.1            h1c8e422_2    conda-forge
tk                        8.6.12               h27826a3_0    conda-forge
toml                      0.10.2             pyhd8ed1ab_0    conda-forge
tomli                     2.0.1              pyhd8ed1ab_0    conda-forge
typed-ast                 1.5.4            py39hb9d737c_1    conda-forge
typing-extensions         4.4.0                hd8ed1ab_0    conda-forge
typing_extensions         4.4.0              pyha770c72_0    conda-forge
tzdata                    2022g                h191b570_0    conda-forge
uncertainties             3.1.7              pyhd8ed1ab_0    conda-forge
unicodedata2              15.0.0           py39hb9d737c_0    conda-forge
urllib3                   1.26.14            pyhd8ed1ab_0    conda-forge
wheel                     0.38.4             pyhd8ed1ab_0    conda-forge
xarray                    2023.1.0           pyhd8ed1ab_0    conda-forge
xarray-einstats           0.5.1              pyhd8ed1ab_0    conda-forge
xorg-kbproto              1.0.7             h7f98852_1002    conda-forge
xorg-libx11               1.7.2                h7f98852_0    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xorg-xproto               7.0.31            h7f98852_1007    conda-forge
xz                        5.2.6                h166bdaf_0    conda-forge
yaml                      0.2.5                h7f98852_2    conda-forge
zipp                      3.12.0             pyhd8ed1ab_0    conda-forge
zlib                      1.2.13               h166bdaf_4    conda-forge
zstd                      1.5.2                h3eb15da_6    conda-forge
paulthebaker commented 1 year ago

I was able to reproduce a segfault in the test suite on an Ubuntu 22.04 LTS machine with kernel version 5.15.0-58-generic.

My conda env had a few minor differences from yours, which I assume are due to the different kernel version. I am now comparing my Mac env with both Linux envs (mine and yours).

It's also odd that this doesn't happen in the CI tests, which run on Ubuntu and macOS. I'll look for differences between that env and ours as well.

vallis commented 1 year ago

I'm probably saying something obvious, but Enterprise proper, which is pure Python, should essentially never segfault, so the culprit is likely one of the compiled (C-level) libraries it calls. The problem does not seem to involve libstempo specifically either (though tempo2's memory management can be scary); it seems rather generalized. NumPy?

paulthebaker commented 1 year ago

I think it is a tempo2 call in Pulsar.

As Anne saw, the segfault location varies somewhat randomly from run to run, and the error details aren't always clear either. My most recent one did include some details:

tests/test_pta.py::TestPTASignals::test_parameterized_orf Fatal Python error: Fatal Python error: Segmentation fault

Thread 0x00007f5991fd0740 (most recent call first):
  File "/home/ptb/enterprise/enterprise/pulsar.py", line 661 in PulsarSegmentation fault

That line in pulsar.py is:

t2pulsar = t2.tempopulsar(relparfile, reltimfile, maxobs=maxobs, ephem=ephem, clk=clk)
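The "Fatal Python error" report above uses the format of Python's standard faulthandler module, which pytest enables by default. It prints the last Python frames before the crash. That identifies the Python line that called into compiled code (here the t2.tempopulsar call), not the C/C++ instruction that faulted. If the heap was corrupted earlier, that line may only be where the damage surfaces.

The sketch below reproduces the same kind of report in a child process. The deliberate null-pointer read via ctypes.string_at(0) is only a stand-in for a crash inside a compiled library, and it assumes a Linux-like system.

```python
import subprocess
import sys

# A tiny script whose only job is to crash inside C code.
CRASHING_SCRIPT = """
import ctypes

def call_into_c():
    ctypes.string_at(0)  # stand-in for a crash inside a compiled library

call_into_c()
"""

# -X faulthandler makes the interpreter dump Python tracebacks on SIGSEGV.
proc = subprocess.run(
    [sys.executable, "-X", "faulthandler", "-c", CRASHING_SCRIPT],
    capture_output=True,
    text=True,
)

print("exit status:", proc.returncode)  # negative means killed by a signal
print(proc.stderr)  # Python frames up to the call into C, e.g. call_into_c
```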
paulthebaker commented 1 year ago

I found an old discussion in the pytest GitHub repository about tests that pass when run individually but segfault when run in batches: https://github.com/pytest-dev/pytest/issues/3672
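One way to check for that pattern is to run each test file in its own interpreter and compare the results with a full-suite run. The sketch below runs each command in a separate process. For each one, it reports either the exit code or, if a signal killed the process, the signal name. run_isolated is a hypothetical helper, not part of pytest or Enterprise. The demo substitutes a deliberately crashing one-liner for a real test file.

```python
import signal
import subprocess
import sys


def run_isolated(commands):
    """Run each command in its own process.

    Returns {label: exit code} for normal exits and {label: signal name}
    for processes killed by a signal (subprocess reports those as -signum).
    """
    results = {}
    for label, cmd in commands.items():
        proc = subprocess.run(cmd, capture_output=True, text=True)
        if proc.returncode < 0:
            results[label] = signal.Signals(-proc.returncode).name
        else:
            results[label] = proc.returncode
    return results


if __name__ == "__main__":
    # In practice each command would be something like
    # [sys.executable, "-m", "pytest", "tests/test_pulsar.py"].
    demo = {
        "passes": [sys.executable, "-c", "pass"],
        "segfaults": [sys.executable, "-c", "import ctypes; ctypes.string_at(0)"],
    }
    print(run_isolated(demo))
```

If every file passes on its own but the full run crashes, the fault depends on state accumulated across tests. That is the pattern described in the linked pytest issue.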

aarchiba commented 1 year ago

If libstempo or tempo2 itself have some bug in their memory management - double deallocation, writing past the end, or one of the many other ways to get it wrong in C++ - then this could easily lead to segfaults at varying locations based on memory layout and Python-level garbage collection.
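To make the garbage-collection point concrete: an object caught in a reference cycle is not freed when its last name is deleted. It is freed only when the cyclic collector next runs, and that timing depends on allocation activity elsewhere in the process.

The pure-Python sketch below uses __del__ as a stand-in for the point where an extension type would release its C/C++ memory. Wrapper is hypothetical and does not reflect how libstempo is implemented. If such a release step were buggy, the crash would happen whenever collection happened to run, which could be during an unrelated test.

```python
import gc

freed = []


class Wrapper:
    """Hypothetical stand-in for a Python object that owns C/C++ memory."""

    def __init__(self, name):
        self.name = name
        self.self_ref = self  # reference cycle: refcount alone never reaches zero

    def __del__(self):
        # In a real extension type, this is where native memory would be released.
        freed.append(self.name)


gc.disable()  # keep the demo deterministic
w = Wrapper("pulsar-A")
del w
print("after del:", freed)  # still empty: the cycle keeps the object alive
gc.collect()
print("after gc.collect():", freed)  # the finalizer ran only now
gc.enable()
```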

Any of the other compiled libraries could be the culprit, but it seems to me that tempo2/libstempo are the most likely candidates to cause the problem. I'm not sure what hope there is for resolving the issue.

If the problem occurs only on recent versions of Ubuntu, perhaps changes to address space layout randomization are exposing a previously latent bug?
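On Linux, one way to probe the ASLR hypothesis is to check the kernel setting, then rerun the suite with randomization disabled for that process only. util-linux's setarch can do this with its -R (--addr-no-randomize) option. If the crash then lands in the same place on every run, layout dependence becomes more plausible. This is a diagnostic, not a fix. aslr_setting and without_aslr are hypothetical helpers.

```python
import platform
import shlex
import sys
from pathlib import Path


def aslr_setting():
    """Return /proc/sys/kernel/randomize_va_space (0 off, 1 partial, 2 full), or None."""
    try:
        return int(Path("/proc/sys/kernel/randomize_va_space").read_text().strip())
    except (OSError, ValueError):
        return None


def without_aslr(cmd):
    """Prefix cmd so util-linux setarch runs it with address randomization disabled."""
    return ["setarch", platform.machine(), "-R", *cmd]


if __name__ == "__main__":
    print("randomize_va_space =", aslr_setting())
    print("rerun with:", shlex.join(without_aslr([sys.executable, "-m", "pytest", "tests"])))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run.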

paulthebaker commented 1 year ago

The CI tests run on Ubuntu 22.04.1, and they have been passing. The big difference there is the install chain: CI gets suitesparse via apt, builds tempo2 from source, and then installs all of the Python packages from PyPI. That makes me think the problem is somewhere in the conda-forge builds.

paulthebaker commented 1 year ago

I think the issue is the conda-forge build of tempo2.

I will open an issue there.

aarchiba commented 1 year ago

I just followed the procedure you describe above (suitesparse and Python from conda, tempo2 compiled by hand, everything else from pip), and I still get the intermittent segfaults.
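With several tempo2 builds around, it may be worth confirming which shared library the Python process actually loads. The minimal Linux-only sketch below imports libstempo and then lists the file-backed mappings in /proc/self/maps. It assumes libstempo links against a shared library whose path contains "tempo2". mapped_libraries is a hypothetical helper.

```python
import sys
from pathlib import Path


def mapped_libraries(substring=""):
    """Return sorted file paths mapped into this process that contain substring (Linux only)."""
    if not sys.platform.startswith("linux"):
        return []
    try:
        lines = Path("/proc/self/maps").read_text().splitlines()
    except OSError:
        return []
    paths = set()
    for line in lines:
        # Fields: address perms offset dev inode [pathname]
        parts = line.split(maxsplit=5)
        if len(parts) == 6 and parts[5].startswith("/") and substring in parts[5]:
            paths.add(parts[5])
    return sorted(paths)


if __name__ == "__main__":
    try:
        import libstempo  # noqa: F401  (loads whichever tempo2 library it was built against)
    except ImportError:
        print("libstempo not importable")
    print(mapped_libraries("tempo2"))
```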

aarchiba commented 1 year ago

I also built a conda environment from a modified version of https://github.com/ipta/pulsar-env/blob/main/anaconda-env.yml so that the test suite has access to tempo2. With all packages straight from conda, I still got segfaults.

vallis commented 1 year ago

Can you check whether merely loading the par/tim files with the tempo2 executable segfaults? The next step would be to find which call in the tempopulsar constructor triggers the segfault.

I need to get access to an Ubuntu machine to try this...

aarchiba commented 1 year ago

The version of enterprise installed on the NANOGrav notebook server segfaults when running its test suite.

Unfortunately, the segfault is not at all immediate. You can run any single test file without (so far) triggering it; the segfault (usually) occurs only when you run the whole test suite, and in a different place each time. So I don't think we can expect simply running tempo2 to trigger the problem. Nevertheless, I tried running tempo2 on B1855+09_NANOGrav_9yv1.gls.par and the corresponding .tim file, and no segfault occurred.
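Single files don't crash but the full suite does. A stress loop that constructs and discards many objects in one process, forcing garbage collection as it goes, might therefore reproduce the problem outside pytest. The sketch is generic: stress is a hypothetical helper, and the factory is whatever construction is under suspicion.

```python
import gc


def stress(factory, n=500, collect_every=10):
    """Create and discard n objects from factory, calling gc.collect() every
    collect_every iterations. Returns n if the loop completes without crashing."""
    for i in range(n):
        obj = factory()
        del obj
        if collect_every and (i + 1) % collect_every == 0:
            gc.collect()
    return n
```

For the suspect call, the factory might be lambda: t2.tempopulsar(parfile, timfile), with libstempo imported as t2 and parfile/timfile pointing at the B1855+09 files. If the process survives many iterations, that weakens, but does not rule out, the constructor as the sole culprit.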

aarchiba commented 1 year ago

Can you suggest how best to figure out which call triggers the segfault? It occurs at different places on each run, and I am not sure how to inspect a core dump from a pytest run to determine the specific call.
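A common approach is to run the whole suite under gdb in batch mode. When the crash happens, gdb prints a C-level backtrace of every thread, naming the compiled function that faulted, which faulthandler's Python-level output cannot do.

The sketch only builds the command line; gdb_pytest_command is a hypothetical helper. It assumes gdb is installed. conda-forge binaries may lack debug symbols, so some frames may show only addresses.

```python
import shlex
import sys


def gdb_pytest_command(test_args, python=sys.executable):
    """Build a command that runs pytest under gdb in batch mode.

    gdb starts the program ("run"); if it crashes, gdb stops at the fault and
    then prints a backtrace for every thread ("thread apply all bt").
    """
    return [
        "gdb",
        "-batch",
        "-ex", "run",
        "-ex", "thread apply all bt",
        "--args", python, "-m", "pytest", *test_args,
    ]


if __name__ == "__main__":
    # Print a shell-ready command; pass the list to subprocess.run to execute it.
    print(shlex.join(gdb_pytest_command(["tests"])))
```

For a core file that already exists, opening it with gdb <python executable> <core file> and typing bt gives the same kind of backtrace after the fact.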

aarchiba commented 1 year ago

#340 allows one to work around this: if you unset TEMPO2, the test suite and Enterprise generally now fall back to PINT, and almost everything works (the T2 timing model and inflate/deflate are the current exceptions).
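The suite can use that fallback without changing one's shell: give a child process a copy of the environment with TEMPO2 removed. This is a minimal sketch. env_without and run_suite_without_tempo2 are hypothetical helpers, and the "tests" path is an assumption.

```python
import os
import subprocess
import sys


def env_without(*names):
    """Return a copy of os.environ with the named variables removed."""
    return {key: value for key, value in os.environ.items() if key not in names}


def run_suite_without_tempo2(test_args=("tests",)):
    """Run pytest in a child process that does not see TEMPO2."""
    return subprocess.run(
        [sys.executable, "-m", "pytest", *test_args],
        env=env_without("TEMPO2"),
    )
```

The shell equivalent is env -u TEMPO2 python -m pytest tests. The T2 timing model and inflate/deflate tests remain the exceptions either way.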