yt-project / yt

Main yt repository
http://yt-project.org

BUG: segfault on macOS (arm64) #4953

Closed neutrinoceros closed 1 month ago

neutrinoceros commented 1 month ago

Bug report

For visibility, I think this segfault seen in CI (example logs) is recurrent but not deterministic. It may be an upstream bug and I'm not even sure that it's platform-dependent or Python-version-dependent, so I'm reporting this occurrence and will start paying attention next time around so we can gather some more clues. If it turns out it's really just about Python 3.9 it may not be worth trying to fix it since it might soon be time for us to drop support for it completely.

nastasha-w commented 1 month ago

Yeah, I'm seeing a segfault in that log. I guess it could be some cython code issue, but if it's only showing up on this one mac/python combo, that does kinda smell like some other issue. I'm honestly getting pretty annoyed with code dev on my mac

neutrinoceros commented 1 month ago

> I'm honestly getting pretty annoyed with code dev on my mac

Wait, do you see this segfault locally too?

nastasha-w commented 1 month ago

Not personally, but I haven't been able to run anything with OpenMP (at least through Python) on my Mac. That's the annoyance I was thinking of. I'm using Python 3.11 on my laptop, so at least a 2021 14-inch M1 Mac with Python 3.11 and no attempt to use multi-threading doesn't seem to have this issue.

nastasha-w commented 1 month ago

I might be getting a similar issue in this test for PR #4939, which fails specifically on python3.12 with macOS.

full copy of the error report in case new tests invalidate the link:

Fatal Python error: Bus error

Thread 0x00000001705df000 (most recent call first):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 359 in wait
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 655 in wait
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/tqdm/_monitor.py", line 60 in run
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/threading.py", line 1030 in _bootstrap

Current thread 0x00000002000e4c00 (most recent call first):
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/lib/_arraysetops_impl.py", line 356 in _unique1d
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/numpy/lib/_arraysetops_impl.py", line 289 in unique
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/unyt/array.py", line 2040 in __array_function__
  File "/Users/runner/work/yt/yt/yt/frontends/stream/tests/test_stream_stretched.py", line 35 in test_variable_dx
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/python.py", line 1627 in runtest
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/main.py", line 337 in _main
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/Users/runner/hostedtoolcache/Python/3.12.4/arm64/bin/pytest", line 8 in <module>

Extension modules: markupsafe._speedups, yaml._yaml, numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, cftime._cftime, netCDF4._netCDF4, PIL._imaging, kiwisolver._cext, yt.utilities.lib.allocation_container, yt.utilities.lib.bitarray, yt.geometry.grid_visitors, yt.geometry.oct_container, yt.geometry.oct_visitors, yt.utilities.lib.partitioned_grid, yt.utilities.lib._octree_raytracing, yt.utilities.lib.lenses, yt.utilities.lib.grid_traversal, yt.utilities.lib.image_samplers, yt.utilities.lib.fnv_hash, yt.geometry.selection_routines, yt.utilities.lib.misc_utilities, yt.utilities.lib.particle_mesh_operations, yt.utilities.lib.image_utilities, yt.utilities.lib.quad_tree, yt.geometry.grid_container, yt.utilities.lib.amr_kdtools, yt.utilities.lib.marching_cubes, yt.geometry.particle_deposit, yt.utilities.lib.interpolators, yt.utilities.lib.mesh_utilities, yt.utilities.lib.distance_queue, yt.geometry.particle_smooth, ewah_bool_utils.morton_utils, ewah_bool_utils.ewah_bool_wrap, yt.utilities.lib.geometry_utils, yt.geometry.particle_oct_container, yt.utilities.lib.autogenerated_element_samplers, yt.utilities.lib.element_mappings, yt.utilities.lib.bounded_priority_queue, yt.utilities.lib.cykdtree.kdtree, yt.utilities.lib.particle_kdtree_tools, yt.utilities.lib.pixelization_routines, yt.utilities.lib.cyoctree, yt.utilities.lib.primitives, yt.utilities.lib.bounding_volume_hierarchy, yt.utilities.lib.basic_octree, yt.utilities.lib.contour_finding, yt.utilities.lib.depth_first_octree, yt.utilities.lib.fortran_reader, yt.utilities.lib.line_integral_convolution, yt.utilities.lib.points_in_volume, yt.utilities.lib.write_array, yt.utilities.lib.mesh_triangulation, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, yt.utilities.cython_fortran_utils, yt.utilities.lib.cosmology_time, 
yt.frontends.ramses.io_utils, yt.utilities.lib.alt_ray_tracers, yt.utilities.lib.ragged_arrays, scipy._lib._ccallback_c, charset_normalizer.md, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, 
pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, fontTools.misc.bezierTools, fontTools.varLib.iup, PIL._imagingmath, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.utils, h5py.h5t, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5r, h5py._proxy, h5py._conv, h5py.h5z, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, requests.packages.charset_normalizer.md, requests.packages.chardet.md, scipy.interpolate._fitpack, scipy.interpolate._dfitpack, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, astropy.stats._stats, erfa.ufunc, astropy.stats._fast_sigma_clip, astropy.time._parse_times, astropy.table._column_mixins, astropy.table._np_utils, astropy.io.ascii.cparser, astropy.utils.xml._iterparser, astropy.io.fits._utils, astropy.io.fits.hdu.compressed._compression, astropy.io.votable.tablewriter, astropy.wcs._wcs, shapely.lib, shapely._geos, shapely._geometry_helpers, psutil._psutil_osx, psutil._psutil_posix, scipy._lib._uarray._uarray, 
scipy.fftpack.convolve, fast_histogram._histogram_core, yt.frontends.artio._artio_caller, yt.frontends.gamer.cfields, miniball.bindings (total: 228)
/Users/runner/work/_temp/01d6d7c8-0c85-4930-8b0d-c69202ddbd97.sh: line 1:  5411 Bus error: 10           pytest --color=yes
/Library/Frameworks/Python.framework/Versions/3.12/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
yt/frontends/stream/tests/test_stream_stretched.py::test_variable_dx 
Error: Process completed with exit code 138.

nastasha-w commented 1 month ago

Huh, I just got another error on a macOS-latest, Python 3.12 test, which occurred during the exact same test (yt/frontends/stream/tests/test_stream_stretched.py::test_variable_dx)! This time it's a segmentation fault instead of a bus error though.
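
For reference, the exit code 138 in the log above is consistent with the "Bus error: 10" line: POSIX shells report a process killed by signal N as exit status 128 + N. A minimal sketch for decoding such CI exit codes follows; the helper names are hypothetical, and signal names are looked up on whatever platform runs the script, so number 10 only maps to SIGBUS on macOS-like systems.

```python
import signal


def signal_from_exit_code(exit_code: int):
    """Return the signal number implied by a shell-style exit code, or None."""
    # POSIX shells report "killed by signal N" as exit status 128 + N.
    if exit_code > 128:
        return exit_code - 128
    return None


def describe(exit_code: int) -> str:
    """Human-readable summary of a shell exit code (hypothetical helper)."""
    signum = signal_from_exit_code(exit_code)
    if signum is None:
        return f"exit code {exit_code}: not a signal death"
    try:
        # Signal numbering is platform-specific: 10 is SIGBUS on macOS
        # but a different signal on Linux.
        name = signal.Signals(signum).name
    except ValueError:
        name = "unknown"
    return f"exit code {exit_code}: signal {signum} ({name})"


print(describe(138))  # the bus error from the log above
print(describe(139))  # 128 + 11, the usual segmentation fault code
```

Under this convention a segmentation fault in the same job would be expected to show up as exit code 139 rather than 138.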

neutrinoceros commented 1 month ago

It also looks like test_variable_dx is the culprit in https://github.com/yt-project/yt/actions/runs/10089899596/job/27898229486 (but I'm still suspecting it could really be triggered in a previous test)

Furthermore, this time it happened on Python 3.10 so it's not 3.9-specific.

neutrinoceros commented 1 month ago

Good news, I actually tried it and I can reproduce this locally with

pytest yt/frontends/stream/tests/test_stream_stretched.py
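
Since the crash is not deterministic, a single run may pass by chance. Below is a rough sketch of a loop that reruns a command and records how often it dies from a signal; `count_signal_deaths` is a hypothetical helper, not part of yt or pytest, and it relies on the POSIX behavior of `subprocess` reporting death by signal N as return code -N.

```python
import subprocess
import sys


def count_signal_deaths(cmd, runs=10):
    """Run `cmd` `runs` times and collect return codes that indicate death by signal."""
    deaths = []
    for _ in range(runs):
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # On POSIX, a negative return code -N means the child was killed by
        # signal N (e.g. SIGSEGV or SIGBUS); ordinary test failures are positive.
        if proc.returncode < 0:
            deaths.append(proc.returncode)
    return deaths
```

It could be called as `count_signal_deaths([sys.executable, "-m", "pytest", "yt/frontends/stream/tests/test_stream_stretched.py"], runs=20)` from the repository root; an empty list would mean no run was killed by a signal.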

More notes:

neutrinoceros commented 1 month ago

marking this as a blocker since we basically cannot release until this is addressed

neutrinoceros commented 1 month ago

progress: I bisected it down to https://github.com/numpy/numpy/pull/26821 (which is a backport of https://github.com/numpy/numpy/pull/26797)

neutrinoceros commented 1 month ago

reported upstream as https://github.com/numpy/numpy/issues/27037

neutrinoceros commented 1 month ago

And patched upstream: https://github.com/numpy/numpy/pull/27070. The patch was also backported, so the next version of numpy (2.0.2 or 2.1.0) will be stable with regard to this test.