wlav / cppyy

Segfault during garbage collection #211

Open sophiehourihane opened 5 months ago

sophiehourihane commented 5 months ago

Apologies for not having a reproducer.

I was wondering what the best practices are for using `__python_owns__`, and how careful I need to be with Python's `del` statement. I am running into a slew of issues when testing my code, which creates and destroys many cppyy objects, and I seem to be getting segfaults during garbage collection. I have tried controlling ownership manually by setting `__python_owns__`.

Unfortunately, my tests rarely (if ever) fail when run on their own, but they do fail when run together with all the other tests. I think this points to a memory-management issue.
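Because the failures only appear when other tests run first, one way to make them more reproducible is to force a garbage collection around every test and to repeat a single test many times in one process. This is a minimal sketch, assuming the crashes depend on when Python's cyclic collector happens to run; `GCStressTestCase`, `ExampleTest` and `run_repeatedly` are hypothetical names, not part of cppyy or `unittest`. Note that `gc.collect()` only affects objects caught in reference cycles; an object whose reference count drops to zero is freed immediately either way.

```python
import gc
import io
import unittest


class GCStressTestCase(unittest.TestCase):
    """Hypothetical base class that forces a full collection around each test.

    Collecting at fixed points tends to make lifetime bugs (an object freed
    while C++ still points at it) crash in or right after the test that
    caused them, rather than in whichever later test the collector happens
    to interrupt. Subclasses that override setUp must call super().setUp().
    """

    def setUp(self):
        gc.collect()
        # Cleanups run after tearDown, even when the test fails.
        self.addCleanup(gc.collect)


class ExampleTest(GCStressTestCase):
    def test_placeholder(self):
        # Hypothetical stand-in for a test that creates and drops cppyy objects.
        values = [i * i for i in range(4)]
        self.assertEqual(sum(values), 14)


def run_repeatedly(test_case_cls, times=5):
    """Run every test in `test_case_cls` `times` times in a single process."""
    loader = unittest.defaultTestLoader
    suite = unittest.TestSuite(
        loader.loadTestsFromTestCase(test_case_cls) for _ in range(times)
    )
    # Send runner output to a buffer so only the result object matters here.
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.testsRun, result.wasSuccessful()
```

In a real suite, the test classes that touch cppyy objects would derive from the base class instead of `unittest.TestCase`.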

However, I consistently get segfaults during garbage collection, like this:

Thread 0x000070000d435000 (most recent call first):
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/threading.py", line 324 in wait
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/threading.py", line 607 in wait
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/site-packages/tqdm/_monitor.py", line 60 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/threading.py", line 1016 in _bootstrap_inner
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/threading.py", line 973 in _bootstrap

Current thread 0x00007ff851c12680 (most recent call first):
  Garbage-collecting
  File "/Users/sophie/src/BayesWave_Cpp/bayeswave-cpp/bayeswavecpp_bindings/pythonizations.py", line 84 in to_list
  File "/Users/sophie/src/BayesWave_Cpp/bayeswave-cpp/bayeswavecpp_bindings/pythonizations.py", line 98 in to_complex_array
  File "/Users/sophie/src/BayesWave_Cpp/bayeswave-cpp/bayeswavecpp_bindings/model_collection_posterior.py", line 589 in constructTemplateMatrix
  File "/Users/sophie/src/BayesWave_Cpp/bayeswave-cpp/test/python_binding_tests/test_model_collection_post.py", line 174 in test_construct_template_matrix
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/case.py", line 549 in _callTestMethod
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/case.py", line 591 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/case.py", line 650 in __call__
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 122 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 84 in __call__
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 122 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 84 in __call__
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 122 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/suite.py", line 84 in __call__
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/runner.py", line 184 in run
  File "/Applications/CLion.app/Contents/plugins/python-ce/helpers/pycharm/teamcity/unittestpy.py", line 310 in run
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/main.py", line 271 in runTests
  File "/Users/sophie/mambaforge/envs/bwcpp_env_103023/lib/python3.10/unittest/main.py", line 101 in __init__
  File "/Applications/CLion.app/Contents/plugins/python-ce/helpers/pycharm/_jb_unittest_runner.py", line 38 in <module>

Extension modules: libcppyy, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image, h5py._errors, h5py.defs, h5py._objects, h5py.h5, h5py.h5r, h5py.utils, h5py.h5s, h5py.h5ac, h5py.h5p, h5py.h5t, h5py._conv, h5py.h5z, h5py._proxy, h5py.h5a, h5py.h5d, h5py.h5ds, h5py.h5g, h5py.h5i, h5py.h5f, h5py.h5fd, h5py.h5pl, h5py.h5o, h5py.h5l, h5py._selector, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.hashing, pandas._libs.tslib, pandas._libs.ops, numexpr.interpreter, pandas._libs.arrays, pandas._libs.sparse, pandas._libs.reduction, pandas._libs.indexing, pandas._libs.index, pandas._libs.internals, pandas._libs.join, pandas._libs.writers, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.testing, pandas._libs.parsers, pandas._libs.json, scipy._lib._ccallback_c, matplotlib.backends._macosx, erfa.ufunc, astropy.time._parse_times, _brotli, astropy.table._column_mixins, astropy.table._np_utils, yaml._yaml, astropy.io.ascii.cparser, 
astropy.utils.xml._iterparser, astropy.io.fits._utils, astropy.io.fits._tiled_compression._compression, astropy.io.votable.tablewriter, _cffi_backend, scipy.signal._sigtools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.sparse.linalg._isolve._iterative, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.linalg._flinalg, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy._lib._uarray._uarray, scipy.signal._max_len_seq_inner, scipy.signal._upfirdn_apply, scipy.signal._spline, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy._lib.messagestream, scipy.optimize._trlib._trlib, numpy.linalg.lapack_lite, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize.__nnls, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.spatial._ckdtree, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, 
scipy.optimize._direct, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.signal._sosfilt, scipy.signal._spectral, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy.stats._statlib, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.signal._peak_finding_utils, scipy.io.matlab._mio_utils, scipy.io.matlab._streams, scipy.io.matlab._mio5_utils, tables._comp_lzo, tables._comp_bzip2, tables.utilsextension, tables.hdf5extension, tables.linkextension, tables.lrucacheextension, tables.tableextension, tables.indexesextension (total: 218)
 *** Break *** segmentation violation
 *** Break *** segmentation violation

Also, if I run just one test over and over again, the object's type is sometimes completely wrong, even though it is created in exactly the same way each time. Here is the code:


class RunConfigurationGlitchTest(unittest.TestCase, ModelCollectionGlitchTest):
    def setUp(self):
        self.output_directory = self.get_output_directory()
        safe_run_configuration = from_command_line.getRunConfigurationFromOutputDir(self.output_directory)
        #self.safe_run_configuration.__python_owns__ = False
        self.run = Cpp.Configuration.Run.build(safe_run_configuration)
        #self.run.__python_owns__ = False

    def tearDown(self):
        # del self.safe_run_configuration
        # del self.run
        pass

    def test_set_state_from_output_files(self):
        """
        Test that given files that have been written out (given run from runConfigurationTests.test_evolve_run)
        so that you can read in final line and set state successfully
        :return:
        """
        print("test_set_state_from_output_files", flush=True)
        chainCollection = self.run.getChainCollection()
        cold_chain = chainCollection.getChainsInTemperatureOrder()[0].get()
        modelCollection = cold_chain.getModelCollection()
        print('getting state as json string', flush=True)
        print(f"modelCollection is {modelCollection}", flush=True)

This code usually works (as in the first output below), but `modelCollection` will sometimes be a completely unrelated type instead:


Initializing models from the prior
test_set_state_from_output_files
getting state as json string
modelCollection is <cppyy.gbl.StandardModelCollection object at 0x7fa0d4723490>

Initializing models from the prior
test_set_state_from_output_files
getting state as json string
modelCollection is <C++ overload "find" at 0x1686159c0>

Initializing models from the prior
test_set_state_from_output_files
getting state as json string
modelCollection is <bound method UniformDistribution.uniformDistribution of <bayeswavecpp_bindings.prior_manager.UniformDistribution object at 0x1738624a0>>

I can try to be more specific if needed. I suspect the issue lies in how the memory management works.

wlav commented 5 months ago

Python's memory allocator uses arenas, so the memory of freed PyObjects is reused. If a pointer to a Python object is kept somewhere after that object has been freed and its memory recycled, it will show up as exactly this kind of type (and payload) change.
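A CPython-only illustration of that reuse: once the last reference to an object is dropped, its memory is freed immediately, and the next allocation of the same size may be placed at the same address. Anything that remembered only the raw address would then find an object of a different type there. The class names below are hypothetical stand-ins, and whether the address is actually reused depends on the allocator, so the sketch reports it rather than relying on it.

```python
import weakref


class ModelCollection:
    """Hypothetical stand-in for one proxy type."""


class Overload:
    """Hypothetical stand-in for an unrelated type of the same instance size."""


def demonstrate_reuse():
    first = ModelCollection()
    remembered_address = id(first)   # in CPython, id() is the object's address
    tracker = weakref.ref(first)     # non-owning reference that notices deallocation
    del first                        # last strong reference gone: freed right away
    second = Overload()              # may be allocated where `first` used to live
    same_address = id(second) == remembered_address
    return tracker() is None, same_address, type(second).__name__


freed, same_address, new_type = demonstrate_reuse()
print(f"freed={freed} reused_address={same_address} new_type={new_type}")
```

A weak reference is the safe way to hold a non-owning pointer on the Python side: it reports the object as gone instead of silently resolving to whatever now occupies that memory.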

Just based on the limited code above, it's hard to guess what is what, but is modelCollection perhaps pointing to a member of cold_chain (both on the C++ side)?
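If that guess is right, the hazard can be pictured with a pure-Python analogy (not cppyy's actual implementation): a proxy for a C++ member refers into its owner without keeping the owner alive, so once the owner's last reference goes away the member proxy is stale. One way to test the hypothesis would be to keep every intermediate C++-backed object (for example `safe_run_configuration`, which is only a local in `setUp`) referenced for as long as anything derived from it is in use. `ColdChain` and `MemberView` below are hypothetical names.

```python
import weakref


class ColdChain:
    """Hypothetical stand-in for a C++ object that owns its model collection."""

    def __init__(self):
        self.model_collection = "StandardModelCollection"


class MemberView:
    """Analogy for a proxy to a C++ member: it does not own its parent.

    With keep_owner=True it also stores a strong reference (a "lifeline")
    that keeps the parent alive for as long as the view exists.
    """

    def __init__(self, owner, keep_owner=False):
        self._owner = weakref.ref(owner)
        self._lifeline = owner if keep_owner else None

    def get(self):
        owner = self._owner()
        if owner is None:
            # C++ has no such check: the pointer would silently dangle.
            raise ReferenceError("owner already freed")
        return owner.model_collection


def view_without_lifeline():
    chain = ColdChain()
    view = MemberView(chain)
    del chain  # owner freed (CPython refcounting); the view is now stale
    try:
        return view.get()
    except ReferenceError:
        return None


def view_with_lifeline():
    chain = ColdChain()
    view = MemberView(chain, keep_owner=True)
    del chain  # the view still pins its owner
    return view.get()
```

The design point is simply that whoever uses the member must also hold the owner; storing owners as attributes on the test case is the manual equivalent of the `keep_owner=True` lifeline.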