dvokrouhlicky closed this issue 7 months ago
Hi, I'm having trouble reproducing this crash. Could you provide some more details, such as platform and versions? Did you modify the example code?
Hi, I'm running the HPC cluster where @dvokrouhlicky is trying to do the operon stuff.
Yesterday we tried to reproduce it on my laptop too, so I can provide details:
platform:
Linux 6.5.0-26-generic
Ubuntu 23.10
Python 3.12.2
anaconda:
wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
chmod +x ./Anaconda3-2024.02-1-Linux-x86_64.sh
./Anaconda3-2024.02-1-Linux-x86_64.sh
/home/jose/anaconda3/bin/conda init
(py)operon install:
git clone https://github.com/heal-research/pyoperon.git
cd pyoperon
# just rename conda env name to pyoperon_44 (current date)
vim environment.yml
conda env create -f ./environment.yml
conda activate pyoperon_44
conda install numpy pandas scikit-learn sympy matplotlib
export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang
./script/dependencies.sh
pip install .
cd example/
python ./operon-sklearn.py
(244, 4) (244,)
Segmentation fault (core dumped)
When the Python faulthandler is added to the example, we can see:
(pyoperon_44) jose@jose-t14s:~/projects/sklearn_reproducer/pyoperon/example$ python ./operon-sklearn.py
(244, 4) (244,)
Fatal Python error: Segmentation fault
Current thread 0x0000760edb925740 (most recent call first):
File "/home/jose/anaconda3/envs/pyoperon_44/lib/python3.12/site-packages/pyoperon/sklearn.py", line 548 in get_solution_stats
File "/home/jose/anaconda3/envs/pyoperon_44/lib/python3.12/site-packages/pyoperon/sklearn.py", line 557 in fit
File "/home/jose/projects/sklearn_reproducer/pyoperon/example/./operon-sklearn.py", line 64 in <module>
Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, numexpr.interpreter, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, 
scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, 
scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, sklearn.utils._random, sklearn.neighbors._partition_nodes, sklearn.neighbors._ball_tree, sklearn.neighbors._kd_tree, sklearn.utils._seq_dataset, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, sklearn.tree._utils, sklearn.neighbors._quad_tree, sklearn.tree._tree, sklearn.tree._splitter, sklearn.tree._criterion, sklearn.ensemble._gradient_boosting, sklearn.ensemble._hist_gradient_boosting.common, sklearn.ensemble._hist_gradient_boosting._gradient_boosting, 
sklearn.ensemble._hist_gradient_boosting._binning, sklearn.ensemble._hist_gradient_boosting._bitset, sklearn.ensemble._hist_gradient_boosting.histogram, sklearn.ensemble._hist_gradient_boosting._predictor, sklearn.ensemble._hist_gradient_boosting.splitting, sklearn.ensemble._hist_gradient_boosting.utils, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image (total: 214)
Segmentation fault (core dumped)
(pyoperon_44) jose@jose-t14s:~/projects/sklearn_reproducer/pyoperon/example$
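For reference, a minimal sketch of how a traceback like the one above can be obtained with the standard-library `faulthandler` module. Where exactly the lines were added to `operon-sklearn.py` is an assumption; the top of the script is the usual place.

```python
import faulthandler

# Install handlers for SIGSEGV, SIGFPE, SIGABRT, SIGBUS and SIGILL that dump
# the Python traceback of all threads to stderr before the process dies.
faulthandler.enable()

# ... the rest of the example script follows unchanged (assumption: these
# lines are placed before pyoperon is imported and fit() is called) ...
```

The same effect is available without editing the script, by running `python -X faulthandler ./operon-sklearn.py` or by setting the environment variable `PYTHONFAULTHANDLER=1`.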
Hi, thanks for the details. This is a bit weird. On my older Ubuntu 22.04 container it worked perfectly:
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python
>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. 59ea4c1 Release Linux-6.8.2 x86_64, timestamp 2024-04-03T16:23:49Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 17.0.6, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python operon-sklearn.py
(244, 4) (244,)
[-0.647000253200531, 5.0] 725.821044921875 ((-0.4884962439537048) + (1.0900259017944336 * (0.9786118268966675 * X3)))
[-0.8314962387084961, 7.0] 342.47186279296875 ((-1.7255049943923950) + (1.2576309442520142 * ((0.6404727697372437 * X4) + (0.3780705928802490 * X2))))
[-0.8719464540481567, 9.0] 267.5833740234375 (1.3148596286773682 + (0.7510498166084290 * ((((-2.4775171279907227) * X3) + ((-2.3476428985595703) * X2)) * ((-0.0360007286071777) * X4))))
[-0.8830980062484741, 11.0] 258.8580017089844 (1.3141694068908691 + (0.8525248169898987 * ((((2.3025851249694824 * X1) + (5.9684019088745117 * X3)) - ((-6.0123701095581055) * X2)) * (0.0106244552880526 * X4))))
[-0.8861820101737976, 13.0] 262.4435729980469 (0.0886610150337219 + (0.9839779138565063 * (((((-6.2966771125793457) * X4) - ((-2.3311181068420410) * X2)) * (((-0.0063887876458466) * X1) - (0.0131089137867093 * X3))) - ((-0.5330133438110352) * X2))))
[-0.8887240290641785, 15.0] 267.5678405761719 (0.0102210566401482 + (0.9981619119644165 * (((((-6.2966771125793457) * X4) - ((-2.9167242050170898) * X2)) * (((0.0051961382851005 * X4) - (0.0177904125303030 * X3)) - (0.0081524383276701 * X1))) - ((-0.5793023109436035) * X2))))
[-0.8895695209503174, 17.0] 287.78387451171875 (0.6008134484291077 + (0.8920652866363525 * (((1.4154840707778931 * X3) * ((0.0132012134417892 * X1) - ((-0.0634138882160187) * X3))) + ((1.1341078281402588 * X2) - (((0.0669798627495766 * X2) * ((-8.6575593948364258) * X3)) / ((-0.7531581521034241) * X4))))))
...
But on a brand new Ubuntu 23.10 container it fails just as you described:
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. 59ea4c1 Release Linux-6.8.3 x86_64, timestamp 2024-04-04T13:21:11Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3
>>>
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python operon-sklearn.py
(244, 4) (244,)
Segmentation fault (core dumped)
Furthermore, updating the environment on the older machine leads to the same failure. You are correct that the segfault happens in the MinimumDescriptionLengthEvaluator. I will investigate and get back with more details.
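When checking the same script across several containers or environments, it can help to detect the crash programmatically instead of reading the shell output. The following is only a sketch: `run_and_classify` is a hypothetical helper, not part of pyoperon, and it relies on the POSIX convention that `subprocess` reports a child killed by signal N with return code -N.

```python
import signal
import subprocess
import sys


def run_and_classify(argv):
    """Run a command and describe how it ended (hypothetical helper)."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    # On POSIX, a negative return code -N means the child was killed by signal N.
    if proc.returncode < 0:
        return f"killed by {signal.Signals(-proc.returncode).name}"
    return f"exited with code {proc.returncode}"


# Example usage (assumes the current directory is pyoperon/example):
# print(run_and_classify([sys.executable, "operon-sklearn.py"]))
```

On an affected environment the call in the comment would be expected to report `killed by SIGSEGV`, and a normal exit code on a working one.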
Hi @foolnotion, thanks for confirming our findings. We tried to narrow down the source of the segfault, but adding debug symbols to the operon binary by using
-DCMAKE_C_FLAGS="-g2" \
-DCMAKE_CXX_FLAGS="-g2" \
in script/dependencies.sh
somehow hides the root cause, so there is no progress on our side.
I updated to the latest version of Operon and the problem seems to be fixed. Please test dd38d8f308d2ece8c8af44d6afeec3961a46cb4f and see if it works for you.
Nice, it seems to be fixed. I confirm that at
>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. d01b92c Release Linux-6.5.0-26-generic x86_64, timestamp 2024-04-05T10:04:40Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3
>>>
python operon-sklearn.py
works. We'll try it on our HPC ASAP.
(on Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux)
Yes, everything now works on our HPC as well. Thank you for the fast reply and fix.
Thanks for reporting and helping with debugging.
Hello,
After a successful installation, running the examples that use sklearn results in a segmentation fault. After some debugging, I narrowed the error down to running the MinimumDescriptionLengthEvaluator on the results to obtain the best model. If I comment out that part in sklearn.py and just obtain the Pareto front, or use BIC or AIC, everything works fine. The other examples also work perfectly.
This error has been reproduced on several different machines. Have you ever encountered it? Thank you very much for any comments. Best, David
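To illustrate what selecting the best model from a Pareto front with AIC or BIC means, here is a minimal, library-independent sketch. It is not pyoperon's implementation: it uses the textbook Gaussian-residual forms of the criteria (up to an additive constant), and the front entries are made-up illustrative values, not results from this issue.

```python
import math


def aic(mse, n_samples, n_params):
    # Akaike information criterion for Gaussian residuals, constants dropped.
    return n_samples * math.log(mse) + 2 * n_params


def bic(mse, n_samples, n_params):
    # Bayesian information criterion for Gaussian residuals, constants dropped.
    return n_samples * math.log(mse) + n_params * math.log(n_samples)


def select_best(front, n_samples, criterion=bic):
    """Return the front entry with the lowest criterion value."""
    return min(front, key=lambda m: criterion(m["mse"], n_samples, m["n_params"]))


# Hypothetical Pareto front: error decreases as the model grows.
front = [
    {"name": "small", "mse": 4.0, "n_params": 2},
    {"name": "medium", "mse": 2.0, "n_params": 4},
    {"name": "large", "mse": 1.9, "n_params": 12},
]

best = select_best(front, n_samples=244)
print(best["name"])
```

With these illustrative numbers both criteria pick the medium model, because the small error gain of the large model does not offset its parameter penalty.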