heal-research / pyoperon

Python bindings and scikit-learn interface for the Operon library for symbolic regression.
MIT License

operon-sklearn.py example not working - segfault in MinimumDescriptionLengthEvaluator #14

Closed dvokrouhlicky closed 3 months ago

dvokrouhlicky commented 3 months ago

Hello,

After a successful installation, running the examples that use sklearn results in a segmentation fault. After some debugging, I narrowed the error down to running the MinimumDescriptionLengthEvaluator on the results to obtain the best model. If I comment out that part in sklearn.py and just obtain the Pareto front, or use BIC or AIC instead, everything works fine. The other examples also work perfectly.

This error has been reproduced on several different machines. Have you ever encountered it? Thank you very much for any comments. Best, David
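For readers unfamiliar with the criteria mentioned above, here is a minimal sketch of how AIC and BIC trade goodness of fit against parameter count. It assumes Gaussian errors with the variance estimated by maximum likelihood and uses the textbook formulas; it is not Operon's implementation. The function names and the residual values are made up for illustration.

```python
import math

def neg2_log_likelihood(residuals):
    """-2 ln L for Gaussian errors with MLE variance RSS/n (requires RSS > 0)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * (math.log(2 * math.pi * rss / n) + 1)

def aic(residuals, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k + neg2_log_likelihood(residuals)

def bic(residuals, k):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(len(residuals)) + neg2_log_likelihood(residuals)

# Hypothetical residuals of one fitted model, scored with two parameter counts.
residuals = [0.1, -0.2, 0.05, 0.3, -0.15, 0.2, -0.1, 0.05, -0.25, 0.1]
for k in (2, 4):
    print(k, aic(residuals, k), bic(residuals, k))
```

With identical residuals, both criteria grow with the number of parameters `k`; BIC penalizes each parameter by `ln(n)` instead of 2, so it is the stricter of the two once `n` exceeds 7.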

foolnotion commented 3 months ago

Hi, I'm having trouble reproducing this crash. Could you provide some more details, such as platform, versions, etc.? Did you modify the example code?

jose-d commented 3 months ago

Hi, I'm running the HPC cluster where @dvokrouhlicky is trying to do the operon stuff.

Yesterday we tried to reproduce it on my laptop too, so I can provide details:

platform:

Linux 6.5.0-26-generic
Ubuntu 23.10
Python 3.12.2

anaconda:

wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
chmod +x ./Anaconda3-2024.02-1-Linux-x86_64.sh 
./Anaconda3-2024.02-1-Linux-x86_64.sh 
/home/jose/anaconda3/bin/conda init

pyoperon install:

git clone https://github.com/heal-research/pyoperon.git
cd pyoperon
# just rename conda env name to pyoperon_44 (current date)
vim environment.yml
conda env create -f ./environment.yml
conda activate pyoperon_44
conda install numpy pandas scikit-learn sympy matplotlib
export CXX=${CONDA_PREFIX}/bin/clang++
export CC=${CONDA_PREFIX}/bin/clang
./script/dependencies.sh
pip install .
cd example/
python ./operon-sklearn.py 
(244, 4) (244,)
Segmentation fault (core dumped)

jose-d commented 3 months ago

When the Python faulthandler is added to the example, we see:

(pyoperon_44) jose@jose-t14s:~/projects/sklearn_reproducer/pyoperon/example$ python ./operon-sklearn.py 
(244, 4) (244,)
Fatal Python error: Segmentation fault

Current thread 0x0000760edb925740 (most recent call first):
  File "/home/jose/anaconda3/envs/pyoperon_44/lib/python3.12/site-packages/pyoperon/sklearn.py", line 548 in get_solution_stats
  File "/home/jose/anaconda3/envs/pyoperon_44/lib/python3.12/site-packages/pyoperon/sklearn.py", line 557 in fit
  File "/home/jose/projects/sklearn_reproducer/pyoperon/example/./operon-sklearn.py", line 64 in <module>

Extension modules: numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pandas._libs.ops, numexpr.interpreter, bottleneck.move, bottleneck.nonreduce, bottleneck.nonreduce_axis, bottleneck.reduce, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._flinalg, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, 
scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, 
scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, sklearn.utils._isfinite, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.utils._logistic_sigmoid, sklearn.utils.sparsefuncs_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_fast, sklearn.utils._random, sklearn.neighbors._partition_nodes, sklearn.neighbors._ball_tree, sklearn.neighbors._kd_tree, sklearn.utils._seq_dataset, sklearn.linear_model._cd_fast, sklearn._loss._loss, sklearn.utils.arrayfuncs, sklearn.svm._liblinear, sklearn.svm._libsvm, sklearn.svm._libsvm_sparse, sklearn.utils._weight_vector, sklearn.linear_model._sgd_fast, sklearn.linear_model._sag_fast, sklearn.decomposition._online_lda_fast, sklearn.decomposition._cdnmf_fast, sklearn.tree._utils, sklearn.neighbors._quad_tree, sklearn.tree._tree, sklearn.tree._splitter, sklearn.tree._criterion, sklearn.ensemble._gradient_boosting, sklearn.ensemble._hist_gradient_boosting.common, sklearn.ensemble._hist_gradient_boosting._gradient_boosting, 
sklearn.ensemble._hist_gradient_boosting._binning, sklearn.ensemble._hist_gradient_boosting._bitset, sklearn.ensemble._hist_gradient_boosting.histogram, sklearn.ensemble._hist_gradient_boosting._predictor, sklearn.ensemble._hist_gradient_boosting.splitting, sklearn.ensemble._hist_gradient_boosting.utils, matplotlib._c_internal_utils, PIL._imaging, matplotlib._path, kiwisolver._cext, matplotlib._image (total: 214)
Segmentation fault (core dumped)
(pyoperon_44) jose@jose-t14s:~/projects/sklearn_reproducer/pyoperon/example$
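For anyone reproducing this, a minimal sketch of one way to get a traceback like the one above. It uses the standard-library `faulthandler` module and writes to a log file so the trace survives even if the terminal output is lost. The helper name `enable_crash_traces` and the log file name are assumptions for illustration, not part of pyoperon.

```python
import faulthandler
import os
import tempfile

def enable_crash_traces(log_path):
    """Dump Python tracebacks of all threads to log_path on a fatal signal."""
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    # The caller must keep this file open: faulthandler writes to its
    # file descriptor at crash time.
    return log_file

# Hypothetical log location; any writable path works.
crash_log = enable_crash_traces(
    os.path.join(tempfile.gettempdir(), "operon_crash_trace.log")
)
# From here on, import pyoperon and run the fit as in the example script.
```

The same handler can be turned on without editing the script by running `python -X faulthandler operon-sklearn.py` or by setting `PYTHONFAULTHANDLER=1`, in which case the trace goes to stderr.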

foolnotion commented 3 months ago

Hi, thanks for the details. This is a bit weird. On my older Ubuntu 22.04 container it worked perfectly:

(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python
>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. 59ea4c1 Release Linux-6.8.2 x86_64, timestamp 2024-04-03T16:23:49Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 17.0.6, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python operon-sklearn.py
(244, 4) (244,)
[-0.647000253200531, 5.0] 725.821044921875 ((-0.4884962439537048) + (1.0900259017944336 * (0.9786118268966675 * X3)))
[-0.8314962387084961, 7.0] 342.47186279296875 ((-1.7255049943923950) + (1.2576309442520142 * ((0.6404727697372437 * X4) + (0.3780705928802490 * X2))))
[-0.8719464540481567, 9.0] 267.5833740234375 (1.3148596286773682 + (0.7510498166084290 * ((((-2.4775171279907227) * X3) + ((-2.3476428985595703) * X2)) * ((-0.0360007286071777) * X4))))
[-0.8830980062484741, 11.0] 258.8580017089844 (1.3141694068908691 + (0.8525248169898987 * ((((2.3025851249694824 * X1) + (5.9684019088745117 * X3)) - ((-6.0123701095581055) * X2)) * (0.0106244552880526 * X4))))
[-0.8861820101737976, 13.0] 262.4435729980469 (0.0886610150337219 + (0.9839779138565063 * (((((-6.2966771125793457) * X4) - ((-2.3311181068420410) * X2)) * (((-0.0063887876458466) * X1) - (0.0131089137867093 * X3))) - ((-0.5330133438110352) * X2))))
[-0.8887240290641785, 15.0] 267.5678405761719 (0.0102210566401482 + (0.9981619119644165 * (((((-6.2966771125793457) * X4) - ((-2.9167242050170898) * X2)) * (((0.0051961382851005 * X4) - (0.0177904125303030 * X3)) - (0.0081524383276701 * X1))) - ((-0.5793023109436035) * X2))))
[-0.8895695209503174, 17.0] 287.78387451171875 (0.6008134484291077 + (0.8920652866363525 * (((1.4154840707778931 * X3) * ((0.0132012134417892 * X1) - ((-0.0634138882160187) * X3))) + ((1.1341078281402588 * X2) - (((0.0669798627495766 * X2) * ((-8.6575593948364258) * X3)) / ((-0.7531581521034241) * X4))))))
...

But on a brand-new Ubuntu 23.10 container it fails just as you described:

(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python
Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. 59ea4c1 Release Linux-6.8.3 x86_64, timestamp 2024-04-04T13:21:11Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3

>>> 
(pyoperon) ubuntu@pyoperon:~/pyoperon/example$ python operon-sklearn.py 
(244, 4) (244,)
Segmentation fault (core dumped)

Furthermore, updating the environment on the older machine leads to the same failure. You are correct that the segfault happens in the MinimumDescriptionLengthEvaluator. I will investigate and get back with more details.
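When checking several environments like the two above, it can help to run the example in a child process so that a crash is reported instead of taking down the calling script. The sketch below relies on the POSIX convention that `subprocess` reports death by signal as a negative return code. The helper name `classify_exit` is hypothetical, and the crashing child is a stand-in that sends itself SIGSEGV so the sketch runs without pyoperon.

```python
import signal
import subprocess
import sys

def classify_exit(cmd):
    """Run cmd and describe how it ended (POSIX: negative code means a signal)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    if proc.returncode < 0:
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return f"killed by {name}"
    return f"exit code {proc.returncode}"

# Stand-in child that sends itself SIGSEGV, simulating the crash.
crashing = [
    sys.executable,
    "-c",
    "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
]
print(classify_exit(crashing))
print(classify_exit([sys.executable, "-c", "pass"]))
# In the setup from this thread the command would instead be
# [sys.executable, "operon-sklearn.py"], run from the example/ directory.
```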

jose-d commented 3 months ago

Hi @foolnotion, thanks for confirming our findings. We tried to narrow down the source of the segfault, but adding debug symbols to the operon binary by using

-DCMAKE_C_FLAGS="-g2" \
-DCMAKE_CXX_FLAGS="-g2" \

in script/dependencies.sh somehow hides the root cause, so there is no progress on our side.

foolnotion commented 3 months ago

I updated to the latest version of Operon and the problem seems to be fixed. Please test dd38d8f308d2ece8c8af44d6afeec3961a46cb4f and see if it works for you.

jose-d commented 3 months ago

Nice, it seems to be fixed. I can confirm that at

>>> import pyoperon
>>> print(pyoperon.Version())
operon rev. d01b92c Release Linux-6.5.0-26-generic x86_64, timestamp 2024-04-05T10:04:40Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3

>>>

python operon-sklearn.py works. We'll try it on our HPC ASAP.

(at Python 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:50:58) [GCC 12.3.0] on linux)
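The version banners quoted in this thread can be compared mechanically to see which fields differ between a failing and a working build. The sketch below parses two banners copied from the comments above. The helper name `parse_version_banner` and the regular expressions are assumptions based only on the banner text shown here, not on any documented format.

```python
import re

def parse_version_banner(text):
    """Extract the Operon revision and compiler from a banner string."""
    rev = re.search(r"operon rev\. (\w+)", text)
    compiler = re.search(r"compiler: ([^,]+),", text)
    return {
        "rev": rev.group(1) if rev else None,
        "compiler": compiler.group(1) if compiler else None,
    }

# Banner from the failing Ubuntu 23.10 container, copied from the thread.
failing = """operon rev. 59ea4c1 Release Linux-6.8.3 x86_64, timestamp 2024-04-04T13:21:11Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3"""

# Banner from the working build after the update, copied from the thread.
working = """operon rev. d01b92c Release Linux-6.5.0-26-generic x86_64, timestamp 2024-04-05T10:04:40Z
single-precision build using eigen 3.4.0, ceres n/a, taskflow 3.6.0
compiler: Clang 18.1.2, flags: -g -O3 -DNDEBUG -Wall -Wextra -Werror -pedantic -fsized-deallocation -fno-math-errno -march=x86-64-v3"""

before = parse_version_banner(failing)
after = parse_version_banner(working)
changed = [key for key in before if before[key] != after[key]]
print(changed)
# On a live install the input would presumably be str(pyoperon.Version()).
```

For these two banners only the Operon revision differs; the compiler is Clang 18.1.2 in both.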

dvokrouhlicky commented 3 months ago

Yes, everything now works on our HPC as well. Thank you for the fast reply and fix.

foolnotion commented 3 months ago

Thanks for reporting and helping with debugging.