monarch-initiative / curategpt

LLM-driven curation assist tool
https://monarch-initiative.github.io/curategpt/
BSD 3-Clause "New" or "Revised" License

`test_duckdb_adapter` is segfaulting #84

Closed by caufieldjh 2 months ago

caufieldjh commented 2 months ago

`test_duckdb_adapter` hits a segmentation fault after updating dependencies. Stack trace from the GitHub Actions run:

py: freeze> python -m pip freeze --all
py: pip==24.2,setuptools==74.1.2,wheel==0.44.0
py: commands[0]> poetry run pytest
============================= test session starts ==============================
platform linux -- Python 3.11.9, pytest-8.3.3, pluggy-1.5.0
cachedir: .tox/py/.pytest_cache
rootdir: /home/runner/work/curate-gpt/curate-gpt
configfile: pyproject.toml
plugins: anyio-4.4.0
collected 157 items

tests/agents/test_chat.py ss                                             [  1%]
tests/agents/test_concept_recognizer.py ssssssssssss                     [  8%]
tests/agents/test_dase.py sss                                            [ 10%]
tests/agents/test_dragon.py sssss                                        [ 14%]
tests/agents/test_mapper.py ssssssssssssssssssssssssssssssssssssssss     [ 39%]
tests/cli/test_chat_cli.py s                                             [ 40%]
tests/cli/test_cli.py .                                                  [ 40%]
tests/cli/test_store_cli.py .                                            [ 41%]
tests/evaluation/test_calculate_statistics.py ..........                 [ 47%]
tests/evaluation/test_runner.py sss                                      [ 49%]
tests/extract/test_extractor.py ssssss......                             [ 57%]
tests/store/test_chromadb_adapter.py ...s.........                       [ 65%]
Fatal Python error: Segmentation fault

Thread 0x00007f99d7fff640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 629 in wait
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f99d6ffe640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 629 in wait
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f99de374640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 629 in wait
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/tqdm/_monitor.py", line 60 in run
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f99df575640 (most recent call first):
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 331 in wait
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/queue.py", line 180 in get
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/posthog/consumer.py", line 107 in next
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/posthog/consumer.py", line 76 in upload
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/posthog/consumer.py", line 65 in run
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1045 in _bootstrap_inner
  File "/opt/hostedtoolcache/Python/3.11.9/x64/lib/python3.11/threading.py", line 1002 in _bootstrap

Thread 0x00007f9ad8a3fb80 (most recent call first):
  File "/home/runner/work/curate-gpt/curate-gpt/src/curate_gpt/store/duckdb_adapter.py", line 181 in create_index
  File "/home/runner/work/curate-gpt/curate-gpt/src/curate_gpt/store/duckdb_adapter.py", line 353 in _process_objects
  File "/home/runner/work/curate-gpt/curate-gpt/src/curate_gpt/store/duckdb_adapter.py", line 230 in insert
  File "/home/runner/work/curate-gpt/curate-gpt/src/curate_gpt/store/duckdb_adapter.py", line 247 in update
  File "/home/runner/work/curate-gpt/curate-gpt/tests/store/test_duckdb_adapter.py", line 97 in test_store_variations
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/python.py", line 159 in pytest_pyfunc_call
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/python.py", line 1627 in runtest
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 174 in pytest_runtest_call
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 242 in <lambda>
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 341 in from_call
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 241 in call_and_report
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 132 in runtestprotocol
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/runner.py", line 113 in pytest_runtest_protocol
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/main.py", line 362 in pytest_runtestloop
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/main.py", line 337 in _main
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/main.py", line 283 in wrap_session
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/main.py", line 330 in pytest_cmdline_main
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_callers.py", line 103 in _multicall
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_manager.py", line 120 in _hookexec
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/pluggy/_hooks.py", line 513 in __call__
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 175 in main
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/lib/python3.11/site-packages/_pytest/config/__init__.py", line 201 in console_main
  File "/home/runner/.cache/pypoetry/virtualenvs/curate-gpt-GkqvqKm0-py3.11/bin/pytest", line 8 in <module>

Extension modules: yaml._yaml, regex._regex, numpy.core._multiarray_umath, numpy.core._multiarray_tests, numpy.linalg._umath_linalg, numpy.fft._pocketfft_internal, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, pyarrow.lib, pyarrow._hdfsio, pandas._libs.tslibs.ccalendar, pandas._libs.tslibs.np_datetime, pandas._libs.tslibs.dtypes, pandas._libs.tslibs.base, pandas._libs.tslibs.nattype, pandas._libs.tslibs.timezones, pandas._libs.tslibs.fields, pandas._libs.tslibs.timedeltas, pandas._libs.tslibs.tzconversion, pandas._libs.tslibs.timestamps, pandas._libs.properties, pandas._libs.tslibs.offsets, pandas._libs.tslibs.strptime, pandas._libs.tslibs.parsing, pandas._libs.tslibs.conversion, pandas._libs.tslibs.period, pandas._libs.tslibs.vectorized, pandas._libs.ops_dispatch, pandas._libs.missing, pandas._libs.hashtable, pandas._libs.algos, pandas._libs.interval, pandas._libs.lib, pyarrow._compute, pandas._libs.ops, pandas._libs.hashing, pandas._libs.arrays, pandas._libs.tslib, pandas._libs.sparse, pandas._libs.internals, pandas._libs.indexing, pandas._libs.index, pandas._libs.writers, pandas._libs.join, pandas._libs.window.aggregations, pandas._libs.window.indexers, pandas._libs.reshape, pandas._libs.groupby, pandas._libs.json, pandas._libs.parsers, pandas._libs.testing, charset_normalizer.md, tornado.speedups, psutil._psutil_linux, psutil._psutil_posix, grpc._cython.cygrpc, bson._cbson, pymongo._cmessage, _cffi_backend, sqlalchemy.cyextension.collections, sqlalchemy.cyextension.immutabledict, sqlalchemy.cyextension.processors, sqlalchemy.cyextension.resultproxy, sqlalchemy.cyextension.util, greenlet._greenlet, ijson.backends._yajl2, lxml._elementpath, lxml.etree, torch._C, torch._C._fft,
torch._C._linalg, torch._C._nested, torch._C._nn, torch._C._sparse, torch._C._special, sklearn.__check_build._check_build, scipy._lib._ccallback_c, scipy.sparse._sparsetools, _csparsetools, scipy.sparse._csparsetools, scipy.linalg._fblas, scipy.linalg._flapack, scipy.linalg.cython_lapack, scipy.linalg._cythonized_array_utils, scipy.linalg._solve_toeplitz, scipy.linalg._decomp_lu_cython, scipy.linalg._matfuncs_sqrtm_triu, scipy.linalg.cython_blas, scipy.linalg._matfuncs_expm, scipy.linalg._decomp_update, scipy.sparse.linalg._dsolve._superlu, scipy.sparse.linalg._eigen.arpack._arpack, scipy.sparse.linalg._propack._spropack, scipy.sparse.linalg._propack._dpropack, scipy.sparse.linalg._propack._cpropack, scipy.sparse.linalg._propack._zpropack, scipy.sparse.csgraph._tools, scipy.sparse.csgraph._shortest_path, scipy.sparse.csgraph._traversal, scipy.sparse.csgraph._min_spanning_tree, scipy.sparse.csgraph._flow, scipy.sparse.csgraph._matching, scipy.sparse.csgraph._reordering, scipy.special._ufuncs_cxx, scipy.special._cdflib, scipy.special._ufuncs, scipy.special._specfun, scipy.special._comb, scipy.special._ellip_harm_2, scipy.spatial._ckdtree, scipy._lib.messagestream, scipy.spatial._qhull, scipy.spatial._voronoi, scipy.spatial._distance_wrap, scipy.spatial._hausdorff, scipy.spatial.transform._rotation, scipy.ndimage._nd_image, _ni_label, scipy.ndimage._ni_label, scipy.optimize._minpack2, scipy.optimize._group_columns, scipy.optimize._trlib._trlib, scipy.optimize._lbfgsb, _moduleTNC, scipy.optimize._moduleTNC, scipy.optimize._cobyla, scipy.optimize._slsqp, scipy.optimize._minpack, scipy.optimize._lsq.givens_elimination, scipy.optimize._zeros, scipy.optimize._highs.cython.src._highs_wrapper, scipy.optimize._highs._highs_wrapper, scipy.optimize._highs.cython.src._highs_constants, scipy.optimize._highs._highs_constants, scipy.linalg._interpolative, scipy.optimize._bglu_dense, scipy.optimize._lsap, scipy.optimize._direct, scipy.integrate._odepack, scipy.integrate._quadpack, 
scipy.integrate._vode, scipy.integrate._dop, scipy.integrate._lsoda, scipy.special.cython_special, scipy.stats._stats, scipy.stats.beta_ufunc, scipy.stats._boost.beta_ufunc, scipy.stats.binom_ufunc, scipy.stats._boost.binom_ufunc, scipy.stats.nbinom_ufunc, scipy.stats._boost.nbinom_ufunc, scipy.stats.hypergeom_ufunc, scipy.stats._boost.hypergeom_ufunc, scipy.stats.ncf_ufunc, scipy.stats._boost.ncf_ufunc, scipy.stats.ncx2_ufunc, scipy.stats._boost.ncx2_ufunc, scipy.stats.nct_ufunc, scipy.stats._boost.nct_ufunc, scipy.stats.skewnorm_ufunc, scipy.stats._boost.skewnorm_ufunc, scipy.stats.invgauss_ufunc, scipy.stats._boost.invgauss_ufunc, scipy.interpolate._fitpack, scipy.interpolate.dfitpack, scipy.interpolate._bspl, scipy.interpolate._ppoly, scipy.interpolate.interpnd, scipy.interpolate._rbfinterp_pythran, scipy.interpolate._rgi_cython, scipy.stats._biasedurn, scipy.stats._levy_stable.levyst, scipy.stats._stats_pythran, scipy._lib._uarray._uarray, scipy.stats._ansari_swilk_statistics, scipy.stats._sobol, scipy.stats._qmc_cy, scipy.stats._mvn, scipy.stats._rcont.rcont, scipy.stats._unuran.unuran_wrapper, sklearn.utils._isfinite, sklearn.utils.sparsefuncs_fast, sklearn.utils.murmurhash, sklearn.utils._openmp_helpers, sklearn.metrics.cluster._expected_mutual_info_fast, sklearn.preprocessing._csr_polynomial_expansion, sklearn.preprocessing._target_encoder_fast, sklearn.metrics._dist_metrics, sklearn.metrics._pairwise_distances_reduction._datasets_pair, sklearn.utils._cython_blas, sklearn.metrics._pairwise_distances_reduction._base, sklearn.metrics._pairwise_distances_reduction._middle_term_computer, sklearn.utils._heap, sklearn.utils._sorting, sklearn.metrics._pairwise_distances_reduction._argkmin, sklearn.metrics._pairwise_distances_reduction._argkmin_classmode, sklearn.utils._vector_sentinel, sklearn.metrics._pairwise_distances_reduction._radius_neighbors, sklearn.metrics._pairwise_distances_reduction._radius_neighbors_classmode, sklearn.metrics._pairwise_fast, 
PIL._imaging, markupsafe._speedups (total: 209)
tests/store/test_duckdb_adapter.py spy: exit -11 (34.72 seconds) /home/runner/work/curate-gpt/curate-gpt> poetry run pytest pid=2076
  py: FAIL code -11 (35.43=setup[0.71]+cmd[34.72] seconds)
  evaluation failed :( (35.49 seconds)

The `poetry update` bumped duckdb to 1.1.0, so that may be the cause. It may also involve a mismatch with a test fixture.

caufieldjh commented 2 months ago

The error log suggests the crash is in test_store_variations(), but that test shouldn't be running without an OpenAI key anyway, so I guess the problem is at a higher level.
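For reference, a minimal sketch of how a key-gated test can be skipped, useful for checking whether the gate is actually in effect. The `OPENAI_API_KEY` variable name, the helper, and the decorator are assumptions made for illustration; the repository may gate its tests differently.

```python
import os
import unittest


def openai_key_available(environ=None) -> bool:
    """Return True when a non-empty OpenAI key is present.

    Assumption: the key is read from the OPENAI_API_KEY environment variable.
    """
    env = os.environ if environ is None else environ
    return bool(env.get("OPENAI_API_KEY", "").strip())


# Hypothetical decorator name; evaluated once, at import time.
requires_openai_key = unittest.skipUnless(
    openai_key_available(), "OPENAI_API_KEY is not set"
)


@requires_openai_key
def test_needs_embeddings():
    # Placeholder body: a real test would call the store with OpenAI embeddings.
    ...
```

If a test like this still runs in CI, either the variable is set in that environment or the test is not behind the gate.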

caufieldjh commented 2 months ago

This may be related: https://github.com/duckdb/duckdb/issues/13834. They say a fix is coming soon. For now, the workaround is to use duckdb 1.0.0 instead.
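Until the upstream fix lands, a rough guard like the one below could flag the problematic release before the DuckDB tests run. It is only a sketch: it treats 1.1.0 as the sole affected version, which is an assumption based on this thread, and the helper names are hypothetical.

```python
from importlib import metadata
from itertools import takewhile

# Assumption: only duckdb 1.1.0 is affected, as reported in this thread.
AFFECTED_VERSIONS = {(1, 1, 0)}


def parse_version(text: str) -> tuple:
    """Turn a version string into a (major, minor, patch) tuple of ints."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:  # pad short versions such as "1.1"
        parts.append(0)
    return tuple(parts)


def is_affected(version_text: str) -> bool:
    """Return True when the given version is in the assumed affected set."""
    return parse_version(version_text) in AFFECTED_VERSIONS


def installed_duckdb_version():
    """Return the installed duckdb version string, or None if not installed."""
    try:
        return metadata.version("duckdb")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_duckdb_version()
    if found is not None and is_affected(found):
        print(f"duckdb {found} may segfault; consider using 1.0.0 for now")
```

Pinning duckdb to 1.0.0 in the project's dependencies remains the simpler workaround; the check above only makes the mismatch visible.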