Closed StasEvseev closed 3 months ago
Thanks for reporting this.
The most likely explanation is that pydantic v2 is using more memory than v1, and segfaults occur when you exceed the available memory.
Maybe try checking the memory usage right before the segfault, or running fewer workers, and see if the segfaults stop.
I don't know of any other reason why pydantic V2 should segfault; if you can give us more detail, we'll investigate immediately.
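A minimal sketch of how memory usage could be checked from inside a worker, assuming a Unix host where the standard-library `resource` module is available. The helper name `peak_rss_mib` and the idea of logging it periodically are illustrative assumptions, not something pydantic provides:

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


if __name__ == "__main__":
    # Hypothetical usage: log this periodically or from a worker shutdown hook.
    print(f"peak RSS: {peak_rss_mib():.1f} MiB")
```

If the logged peak stays well below the container limit right up to the crash, memory pressure becomes a less likely explanation.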
Hey @samuelcolvin ! Thanks for a quick reply.
We ran an experiment on prod to collect more info about the segmentation faults.
We enabled faulthandler so that, whenever a segfault occurs, it outputs the threads and their tracebacks.
We captured 3 cases: in two of them the segfault occurred while the current thread was holding the GIL and running a garbage collection cycle, and in one it was just holding the GIL.
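For anyone wanting to collect the same kind of output, here is a minimal sketch of enabling faulthandler with a dedicated log file. The helper name `enable_crash_tracebacks` and the log path are assumptions for illustration; the setup actually used in this report is not shown in the thread:

```python
import faulthandler
from typing import IO


def enable_crash_tracebacks(log_path: str) -> IO[str]:
    """Write Python tracebacks of all threads to log_path on fatal signals.

    The returned file object must stay open for the life of the process,
    because faulthandler keeps using its file descriptor.
    """
    log_file = open(log_path, "a")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    # Hypothetical path; keep a reference so the file is not closed.
    _crash_log = enable_crash_tracebacks("/tmp/worker-faulthandler.log")
```

Setting the environment variable PYTHONFAULTHANDLER=1 or passing -X faulthandler to the interpreter enables the same handler with output on stderr, without code changes.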
We also got core dumps from the machine, but they don't help much; it is hard to read what is going on in the runtime:
#0 __GI_raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 <signal handler called>
#2 0x00007ba01d5b5a2e in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#3 0x000056b57074d340 in ?? ()
#4 0x00007ba01d87ef28 in _PyRuntime () from /usr/local/bin/../lib/libpython3.11.so.1.0
#5 0x0000000000000001 in ?? ()
#6 0x00007ba01cf7cd60 in ?? ()
#7 0x601a6ea5c71a3b00 in ?? ()
#8 0x00007ba01807a730 in ?? ()
#9 0x00007ba0045e82c0 in ?? ()
#10 0x00007ba01807a6a8 in ?? ()
#11 0x00007ba01d8a7558 in _PyRuntime () from /usr/local/bin/../lib/libpython3.11.so.1.0
#12 0x00007ba01807a728 in ?? ()
#13 0x0000000000000001 in ?? ()
#14 0x00007ba01807a6a8 in ?? ()
#15 0x00007ba01d5c0e87 in _PyEval_EvalFrameDefault () from /usr/local/bin/../lib/libpython3.11.so.1.0
#16 0x00007ba01d5bcf52 in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
#17 0x00007ba01d5db4da in ?? () from /usr/local/bin/../lib/libpython3.11.so.1.0
Do you think it is something you can work with? I can confirm that physical memory usage never even reached the limit we had for the container. Does that answer the question about memory pressure?
Do you run very recursive models? It may not be heap memory but stack overflow.
Does it reproduce if you update to Python 3.11.4? I see no mention of segfault fixes in the 3.11.4 changelog though, so I would guess it won't help.
The fact that the crash is different each time feels a little bit like memory corruption to me. We use very limited unsafe Rust in pydantic-core; I'll audit this and also see if a valgrind run yields anything.
Alternatively, is it possible for you to run with debug-instrumented Python and pydantic-core versions so the core dumps are more useful? I can potentially help with configuring a custom pydantic-core build to contain debug info, for Python it depends how your production is deployed.
In https://github.com/pydantic/pydantic-core/pull/922 I've run through the unsafe which is used in pydantic-core and either eliminated or justified each use.
@davidhewitt Thanks for the reply!
By running debug-instrumented Python, do you mean running my gunicorn using the python3d binary? Like so: python3d -m gunicorn ...
If I can get some guidance on how to do that, that would be amazing!
Unfortunately the issue is hard to reproduce. I can try to simulate certain things, though.
python3d might be overkill because it adds a lot of assertions and I believe there may be some compatibility issues for pydantic-core, anecdotally from other threads (I might try to verify this in CI sometime). Also no prebuilt wheels exist so you'll have to compile all your native dependencies.
It would be a great start if you can download or build your CPython with debug symbols included so the core dumps are much more readable. Potentially you could also build your own pydantic-core from source with debug symbols included there too.
Hey @davidhewitt !
How can we instrument our Python with debug symbols? Like building it with extra CFLAGS:
-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
Can you also help me with building pydantic-core with debug symbols included?
Thanks!
@StasEvseev I just built my own 3.12 interpreter from source using just the "optimized" configure options here: https://devguide.python.org/getting-started/setup-building/#optimization
This contained debug information, so it looks like the debug info stripping is probably done by your distro packager.
~/dev/cpython$ ./python --version
Python 3.12.0
~/dev/cpython$ file ./python
./python: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=7487db6f0e6d73eda7cb2dbddb39706d3658e7b3, for GNU/Linux 3.2.0, with debug_info, not stripped
Which linux distribution are you using? There may be an optional package to install python debug info alongside the main executable, as an alternative to building from source. (That said, I could not see one for ubuntu.)
As for pydantic-core, you just need to have the environment variables CARGO_PROFILE_RELEASE_STRIP=false and CARGO_PROFILE_RELEASE_DEBUG=limited (source) set during a build from source. So clone the repo, check out the tag which matches your pydantic version, and run one of the two make tasks below:
CARGO_PROFILE_RELEASE_STRIP=false CARGO_PROFILE_RELEASE_DEBUG=limited make build-prod
# or if you want fully-optimized
CARGO_PROFILE_RELEASE_STRIP=false CARGO_PROFILE_RELEASE_DEBUG=limited make build-pgo
After that, you can see that it contains debug info:
$ python
Python 3.11.4 (main, Jun 9 2023, 07:59:55) [GCC 12.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pydantic_core
>>> pydantic_core._pydantic_core
<module 'pydantic_core._pydantic_core' from '/home/david/dev/pydantic/pydantic-core/python/pydantic_core/_pydantic_core.cpython-311-x86_64-linux-gnu.so'>
>>> exit()
david@david-pc:~/dev/pydantic/pydantic-core$ file /home/david/dev/pydantic/pydantic-core/python/pydantic_core/_pydantic_core.cpython-311-x86_64-linux-gnu.so
/home/david/dev/pydantic/pydantic-core/python/pydantic_core/_pydantic_core.cpython-311-x86_64-linux-gnu.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, BuildID[sha1]=728db72c66fe6364fd9694a6ae4df7aac998d434, with debug_info, not stripped
EDIT: added suggestion for CARGO_PROFILE_RELEASE_DEBUG=limited too (originally line-tables-only, but maturin had an issue with line-tables-only)
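A small sketch of how the `file` check above could be automated, assuming the output format shown in this thread (a comma-separated list of properties ending in "with debug_info, not stripped" or "stripped"). The function names are hypothetical, and `describe` assumes the `file` utility is installed:

```python
import subprocess


def has_debug_info(file_output: str) -> bool:
    """Return True if `file` output reports debug info and an unstripped binary."""
    # `file` prints a comma-separated list of properties after the path.
    properties = [part.strip() for part in file_output.split(",")]
    return "with debug_info" in properties and "not stripped" in properties


def describe(path: str) -> str:
    """Run the `file` utility on path and return its output (assumes `file` exists)."""
    result = subprocess.run(["file", path], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, `has_debug_info(describe(pydantic_core._pydantic_core.__file__))` should be true for a build made with the environment variables above, and false for a stripped interpreter binary.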
Hey @davidhewitt! Thanks for the comprehensive answer!
What we are using is the python docker image. I don't see where the python build got stripped, but this is what I see in the container:
/usr/local/bin/python3.12: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, BuildID[sha1]=e1466a54058de9be791ef96f61c8e185388684eb, for GNU/Linux 3.2.0, stripped
Link to docker source https://github.com/docker-library/python/blob/b7b91ef359a740a91caeabce414ce4ee70fd2b23/3.11/bookworm/Dockerfile#L44.
I might try to build a custom python with your suggested flags.
If I had to guess, the stripping is done as a linker argument via
LDFLAGS="$(dpkg-buildflags --get LDFLAGS)"; \
We also have the same or a similar problem.
18/Oct/2023 13:12:33.283 ERROR [common.components.base.base:303] ../Objects/dictobject.c:1899: bad argument to internal function
Traceback (most recent call last):
File ".../common/components/base.py", line 299, in _get_data
result[comp.name] = comp.get_data(context_storage=context_storage, data_storage=data_storage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../common/components/base.py", line 136, in get_data
data = self._get_data(context_storage, data_storage)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../common/components/base.py", line 212, in _get_data
data = component.get_data(context=context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../common/components/base.py", line 80, in get_data
return self._get_data(context=context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../tv_site/components/data/request_context.py", line 102, in _get_data
return self.get_instance_result_model().model_validate(request)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File ".../venv/lib/python3.11/site-packages/pydantic/main.py", line 503, in model_validate
return cls.__pydantic_validator__.validate_python(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
SystemError: ../Objects/dictobject.c:1899: bad argument to internal function
Fatal Python error: Segmentation fault
/* CAUTION: PyDict_SetItem() must guarantee that it won't resize the
 * dictionary if it's merely replacing the value for an existing key.
 * This means that it's safe to loop over a dictionary with PyDict_Next()
 * and occasionally replace a value -- but you can't insert new keys or
 * remove them.
 */
int
PyDict_SetItem(PyObject *op, PyObject *key, PyObject *value)
{
    if (!PyDict_Check(op)) {
        PyErr_BadInternalCall(); // <------ This line
        return -1;
    }
    assert(key);
    assert(value);
    Py_INCREF(key);
    Py_INCREF(value);
    return _PyDict_SetItem_Take2((PyDictObject *)op, key, value);
}
(or just a segfault without any useful message or traceback)
Python 3.11.5
Ubuntu 22.04.3 LTS
pydantic==2.4.2
pydantic_core==2.10.1
gevent==23.9.1
gunicorn==21.2.0
I can reproduce it locally with a one-worker setup. But unfortunately I cannot come up with a minimal code example; it just happens from time to time.
Is there any info that could help you? We already started updating our project to v2 and now we are stuck with half of our models being v1 and the others v2.
@bogdandm does the error ever include the native stack trace? That would be extremely helpful to review where the problem is coming from. Alternatively if you are able to get a core dump (e.g. try running with ulimit -c unlimited) and share relevant parts here that would also greatly help 🙏
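If changing the shell's ulimit is awkward (for example inside a container entrypoint), a rough in-process equivalent is sketched below, assuming a Unix host. The helper name `enable_core_dumps` is hypothetical; it can only raise the soft limit up to the existing hard limit, and where the core file is written still depends on the system configuration (on Linux, kernel.core_pattern):

```python
import resource


def enable_core_dumps() -> int:
    """Raise the soft core-file size limit to the hard limit and return it."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return hard


if __name__ == "__main__":
    limit = enable_core_dumps()
    unlimited = limit == resource.RLIM_INFINITY
    print("core size limit:", "unlimited" if unlimited else limit)
```

If the hard limit is already 0, this has no effect and the limit has to be raised by whatever launches the process.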
@davidhewitt I haven't been able to figure out yet how to get more detailed logs or the usual "core dumped" error (until now I believed that it was the default behavior, at least in our docker environment). I already tried faulthandler.enable() but it gives just the Python traceback, no CPython or Rust code.
But I'll probably try again a little later when I have more time to debug it.
If you have a way to reproduce it locally perhaps we can also discuss a way for me to help debug your code in a confidential environment.
Okay, I can reproduce it within gdb, so there is a stack trace:
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x0000000000529d7a in PyObject_GetIter ()
(gdb) bt
#0 0x0000000000529d7a in PyObject_GetIter ()
#1 0x000000000053b982 in _PyEval_EvalFrameDefault ()
#2 0x00000000005a8368 in ?? ()
#3 0x00007ffff1cd9908 in pyo3::types::any::{impl#1}::get_item::inner () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/types/any.rs:777
#4 pyo3::types::any::PyAny::get_item<&pyo3::instance::Py<pyo3::types::string::PyString>> () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/types/any.rs:781
#5 pyo3::types::mapping::PyMapping::get_item<&pyo3::instance::Py<pyo3::types::string::PyString>> () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/types/mapping.rs:50
#6 _pydantic_core::lookup_key::LookupKey::py_get_mapping_item () at src/lookup_key.rs:163
#7 0x00007ffff1d47a57 in _pydantic_core::validators::model_fields::{impl#1}::validate::{closure#3}<pyo3::types::any::PyAny> () at src/validators/model_fields.rs:181
#8 _pydantic_core::validators::validation_state::ValidationState::with_new_extra<core::ops::control_flow::ControlFlow<_pydantic_core::errors::line_error::ValError, ()>, _pydantic_core::validators::model_fields::{impl#1}::validate::{closure_env#3}<pyo3::types::any::PyAny>> () at src/validators/validation_state.rs:37
#9 _pydantic_core::validators::model_fields::{impl#1}::validate<pyo3::types::any::PyAny> () at src/validators/model_fields.rs:298
#10 0x00007ffff1d4471c in _pydantic_core::validators::model::ModelValidator::validate_construct<pyo3::types::any::PyAny> () at src/validators/model.rs:277
#11 0x00007ffff1d47afe in _pydantic_core::validators::model_fields::{impl#1}::validate::{closure#3}<pyo3::types::any::PyAny> () at src/validators/model_fields.rs:197
#12 _pydantic_core::validators::validation_state::ValidationState::with_new_extra<core::ops::control_flow::ControlFlow<_pydantic_core::errors::line_error::ValError, ()>, _pydantic_core::validators::model_fields::{impl#1}::validate::{closure_env#3}<pyo3::types::any::PyAny>> () at src/validators/validation_state.rs:37
#13 _pydantic_core::validators::model_fields::{impl#1}::validate<pyo3::types::any::PyAny> () at src/validators/model_fields.rs:298
#14 0x00007ffff1d4471c in _pydantic_core::validators::model::ModelValidator::validate_construct<pyo3::types::any::PyAny> () at src/validators/model.rs:277
#15 0x00007ffff1e17825 in _pydantic_core::validators::SchemaValidator::_validate<pyo3::types::any::PyAny> () at src/validators/mod.rs:338
#16 _pydantic_core::validators::SchemaValidator::validate_python () at src/validators/mod.rs:160
#17 0x00007ffff1e18f9f in _pydantic_core::validators::SchemaValidator::__pymethod_validate_python__ () at src/validators/mod.rs:112
#18 0x00007ffff1c8414c in pyo3::impl_::trampoline::fastcall_with_keywords::{closure#0} () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:41
#19 pyo3::impl_::trampoline::trampoline::{closure#0}<pyo3::impl_::trampoline::fastcall_with_keywords::{closure_env#0}, *mut pyo3_ffi::object::PyObject> ()
at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181
#20 std::panicking::try::do_call<pyo3::impl_::trampoline::trampoline::{closure_env#0}<pyo3::impl_::trampoline::fastcall_with_keywords::{closure_env#0}, *mut pyo3_ffi::object::PyObject>, core::result::Result<*mut pyo3_ffi::object::PyObject, pyo3::err::PyErr>> () at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:502
#21 std::panicking::try<core::result::Result<*mut pyo3_ffi::object::PyObject, pyo3::err::PyErr>, pyo3::impl_::trampoline::trampoline::{closure_env#0}<pyo3::impl_::trampoline::fastcall_with_keywords::{closure_env#0}, *mut pyo3_ffi::object::PyObject>> ()
at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panicking.rs:466
#22 std::panic::catch_unwind<pyo3::impl_::trampoline::trampoline::{closure_env#0}<pyo3::impl_::trampoline::fastcall_with_keywords::{closure_env#0}, *mut pyo3_ffi::object::PyObject>, core::result::Result<*mut pyo3_ffi::object::PyObject, pyo3::err::PyErr>> () at /rustc/cc66ad468955717ab92600c770da8c1601a4ff33/library/std/src/panic.rs:142
#23 pyo3::impl_::trampoline::trampoline<pyo3::impl_::trampoline::fastcall_with_keywords::{closure_env#0}, *mut pyo3_ffi::object::PyObject> () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:181
--Type <RET> for more, q to quit, c to continue without paging--c
#24 0x00007ffff1e17f30 in pyo3::impl_::trampoline::fastcall_with_keywords () at /home/bogdan-dm/.cargo/registry/src/index.crates.io-6f17d22bba15001f/pyo3-0.19.2/src/impl_/trampoline.rs:52
#25 _pydantic_core::validators::_::{impl#0}::py_methods::ITEMS::trampoline () at src/validators/mod.rs:112
#26 0x0000000000579007 in ?? ()
#27 0x0000000000547575 in PyObject_Vectorcall ()
#28 0x0000000000539c51 in _PyEval_EvalFrameDefault ()
#29 0x0000000000581d37 in ?? ()
#30 0x0000000000581743 in ?? ()
#31 0x000000000056c931 in PyObject_Call ()
#32 0x000000000053dc14 in _PyEval_EvalFrameDefault ()
#33 0x00000000005624b3 in _PyFunction_Vectorcall ()
#34 0x000000000053dc14 in _PyEval_EvalFrameDefault ()
#35 0x00000000005624b3 in _PyFunction_Vectorcall ()
#36 0x00007ffff6ebd63d in __Pyx_PyObject_Call (kw=0x7fffb75d2900, arg=0x7fffb75483b0, func=0x7fffba0d07c0) at src/gevent/greenlet.c:27114
#37 __pyx_pf_6gevent_17_gevent_cgreenlet_8Greenlet_42run (__pyx_v_self=0x7fffb75b28e0) at src/gevent/greenlet.c:16087
#38 __pyx_pw_6gevent_17_gevent_cgreenlet_8Greenlet_43run (__pyx_v_self=0x7fffb75b28e0, unused=<optimized out>) at src/gevent/greenlet.c:15988
#39 0x00000000005820f7 in ?? ()
#40 0x0000000000581778 in ?? ()
#41 0x00007ffff72f2bf2 in greenlet::UserGreenlet::inner_bootstrap (this=0x7fffb75699b0, origin_greenlet=<optimized out>, run=0x7fffb75d2fc0) at src/greenlet/TUserGreenlet.cpp:460
#42 0x00007ffff72f4c62 in greenlet::UserGreenlet::g_initialstub (this=0x7fffb75699b0, mark=0x7fffffff9388) at src/greenlet/TUserGreenlet.cpp:311
#43 0x00007ffff72f38a5 in greenlet::UserGreenlet::g_switch (this=0x7fffb75699b0) at src/greenlet/TUserGreenlet.cpp:179
#44 0x00000000005820f7 in ?? ()
#45 0x0000000000581705 in ?? ()
#46 0x00007ffff7b9a795 in gevent_call (loop=0x7ffff45097e0, cb=0x7fffb7478b40) at src/gevent/libev/callbacks.c:182
#47 0x00007ffff7bc6860 in __pyx_f_6gevent_5libev_8corecext_4loop__run_callbacks (__pyx_v_self=0x7ffff45097e0) at src/gevent/libev/corecext.c:8593
#48 0x00007ffff7bca75a in gevent_loop_run_callbacks (__pyx_v_loop=__pyx_v_loop@entry=0x7ffff45097e0) at src/gevent/libev/corecext.c:21052
#49 0x00007ffff7b9ab42 in gevent_run_callbacks (_loop=<optimized out>, watcher=0x7ffff45097f8, revents=<optimized out>) at src/gevent/libev/callbacks.c:225
#50 0x00007ffff7b9ac9b in ev_invoke_pending (loop=0x7ffff7bddf00 <default_loop_struct>) at /tmp/build/gevent/deps/libev/ev.c:3770
#51 0x00007ffff7bc7c7b in ev_run (loop=0x7ffff7bddf00 <default_loop_struct>, flags=0) at /tmp/build/gevent/deps/libev/ev.c:4063
#52 0x00007ffff7bc842e in __pyx_pf_6gevent_5libev_8corecext_4loop_14run (__pyx_v_once=<optimized out>, __pyx_v_nowait=<optimized out>, __pyx_v_self=0x7ffff45097e0) at src/gevent/libev/corecext.c:10119
#53 __pyx_pw_6gevent_5libev_8corecext_4loop_15run (__pyx_v_self=0x7ffff45097e0, __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>) at src/gevent/libev/corecext.c:10069
#54 0x0000000000547575 in PyObject_Vectorcall ()
#55 0x0000000000539c51 in _PyEval_EvalFrameDefault ()
#56 0x0000000000581d37 in ?? ()
#57 0x0000000000581778 in ?? ()
#58 0x00007ffff72f2bf2 in greenlet::UserGreenlet::inner_bootstrap (this=0x7ffff49fedf0, origin_greenlet=<optimized out>, run=0x7fffefe84cc0) at src/greenlet/TUserGreenlet.cpp:460
#59 0x00007ffff72f4c62 in greenlet::UserGreenlet::g_initialstub (this=0x7ffff49fedf0, mark=0x7fffffff9b38) at src/greenlet/TUserGreenlet.cpp:311
#60 0x00007ffff72f38a5 in greenlet::UserGreenlet::g_switch (this=0x7ffff49fedf0) at src/greenlet/TUserGreenlet.cpp:179
#61 0x00007fffffff9d30 in ?? ()
#62 0x0000000000000000 in ?? ()
I can try to compile Python with more debug info too if you need. ~But not sure about Rust, I'm not familiar with it at all and lines 3-10 seem to be pretty important.~ Never mind, the command from your message above works out of the box. So I updated the stack trace.
Lib versions:
pydantic-core - commit 1a966d55581e1a1379cfe6274da6323c9786aefb
pydantic==2.4.2
gevent==23.9.1 (installed with cython==3.0.2 and `--no-binary :all:` flag)
Python 3.11.5
stable-x86_64-unknown-linux-gnu (default)
rustc 1.73.0 (cc66ad468 2023-10-03)
Operating System: Ubuntu 22.04.3 LTS
Kernel: Linux 6.2.0-35-generic
P.S. This is not a gunicorn-related crash; I used the local django runserver and enabled gevent on server startup (from gevent import monkey; monkey.patch_all())
Hmm, so it looks like the call to PyObject_GetItem is crashing, which is quite unexpected. Do you know anything about the model which is being validated when the crash occurs?
That might also imply there is memory corruption earlier in the process. Are you willing to run under valgrind? (I can help figure out an invocation for this.) We should probably also add valgrind to the pydantic-core CI.
Nothing specific. It is actually one super large model that describes a whole page on one site. I also suspect some memory corruption: at some point I had weird objects that produced totally random errors. When I started investigating them (obj.dict and other usual stuff), they had random properties from other objects. E.g. a simple lazy translation string (gettext_lazy from django) had a _proxy____kw attribute with some random object from the User model. I have not seen these errors in quite a while, so maybe this was some sort of cache corruption.
I can try valgrind; in a local environment it is probably safe enough. You can contact me on linkedin (link in github profile).
I was able to run valgrind on the pydantic-core test suite using a virtual environment on ubuntu with the following command:
valgrind --leak-check=full --track-origins=yes --log-file=valgrind-output.txt python -m pytest
The contents of valgrind-output.txt suggested a couple of memory leaks, which look like globally cached strings, so not of relevant concern here. I'll follow up on those separately another time. Hopefully if you can repeat the same thing but replace python -m pytest with your command which produces the repro under gdb, we will identify a cause of your crash. You can share any results with me confidentially over linkedin.
If you're getting a lot of messages, you might want to check if you have /usr/lib/valgrind/python3.supp present; I understand this is needed due to Python's internal memory allocator.
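A sketch of how the invocation above could be assembled for an arbitrary repro command, adding the suppression file only when it exists. The helper name `valgrind_argv` is hypothetical; the flags are the ones from the command above plus valgrind's `--suppressions=` option:

```python
import os
import shlex
from typing import List, Sequence

# Path mentioned above; it may not exist on every distribution.
SUPPRESSIONS = "/usr/lib/valgrind/python3.supp"


def valgrind_argv(
    command: Sequence[str],
    log_file: str = "valgrind-output.txt",
    suppressions: str = SUPPRESSIONS,
) -> List[str]:
    """Build a valgrind command line that wraps the given repro command."""
    argv = [
        "valgrind",
        "--leak-check=full",
        "--track-origins=yes",
        f"--log-file={log_file}",
    ]
    # Only pass the suppression file if it is actually present.
    if os.path.exists(suppressions):
        argv.append(f"--suppressions={suppressions}")
    return argv + list(command)


if __name__ == "__main__":
    # Replace the pytest command with whatever reproduces the crash.
    print(shlex.join(valgrind_argv(["python", "-m", "pytest"])))
```

The printed line can be pasted into a shell, or the list can be passed directly to `subprocess.run`.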
@bogdandm Thanks for jumping on the issue and helping with the investigation! For me it is a bit troublesome to reproduce in a local environment (due to the complex setup). Do you need any help to progress further?
I contacted @davidhewitt and gave him all the logs that I was able to collect from my project. So now all hope is that he will be able to figure it all out 🙏🏻
Yep, I'm looking into this at present and hope to have some progress within a few weeks. Will keep posted here.
Just ran into similar issues. M1, Python 3.12.0 and 3.12.1. Pydantic 2.5.2. It only happens with gevent monkey-patched. I also see that we all are using flask.
So I am getting multiple errors, they seem to be pretty random, but it's mostly SIGSEGV/SIGBUS.
I am also running into SystemError: ../Objects/dictobject.c:1899: bad argument to internal function.
I compiled a debug version of python, and while those errors still happen - a new one started to appear:
Assertion failed: (Py_REFCNT((PyObject*)mp) > 0), function _PyDict_NotifyEvent, file pycore_dict.h, line 169.
Fatal Python error: Aborted
Current thread 0x00000001da509000 (most recent call first):
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/pydantic/main.py", line 503 in model_validate
File "/Users/rafal/Code/redacted/app/orgs/rpc.py", line 176 in rpc_get_members
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/pydantic/_internal/_validate_call.py", line 113 in __call__
File "/Users/rafal/Code/redacted/app/core/openrpc/server.py", line 193 in _execute_method
File "/Users/rafal/Code/redacted/app/core/openrpc/server.py", line 162 in execute_by_data
File "/Users/rafal/Code/redacted/app/core/openrpc/flask.py", line 38 in post
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/views.py", line 190 in dispatch_request
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/views.py", line 115 in view
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/app.py", line 852 in dispatch_request
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/app.py", line 867 in full_dispatch_request
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/app.py", line 1455 in wsgi_app
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/flask/app.py", line 1478 in __call__
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/werkzeug/debug/__init__.py", line 330 in debug_application
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/werkzeug/serving.py", line 325 in execute
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/werkzeug/serving.py", line 362 in run_wsgi
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/http/server.py", line 424 in handle_one_request
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/http/server.py", line 436 in handle
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/werkzeug/serving.py", line 390 in handle
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/socketserver.py", line 761 in __init__
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/socketserver.py", line 362 in finish_request
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/socketserver.py", line 692 in process_request_thread
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/threading.py", line 1010 in run
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/threading.py", line 1073 in _bootstrap_inner
File "/Users/rafal/.pyenv/versions/3.12.1-debug/lib/python3.12/threading.py", line 1030 in _bootstrap
File "/Users/rafal/Code/redacted/.venv/lib/python3.12/site-packages/gevent/greenlet.py", line 908 in run
Extension modules: _cffi_backend, greenlet._greenlet, markupsafe._speedups (total: 3)
It's a bit weird that faulthandler does not list pydantic core in extensions. Also I'm running this with the following env variables to reduce the number of C extensions used: export GEVENT_LOOP=libev-cffi PURE_PYTHON=1 DISABLE_SQLALCHEMY_CEXT_RUNTIME=1
I was able to reproduce this error with @rafales's example from #8392. Thanks so much @rafales, that's really helpful.
@davidhewitt and I will do some further digging, specifically:
I think it's very likely this is related to https://github.com/gevent/gevent/issues/1819.
My dumb theory: gevent is switching threads when pydantic-core/pyo3 effectively calls getattr on the object, meaning code that expects to be single threaded is being called in different threads.
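One way to picture this theory is the toy model below. It is only an illustration under stated assumptions: it assumes a LIFO-style pool in which each call releases everything registered since its own marker, and it uses plain Python generators to stand in for greenlet switches. `ToyPool`, `task` and the object names are hypothetical; this is not PyO3's or gevent's actual code and does not reproduce the crash:

```python
from typing import Generator, List


class ToyPool:
    """Toy LIFO pool: a call releases every object registered after its marker."""

    def __init__(self) -> None:
        self.owned: List[str] = []
        self.released: List[str] = []

    def mark(self) -> int:
        return len(self.owned)

    def register(self, name: str) -> None:
        self.owned.append(name)

    def release_to(self, marker: int) -> None:
        # Correct only if calls finish in strict last-in, first-out order.
        self.released.extend(self.owned[marker:])
        del self.owned[marker:]


def task(pool: ToyPool, name: str) -> Generator[None, None, None]:
    marker = pool.mark()
    pool.register(f"{name}-obj")
    yield  # stands in for a switch to another greenlet mid-call
    pool.release_to(marker)
    yield


pool = ToyPool()
a = task(pool, "A")
b = task(pool, "B")
next(a)  # A registers its object, then the "scheduler" switches away
next(b)  # B registers its object on top of A's
next(a)  # A resumes first and releases everything after its marker
print(pool.released)  # B's object is released while B is still mid-call
```

In this model, task B would go on to use an object that has already been released, which is the kind of ordering violation that single-threaded, strictly nested code never has to consider.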
Ok, some progress here: I can isolate the crash to just PyO3 + gevent, which I've documented in https://github.com/PyO3/pyo3/issues/3668
I will work to figure out next steps from here. We have at least one pathway to a solution (in the new PyO3 API) but maybe there are mitigations we can get across the ecosystem faster.
To follow up with the current state of things: in PyO3 we felt that mitigations are probably impractical from a performance standpoint so we are busy getting the new PyO3 API to a point where it can be used by projects to migrate. This might be a few weeks off still depending on review speed.
Any update on the state of the problem?
We need to wait for the new pyo3 API/GIL pool. That's getting pretty close, check the progress in the pyo3 repo.
With the release now done in PyO3 0.21, and pydantic-core updated, I can no longer reproduce the crash on pydantic main. I will close this issue; hopefully people experiencing problems here can also confirm it's fixed with pydantic main. We will also release this all soon as Pydantic 2.7!
Initial Checks
Description
Thanks for the amazing project! We have been using pydantic for a couple of years and it has become a standard building block for our codebase.
Everything seemed to work, except that once we made the change to v2, there have been some problems with segfaults in our production environment.
We haven't figured out a way to reproduce it locally, so we can't provide you more details than just logs from our production environment.
Our setup:
And those are segfaults we are facing on production:
Example Code
No response
Python, Pydantic & OS Version
Selected Assignee: @dmontagu