python / cpython

The Python programming language
https://www.python.org
Other
63.1k stars 30.22k forks source link

Segfault during deallocation of `_elementtree.XMLParser` #111784

Closed mgorny closed 10 months ago

mgorny commented 11 months ago

Crash report

What happened?

I've originally hit it while running slixmpp's test suite. I've been able to reduce it to the following program:

import asyncio
import xml.etree.ElementTree as ET
from typing import Optional

def set_see_other_host() -> Optional[ET.Element]:
    pass

class XMLStream:
    def disconnect(self):
        asyncio.ensure_future(self._end_stream_wait())

    async def _end_stream_wait(self):
        await asyncio.wait_for(asyncio.Future(), 1)

    def _remove_schedules(self):
        pass

parser = ET.XMLPullParser()

xmpp = XMLStream()
foo = xmpp._remove_schedules
xmpp.disconnect()
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.Queue().join())

I've been able to reproduce this with 3.12.0, and the tips of 3.12 and main branches. I've been able to bisect it to the following commit:

commit 1b5a2b085c28d230c9ef9bd9b472afc85e087ced
Author:     Kumar Aditya <59607654+kumaraditya303@users.noreply.github.com>
AuthorDate: 2023-05-17 01:35:07 +0200
Commit:     GitHub <noreply@github.com>
CommitDate: 2023-05-17 01:35:07 +0200

    GH-103092: isolate `_elementtree` (#104561)

Backtrace:

(gdb) bt
#0  0x00007f9da2d8611c in xmlparser_gc_clear (self=0x7f9da2617760) at ./Modules/_elementtree.c:3784
#1  0x00007f9da2d86857 in xmlparser_dealloc (self=0x7f9da2617760) at ./Modules/_elementtree.c:3809
#2  0x0000559a2edd2145 in _Py_Dealloc (op=0x7f9da2617760) at Objects/object.c:2858
#3  0x0000559a2edae16d in Py_DECREF (op=0x7f9da2617760) at ./Include/object.h:879
#4  Py_XDECREF (op=0x7f9da2617760) at ./Include/object.h:972
#5  0x0000559a2edbbadb in _PyObject_FreeInstanceAttributes (self=0x7f9da3146d20) at Objects/dictobject.c:5646
#6  0x0000559a2ee03a71 in subtype_dealloc (self=0x7f9da3146d20) at Objects/typeobject.c:2064
#7  0x0000559a2edd2145 in _Py_Dealloc (op=0x7f9da3146d20) at Objects/object.c:2858
#8  0x0000559a2edae16d in Py_DECREF (op=0x7f9da3146d20) at ./Include/object.h:879
#9  Py_XDECREF (op=0x7f9da3146d20) at ./Include/object.h:972
#10 0x0000559a2edafbef in free_keys_object (interp=0x559a2f2a73a8 <_PyRuntime+76872>, keys=0x7f9da26c9ac0) at Objects/dictobject.c:675
#11 0x0000559a2edaef84 in dictkeys_decref (interp=0x559a2f2a73a8 <_PyRuntime+76872>, dk=0x7f9da26c9ac0) at Objects/dictobject.c:335
#12 0x0000559a2edb31e2 in PyDict_Clear (op=0x7f9da322ba00) at Objects/dictobject.c:2120
#13 0x0000559a2edb7a1e in dict_tp_clear (op=0x7f9da322ba00) at Objects/dictobject.c:3588
#14 0x0000559a2efb69ec in delete_garbage (tstate=0x559a2f30ce28 <_PyRuntime+493256>, gcstate=0x559a2f2a7798 <_PyRuntime+77880>, 
    collectable=0x7ffffccdee70, old=0x559a2f2a77e0 <_PyRuntime+77952>) at Modules/gcmodule.c:1033
#15 0x0000559a2efb707c in gc_collect_main (tstate=0x559a2f30ce28 <_PyRuntime+493256>, generation=2, n_collected=0x0, 
    n_uncollectable=0x0, nofail=1) at Modules/gcmodule.c:1313
#16 0x0000559a2efb8a1e in _PyGC_CollectNoFail (tstate=0x559a2f30ce28 <_PyRuntime+493256>) at Modules/gcmodule.c:2154
#17 0x0000559a2ef74d80 in finalize_modules (tstate=0x559a2f30ce28 <_PyRuntime+493256>) at Python/pylifecycle.c:1677
#18 0x0000559a2ef751cb in Py_FinalizeEx () at Python/pylifecycle.c:1931
#19 0x0000559a2efb4ced in Py_RunMain () at Modules/main.c:709
#20 0x0000559a2efb4d9f in pymain_main (args=0x7ffffccdefd0) at Modules/main.c:737
#21 0x0000559a2efb4e5f in Py_BytesMain (argc=2, argv=0x7ffffccdf138) at Modules/main.c:761
#22 0x0000559a2eccc885 in main (argc=2, argv=0x7ffffccdf138) at ./Programs/python.c:15

CC @kumaraditya303

CPython versions tested on:

3.12, CPython main branch

Operating systems tested on:

Linux

Output from running 'python -VV' on the command line:

Python 3.13.0a1+ (heads/main:ba8aa1fd37, Nov 6 2023, 16:26:31) [GCC 13.2.1 20231014]

Linked PRs

mdboom commented 11 months ago

Cc: @kumaraditya303

chgnrdv commented 11 months ago

Reduced example, just in case:

import _asyncio
import _elementtree

async def _end_stream_wait():
    await "foo"

parser = _elementtree.XMLParser()
_asyncio.Task(_end_stream_wait())
Eclips4 commented 11 months ago

The problem is more complicated than it may seem at first glance. First of all, problem is unrelated to isolation of _elementtree. In fact, here's the first bad commit - b652d40f1c.

What's actually happening and why this code segfaults?

  1. XMLParser is trying to deallocate himself:

https://github.com/python/cpython/blob/931f4438c92ec0eb2aa894092a91749f1d5bd216/Modules/_elementtree.c#L3777-L3785 Line 3784 (which actually means (st->expat_capi->ParserFree)(parser)) leads to segfault if the second point (which described below) already executed.

  1. pyexpat_capsule trying to deallocate himself (st->expat_capi of _elementtree is pointing at this data!)

Where's the problem? Order of execution of two points described above is not strict. Second point can be executed before first ponit and vice-versa (which is okay for us). Situation when pyexpat_capsule deallocated before deallocation of XMLParser leads to segmentation fault.

cc @vstinner

vstinner commented 11 months ago

_elementtree extension should make sure that the pyexpat extension remains usable until it is done using it.

struct PyExpat_CAPI *expat_capi; of elementtreestate is just a pointer, it doesn't impact lifetime of the capsule memory or the pyexpat extension.

Eclips4 commented 11 months ago

_elementtree extension should make sure that the pyexpat extension remains usable until it is done using it.

struct PyExpat_CAPI *expat_capi; of elementtreestate is just a pointer, it doesn't impact lifetime of the capsule memory or the pyexpat extension.

Yes, that is what we should do. But I don't understand how we can make sure that the pyexpat extension remains usable until _elementtree is done using it.

vstinner commented 11 months ago

Can we keep a strong reference to the pyexpat module? Is it enough?

Eclips4 commented 11 months ago

Can we keep a strong reference to the pyexat module? Is it enough?

There's no such API, and yes, it will be enough. Imported capsule doesn't store any information about module which is bounded to.

Eclips4 commented 11 months ago

To be honest, I don't see any solution without changing some code of capsule's implementation..

vstinner commented 11 months ago

To be honest, I don't see any solution without changing some code of capsule's implementation..

That's acceptable and has been done recently: see commit 513c89d0122ff6245cfefbb49ef32bde8a2336f5.

Eclips4 commented 11 months ago

To be honest, I don't see any solution without changing some code of capsule's implementation..

That's acceptable and has been done recently: see commit 513c89d.

I've write a draft PR, please take a look at the #112053.

Eclips4 commented 10 months ago

That's a kinda hard issue :smile: First of all, there's a two segfaults, first is described above (https://github.com/python/cpython/issues/111784#issuecomment-1798311825) Then, let's talk about second segfault. It's funny, but code segfaults at the same place! I mean EXPAT(st, ParserFree)(parser).

Why does it happens? 1._elementtree module is trying to deallocate, so it's call module_dealloc in moduleobject.c

There's a one interesting thing:

    if (m->md_state != NULL)
        PyMem_Free(m->md_state);

It's "clears" the state of module, if it's isn't freed previously. That's incorrect, because there is no guarantee that module doesn't use state to do some cleanup things in his deallocators. Ok, state is cleared, let's move on.

  1. After that, elementtree_clear is called (or elementtree_free, maybe it doesnt matter because elementtree_free calls elementtree_clear)
  2. The paragraph above executes the Py_CLEAR(st->XMLParser_Type);
  3. During the execution of previous step, it's calls the xmlparser_dealloc which calls the xmlparser_gc_clear.
  4. Finally, xmlparser_gc_clear calls the EXPAT(st, ParserFree)(parser) (reminder, it's interaction with the state). But... state is already cleaned (now it's points to 0xDDDDDDDDDDDDDDDD which mean that this pointer already freed) in first step! So, there's a segfault.

I'll take an opportunity to add here release-blocker label, because it's can happen in real user code. Do not forget, this segfault is present in 3.12, so in my opinion we should fix it until the next version of 3.12 is released.

erlend-aasland commented 10 months ago

That's really strange. IIRC, an extension module should never need to free it's allocated state explicitly. That sounds like a serious bug to me; module state should be freed by the runtime. (I haven't looked at the code yet.) cc. @vstinner @encukou

Eclips4 commented 10 months ago

I wrote a PR, please check #113405

Eclips4 commented 10 months ago

That's really strange. IIRC, an extension module should never need to free it's allocated state explicitly. That sounds like a serious bug to me; module state should be freed by the runtime. (I haven't looked at the code yet.) cc. @vstinner @encukou

Seems you misunderstand me. Extension module doesn't tries to free his state. module_dealloc (I mean tp_dealloc slot of module object which located in moduleobject.c) called and frees the state. Why is it called - seems that there no references to module any longer, and this deallocate the state, which wrong, because some instances of XMLParser need this state.

Eclips4 commented 10 months ago

Thanks @mgorny for the report!

mgorny commented 10 months ago

Thanks a lot!