python / cpython

The Python programming language
https://www.python.org
Other
63.49k stars 30.41k forks source link

Memory leak in Modules/sre_lib.h #67877

Open 179a554f-eaff-458f-ab05-8166251cacbd opened 9 years ago

179a554f-eaff-458f-ab05-8166251cacbd commented 9 years ago
BPO 23689
Nosy @vstinner, @ezio-melotti, @serhiy-storchaka, @animalize
PRs
  • python/cpython#11926
  • python/cpython#12160
  • python/cpython#32188
  • python/cpython#32223
  • python/cpython#32283
  • Files
  • sre_clean_repeat_data.patch
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = 'https://github.com/serhiy-storchaka' closed_at = created_at = labels = ['expert-regex', '3.11', 'performance'] title = 'Memory leak in Modules/sre_lib.h' updated_at = user = 'https://bugs.python.org/abacabadabacaba' ``` bugs.python.org fields: ```python activity = actor = 'serhiy.storchaka' assignee = 'serhiy.storchaka' closed = True closed_date = closer = 'serhiy.storchaka' components = ['Regular Expressions'] creation = creator = 'abacabadabacaba' dependencies = [] files = ['38529'] hgrepos = [] issue_num = 23689 keywords = ['patch'] message_count = 13.0 messages = ['238313', '238320', '238324', '238326', '238328', '238343', '238388', '335889', '337113', '416266', '416269', '416630', '416631'] nosy_count = 7.0 nosy_names = ['vstinner', 'ezio.melotti', 'mrabarnett', 'abacabadabacaba', 'serhiy.storchaka', 'malin', 'alexei.romanov'] pr_nums = ['11926', '12160', '32188', '32223', '32283'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'resource usage' url = 'https://bugs.python.org/issue23689' versions = ['Python 3.11'] ```

    Linked PRs

    179a554f-eaff-458f-ab05-8166251cacbd commented 9 years ago

    In Modules/sre_lib.h on line 882 [1], a block of memory is allocated. If SRE(match) function later terminates abruptly, either because of a signal or because subsequent memory allocation fails, this block is never released.

    [1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l882

    vstinner commented 9 years ago

    There is maybe a bug. Can you show an example of regex and a text where the memory leak occurs? You can use the tracemalloc module to check if there is a memory leak. Or use sys.getcounts() if you compiled Python in debug mode.

    sre_lib.h is very complex, it uses the C instruction "goto" with regex bytecodes.

    "DO_JUMP(JUMP_REPEAT, jump_repeat, ctx->pattern+ctx->pattern[0]);" calls "goto entrace" to execute following bytecodes, but later it comes back after DO_JUMP() with the "jump_repeat:" label:

    https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l1180

    179a554f-eaff-458f-ab05-8166251cacbd commented 9 years ago

    Memory leak only happens if match operation terminates abruptly, e.g. because of SIGINT. In this case, DO_JUMP doesn't come back.

    179a554f-eaff-458f-ab05-8166251cacbd commented 9 years ago

    Tracemalloc code:

        import re
        import signal
        import tracemalloc
    
        class AlarmError(Exception):
            pass
        def handle_alarm(signal, frame):
            raise AlarmError
        signal.signal(signal.SIGALRM, handle_alarm)
    
        s1 = tracemalloc.take_snapshot()
        for _ in range(20):
            try:
                signal.alarm(1)
                re.match('(?:a|a|(?=b)){1000}', 'a'*999)
                raise RuntimeError
            except AlarmError:
                pass
        s2 = tracemalloc.take_snapshot()
        res = s2.compare_to(s1, 'lineno')
        for e in res[:10]:
            print(e)

    For me, it shows almost 3 MiB allocated in re.py.

    serhiy-storchaka commented 9 years ago

    May be this patch helps.

    vstinner commented 9 years ago

    Oh cool, you wrote a script to reproduce the issue! And Serhiy wrote a patch, great! Great job guys.

    sre_clean_repeat_data.patch looks good to me.

    @Serhiy: Can you try the example to ensure that it fixes the issue? If yes, go ahead!

    179a554f-eaff-458f-ab05-8166251cacbd commented 9 years ago

    This patch doesn't fix the issue. The problem is that the list starting with state->repeat doesn't necessarily contains all repeat contexts that are allocated. Indeed, here [1] and here [2] repeat contexts are temporarily removed from the list. If the match procedure terminates abruptly, they are not added back.

    [1] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l963 [2] https://hg.python.org/cpython/file/c89f7c34e356/Modules/sre_lib.h#l1002

    b8e5a65c-bed4-4c5d-ba32-799f883ba638 commented 5 years ago

    Try to allocate SRE_REPEAT on state's stack, the performance has not changed significantly.

    It passes the other tests, except this one (test_stack_overflow): https://github.com/python/cpython/blob/v3.8.0a1/Lib/test/test_re.py#L1225-L1230

    I'll try to fix bpo-35859, bpo-9134 first.

    b8e5a65c-bed4-4c5d-ba32-799f883ba638 commented 5 years ago

    PR11926 (closed) tried to allocate SRE_REPEAT on state's stack. It's feasible, but messes up the code in sre_lib.h, and reduces performance a little (roughly 6% slower), so I gave up this solution.

    PR12160 uses a memory pool, this solution doesn't mess up the code.

    🔸For infrequent alloc/free scenes, it adds a small overhead:

    s = 'a'
    p = re.compile(r'(a)?')
    p.match(s)  # <- measure this statement

    before patch: 316 ns +- 19 ns after patch: 324 ns +- 11 ns, 2.5% slower. (by perf module)

    🔸For very frequent alloc/free scenes, it brings a speedup:

    s = 200_000_000 * 'a'
    p = re.compile(r'.*?(?:bb)+')
    p.match(s)  # <- measure this statement

    before patch: 7.16 sec after patch: 5.82 sec, 18.7% faster. (best of 10 tests)

    🔸I tested in a real case that use 17 patterns to process 100MB data:

    before patch: 27.09 sec after patch: 26.78 sec, 1.1% faster. (best of 4 tests)

    b8e5a65c-bed4-4c5d-ba32-799f883ba638 commented 2 years ago

    My PR methods are suboptimal, so I closed them.

    The number of REPEAT can be counted when compiling a pattern, and allocate a SRE_REPEAT array in SRE_STATE (with that number items).

    It seem at any time, a REPEAT will only have one in active, so a SRE_REPEAT array is fine. regex module does like this: https://github.com/mrabarnett/mrab-regex/blob/hg/regex_3/_regex.c#L18287-L18288

    Can the number of REPEAT be placed in SRE_OP_INFO? And add a field to SRE_OP_REPEAT to indicate the index of this REPEAT.

    serhiy-storchaka commented 2 years ago

    This looks promising. Please, go ahead! You are free to add any fields to any opcodes. It may break some third-party code which generates compiled patterns from a sequence of opcodes, it the stability of this interface was not promised. And they will be broken in any case due to reorganizing of internal code (bpo-47152).

    serhiy-storchaka commented 2 years ago

    New changeset 6e3eee5c11b539e9aab39cff783acf57838c355a by Ma Lin in branch 'main': bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) https://github.com/python/cpython/commit/6e3eee5c11b539e9aab39cff783acf57838c355a

    serhiy-storchaka commented 2 years ago

    Thank you Ma Lin for all your work.

    The fix changes interfaces of some internal functions which can be used in third-party code, and the bug occurs only in special circumstances, so it is not practical to backport it.

    gpshead commented 2 years ago

    PR https://github.com/python/cpython/pull/93882 reverts the commit that resolved this as it causes a major performance and usability regression: https://github.com/python/cpython/issues/91404.

    serhiy-storchaka commented 20 hours ago

    126840 and #126843 are two alternate PRs. They are based on my earlier patch, but extend it in different ways. #126840 maintains a double-linked list of all repeat contexts. #126843 maintains two single-linked lists of repeat contexts. #126843 looks simpler, but I am not sure about performance.

    Both PRs also add private method _fail_after() in the re.Pattern class which allows to simulate user interruption. After calling _fail_after(n, E) the next matching operation with this pattern will raise exception E after processing n opcodes. This allows stable reproduction of memory leaks.

    Example from https://github.com/python/cpython/issues/91404#issuecomment-1147876699 which could not finish in a hour with previous attempt (#32283), now return the result instantly.

    re.findall(r'(?:(?:/[^\n]*)?\n)*/', '../utils/common":3}],15:[function(t,e,a){"use strict";e.exports=function(){this.input=null,this.next_in=0,this.avail_in=0,this.total_in=0,this.output=null,this.next_out=0,this.avail_out=0,this.total_out=0,this.msg="",this.state=null,this.data_type=2,this.adler=0}},{}],"/":[function(t,e,a){"use strict";var i={};(0,t("./lib/utils/common").assign)(i,t("./lib/deflate"),t("./lib/inflate"),t("./lib/zlib/constants")),e.exports=i},{"./lib/deflate":1,"./lib/inflate":2,"./lib/utils/common":3,"./lib/zlib/constants":6}]},{},[])("/")});\n')

    Example from https://github.com/python/cpython/issues/91404#issuecomment-1148113847 also returns an expected result.

    m = re.match(r'((a)?)*', 'aaaaa')
    assert m.span(0) == (0, 5)
    assert m.span(1) == (5, 5)

    @wjssz, could you please take a look at these PRs?

    wjssz commented 13 hours ago

    #126843 is not feasible. If a SRE_REPEAT struct was not registered in state->repstack, and a signal aborts the matching, then it can't be released.

    Maybe #126843 works. But it requires a deeper understanding of the hiberarchy of SRE_REPEAT and bringing this complexity into the engine code. This may confuse readers of the code.

    126840 should works. But add_repeat()/remove_repeat() functions always malloc/free the struct, this may be inefficient.

    Below is my patch, it uses a doubly-linked list for used pool, a singly-linked list for unused pool. It puts released structs to unused pool, so that these structs can be reused. It's faster when malloc/free structs very frequently.

    My patch compare to #126840 (On Windows, MSVC2022, non-PGO release build):

    ### regex_dna ###
    Mean +- std dev: 139 ms +- 1 ms -> 139 ms +- 1 ms: 1.00x slower
    Not significant
    
    ### regex_effbot ###
    Mean +- std dev: 2.45 ms +- 0.04 ms -> 2.20 ms +- 0.01 ms: 1.11x faster
    Significant (t=47.54)
    
    ### regex_v8 ###
    Mean +- std dev: 21.6 ms +- 0.2 ms -> 22.0 ms +- 0.2 ms: 1.02x slower
    Not significant
    
    My benchmark (it uses 16 patterns to process 100MB real data):
      baseline: 12.104 sec
      #126840:  12.767 sec
      My patch: 12.597 sec 

    Both PRs also add private method _fail_after() in the re.Pattern class which allows to simulate user interruption. After calling _fail_after(n, E) the next matching operation with this pattern will raise exception E after processing n opcodes. This allows stable reproduction of memory leaks.

    IMO _fail_after() method is not necessary. It makes the code complex, for situations where it's almost never used again. The code above triggers the memory leak with signal, maybe it can be included in unit-test? But I don't think it's necessary either.

    This GitHub account can't upload file, you can save the patch, and apply it to main branch:

    click to see my patch ``` Modules/_sre/sre.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++ Modules/_sre/sre.h | 6 ++++ Modules/_sre/sre_lib.h | 9 ++---- 3 files changed, 97 insertions(+), 6 deletions(-) diff --git a/Modules/_sre/sre.c b/Modules/_sre/sre.c index 2c86f8869d..aea051c636 100644 --- a/Modules/_sre/sre.c +++ b/Modules/_sre/sre.c @@ -267,6 +267,86 @@ data_stack_grow(SRE_STATE* state, Py_ssize_t size) return 0; } +/* memory pool functions for SRE_REPEAT, this can avoid memory + leak when SRE(match) function terminates abruptly. + state->repeat_pool_used is a doubly-linked list, so that we + can remove a SRE_REPEAT node from it. + state->repeat_pool_unused is a singly-linked list, we put/get + node at the head. */ +static SRE_REPEAT* +repeat_pool_malloc(SRE_STATE *state) +{ + SRE_REPEAT *repeat, *temp; + + if (state->repeat_pool_unused) { + /* remove from unused pool (singly-linked list) */ + repeat = state->repeat_pool_unused; + state->repeat_pool_unused = repeat->pool_next; + } else { + repeat = PyMem_Malloc(sizeof(SRE_REPEAT)); + if (!repeat) { + return NULL; + } + } + + /* add to used pool (doubly-linked list) */ + temp = state->repeat_pool_used; + if (temp) { + temp->pool_prev = repeat; + } + repeat->pool_prev = NULL; + repeat->pool_next = temp; + state->repeat_pool_used = repeat; + + return repeat; +} + +static void +repeat_pool_free(SRE_STATE *state, SRE_REPEAT *repeat) +{ + SRE_REPEAT* const prev = repeat->pool_prev; + SRE_REPEAT* const next = repeat->pool_next; + + /* remove from used pool (doubly-linked list) */ + if (prev) { + prev->pool_next = next; + } else { + state->repeat_pool_used = next; + } + + if (next) { + next->pool_prev = prev; + } + + /* add to unused pool (singly-linked list) */ + repeat->pool_next = state->repeat_pool_unused; + state->repeat_pool_unused = repeat; +} + +static void +repeat_pool_clear(SRE_STATE *state) +{ + SRE_REPEAT *next; + + /* clear used pool */ + next = state->repeat_pool_used; + while (next) { + SRE_REPEAT *temp = next; + next = temp->pool_next; + PyMem_Free(temp); + } + state->repeat_pool_used = NULL; + + /* clear unused pool */ + next = state->repeat_pool_unused; + while (next) { + SRE_REPEAT *temp = next; + next = temp->pool_next; + PyMem_Free(temp); + } + state->repeat_pool_unused = NULL; +} + /* generate 8-bit version */ #define SRE_CHAR Py_UCS1 @@ -502,6 +582,11 @@ state_init(SRE_STATE* state, PatternObject* pattern, PyObject* string, state->must_advance = 0; state->debug = ((pattern->flags & SRE_FLAG_DEBUG) != 0); + /* SRE_REPEAT pool */ + state->repeat = NULL; + state->repeat_pool_used = NULL; + state->repeat_pool_unused = NULL; + state->beginning = ptr; state->start = (void*) ((char*) ptr + start * state->charsize); @@ -533,6 +618,9 @@ state_fini(SRE_STATE* state) /* See above PyMem_Free() for why we explicitly cast here. */ PyMem_Free((void*) state->mark); state->mark = NULL; + + /* SRE_REPEAT pool */ + repeat_pool_clear(state); } /* calculate offset from start of string */ diff --git a/Modules/_sre/sre.h b/Modules/_sre/sre.h index 83d89d57b1..acf73a91e6 100644 --- a/Modules/_sre/sre.h +++ b/Modules/_sre/sre.h @@ -68,6 +68,9 @@ typedef struct SRE_REPEAT_T { const SRE_CODE* pattern; /* points to REPEAT operator arguments */ const void* last_ptr; /* helper to check for infinite loops */ struct SRE_REPEAT_T *prev; /* points to previous repeat context */ + /* for SRE_REPEAT pool */ + struct SRE_REPEAT_T *pool_prev; + struct SRE_REPEAT_T *pool_next; } SRE_REPEAT; typedef struct { @@ -95,6 +98,9 @@ typedef struct { size_t data_stack_base; /* current repeat context */ SRE_REPEAT *repeat; + /* SRE_REPEAT pool */ + SRE_REPEAT *repeat_pool_used; + SRE_REPEAT *repeat_pool_unused; unsigned int sigcount; } SRE_STATE; diff --git a/Modules/_sre/sre_lib.h b/Modules/_sre/sre_lib.h index 97fbb0a75e..1757987f5d 100644 --- a/Modules/_sre/sre_lib.h +++ b/Modules/_sre/sre_lib.h @@ -1120,12 +1120,9 @@ SRE(match)(SRE_STATE* state, const SRE_CODE* pattern, int toplevel) pattern[1], pattern[2])); /* install new repeat context */ - /* TODO(https://github.com/python/cpython/issues/67877): Fix this - * potential memory leak. */ - ctx->u.rep = (SRE_REPEAT*) PyMem_Malloc(sizeof(*ctx->u.rep)); + ctx->u.rep = repeat_pool_malloc(state); if (!ctx->u.rep) { - PyErr_NoMemory(); - RETURN_FAILURE; + RETURN_ERROR(SRE_ERROR_MEMORY); } ctx->u.rep->count = -1; ctx->u.rep->pattern = pattern; @@ -1136,7 +1133,7 @@ SRE(match)(SRE_STATE* state, const SRE_CODE* pattern, int toplevel) state->ptr = ptr; DO_JUMP(JUMP_REPEAT, jump_repeat, pattern+pattern[0]); state->repeat = ctx->u.rep->prev; - PyMem_Free(ctx->u.rep); + repeat_pool_free(state, ctx->u.rep); if (ret) { RETURN_ON_ERROR(ret); ```