
Add mimalloc memory allocator #90815

Closed: tiran closed this issue 10 months ago

tiran commented 2 years ago
BPO 46657
Nosy @nascheme, @tiran, @corona10, @erlend-aasland, @h-vetinari
PRs
  • python/cpython#31164
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at = None
created_at =
labels = ['interpreter-core', 'type-feature', '3.11']
title = 'Add mimalloc memory allocator'
updated_at =
user = 'https://github.com/tiran'
```

bugs.python.org fields:

```python
activity =
actor = 'h-vetinari'
assignee = 'none'
closed = False
closed_date = None
closer = None
components = ['Interpreter Core']
creation =
creator = 'christian.heimes'
dependencies = []
files = []
hgrepos = []
issue_num = 46657
keywords = ['patch']
message_count = 12.0
messages = ['412639', '412641', '412645', '412654', '412669', '412679', '412741', '412749', '412796', '412835', '412867', '412986']
nosy_count = 5.0
nosy_names = ['nascheme', 'christian.heimes', 'corona10', 'erlendaasland', 'h-vetinari']
pr_nums = ['31164']
priority = 'normal'
resolution = None
stage = 'patch review'
status = 'open'
superseder = None
type = 'enhancement'
url = 'https://bugs.python.org/issue46657'
versions = ['Python 3.11']
```

    Linked PRs

    tiran commented 2 years ago

    From https://github.com/microsoft/mimalloc

    mimalloc (pronounced "me-malloc") is a general purpose allocator with excellent performance characteristics. Initially developed by Daan Leijen for the run-time systems of the Koka and Lean languages.

mimalloc has several interesting properties that make it useful for CPython. Among other things, it is fast, thread-safe, and NUMA-aware. It has built-in free lists with multi-sharding and allocation heaps. While Python's obmalloc requires the GIL to protect its data structures, mimalloc mostly uses thread-local data structures and atomic instructions (compare-and-swap) for efficiency. Sam Gross' nogil relies on mimalloc's thread safety and uses first-class heaps for heap-walking GC.
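For illustration, the style of lock-free update described above looks roughly like the following C11 sketch of a CAS-based free-list push. This is illustrative only, not mimalloc's actual code; `block_t` and `push_free` are made-up names:

```c
/* Minimal sketch of a CAS-based free-list push, the kind of lock-free
 * update an allocator can use instead of relying on a global lock.
 * Illustrative only; not mimalloc's actual code. */
#include <stdatomic.h>
#include <stddef.h>

typedef struct block_s {
    struct block_s *next;
} block_t;

static void
push_free(_Atomic(block_t *) *list, block_t *block)
{
    block_t *head = atomic_load_explicit(list, memory_order_relaxed);
    do {
        /* Link the new block in front of the current head. */
        block->next = head;
        /* On failure, head is reloaded with the current value and we retry. */
    } while (!atomic_compare_exchange_weak_explicit(
                 list, &head, block,
                 memory_order_release, memory_order_relaxed));
}
```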

mimalloc works on the majority of platforms and CPU architectures. However, it requires a compiler with C11 atomics support. CentOS 7's default GCC is slightly too old; a more recent GCC from Developer Toolset is required.

For 3.11 I plan to integrate mimalloc as an optional drop-in replacement for obmalloc. Users will be able to compile CPython without mimalloc or disable mimalloc with the PYTHONMALLOC env var. Since mimalloc will be optional in 3.11, Python won't depend on or expose any of the advanced features yet. This approach enables the community to test and give feedback with minimal risk of breakage.

mimalloc sources will be vendored without any option to use system libraries. Python's mimalloc requires several non-standard compile-time flags. In the future, Python may extend or modify mimalloc for heap walking and nogil, too.

    (This is a tracking bug until I find time to finish a PEP.)

    corona10 commented 2 years ago

I added Neil to the nosy list since he is one of the kick-off members of this amazing work :)

    tiran commented 2 years ago

    New features:

    tiran commented 2 years ago

    Buildbots "PPC64 Fedora PR" and all RHEL 7 build bots provided by David Edelsohn are failing because compiler is missing support for stdatomic.h.

    nascheme commented 2 years ago

    Thanks, I'm indeed interested. Most credit goes to Christian for advancing this.

For the missing stdatomic.h, would it be appropriate to have an autoconf check for it? We can just disable mimalloc if it doesn't exist.

    tiran commented 2 years ago

    We have an autoconf check for stdatomic.h. The test even verifies that a program with atomic_load_explicit() compiles and links.
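For reference, such a probe is essentially a tiny C program along these lines. This is a sketch of what a configure-time check compiles and links, not the exact code in configure.ac:

```c
/* Sketch of a configure-time probe for working C11 atomics; the real
 * check in configure.ac may differ in detail. */
#include <stdatomic.h>

atomic_int value = 0;

int main(void)
{
    /* If this compiles and links, the toolchain supports C11 atomics. */
    atomic_store_explicit(&value, 1, memory_order_release);
    return atomic_load_explicit(&value, memory_order_acquire) == 1 ? 0 : 1;
}
```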

How do we want to use mimalloc in the future? Is it going to stay optional in 3.12? Then the default setting for --with-mimalloc should depend on the presence of stdatomic.h. Do we want to make it mandatory for GC heap walking and nogil? Then --with-mimalloc should default to "yes" and configure should abort when stdatomic.h is missing.

I'm leaning towards --with-mimalloc=yes. It will make users aware that they need a compiler with atomics support:

    configure: error: --with-mimalloc requires stdatomic.h. Update your compiler or rebuild with --without-mimalloc. Python 3.12 will require stdatomic.

    tiran commented 2 years ago

ICC might be a problem. Apparently some versions have an incomplete stdatomic.h; see bpo-37415.

    tiran commented 2 years ago

    References:

    nascheme commented 2 years ago

My preference would be for --with-mimalloc=yes in an upcoming release. Platforms without the required stdatomic.h support can manually specify --with-mimalloc=no. That will make them aware that a future release of Python might no longer build (if mimalloc is no longer optional).

A soft landing for merging nogil is not a good enough reason to merge mimalloc, IMHO. nogil may never be merged. There should be some concrete and immediate advantage to switching to mimalloc. The idea of using "heap walking" to improve the cyclic GC is not concrete enough. It's just an idea at this point.

    I think the (small) performance win could be enough of a reason to merge. This seems to be the most recent benchmark:

    https://gist.github.com/pablogsal/8027937b71cd30f17aaaa5ef7c885d3e

    There is also the long-term maintenance issue. So far, mimalloc upstream has been responsive. The mimalloc code is not so huge or complicated that we couldn't maintain it (if for some reason it gets abandoned upstream). However, I think we would prefer to maintain obmalloc rather than mimalloc, all else being equal. Abandonment by the upstream seems fairly unlikely. So, I'm not too concerned about maintenance.

    tiran commented 2 years ago

    New benchmark:

| Benchmark | 2022-02-08_11-54-master-69e10976b2e7 | 2022-02-08_11-57-master-d6f5f010b586 |
|---|---|---|
| mako | 8.85 ms | 7.83 ms: 1.13x faster |
| hexiom | 6.04 ms | 5.54 ms: 1.09x faster |
| spectral_norm | 81.4 ms | 75.2 ms: 1.08x faster |
| pyflate | 380 ms | 352 ms: 1.08x faster |
| scimark_sparse_mat_mult | 4.05 ms | 3.76 ms: 1.08x faster |
| pickle_pure_python | 312 us | 290 us: 1.07x faster |
| unpickle_pure_python | 238 us | 222 us: 1.07x faster |
| float | 63.1 ms | 59.5 ms: 1.06x faster |
| tornado_http | 90.3 ms | 86.0 ms: 1.05x faster |
| html5lib | 62.8 ms | 60.2 ms: 1.04x faster |
| regex_compile | 121 ms | 116 ms: 1.04x faster |
| scimark_lu | 106 ms | 102 ms: 1.04x faster |
| nqueens | 70.9 ms | 68.4 ms: 1.04x faster |
| crypto_pyaes | 70.1 ms | 67.8 ms: 1.03x faster |
| logging_silent | 97.5 ns | 94.4 ns: 1.03x faster |
| sympy_integrate | 17.2 ms | 16.7 ms: 1.03x faster |
| sympy_str | 260 ms | 252 ms: 1.03x faster |
| sympy_expand | 441 ms | 427 ms: 1.03x faster |
| pathlib | 14.1 ms | 13.7 ms: 1.03x faster |
| regex_dna | 164 ms | 159 ms: 1.03x faster |
| regex_v8 | 21.1 ms | 20.6 ms: 1.02x faster |
| sympy_sum | 138 ms | 136 ms: 1.02x faster |
| scimark_fft | 286 ms | 281 ms: 1.02x faster |
| pickle | 9.34 us | 9.19 us: 1.02x faster |
| xml_etree_parse | 126 ms | 124 ms: 1.01x faster |
| richards | 43.0 ms | 42.4 ms: 1.01x faster |
| xml_etree_generate | 71.2 ms | 70.5 ms: 1.01x faster |
| scimark_monte_carlo | 58.8 ms | 58.3 ms: 1.01x faster |
| deltablue | 3.60 ms | 3.58 ms: 1.01x faster |
| chaos | 64.6 ms | 64.3 ms: 1.01x faster |
| 2to3 | 216 ms | 215 ms: 1.00x faster |
| pidigits | 155 ms | 154 ms: 1.00x faster |
| nbody | 76.4 ms | 77.0 ms: 1.01x slower |
| python_startup_no_site | 3.96 ms | 3.99 ms: 1.01x slower |
| xml_etree_iterparse | 82.5 ms | 83.1 ms: 1.01x slower |
| scimark_sor | 103 ms | 104 ms: 1.01x slower |
| unpickle | 11.3 us | 11.4 us: 1.01x slower |
| telco | 5.53 ms | 5.58 ms: 1.01x slower |
| python_startup | 5.56 ms | 5.62 ms: 1.01x slower |
| json_loads | 20.6 us | 20.8 us: 1.01x slower |
| json_dumps | 9.61 ms | 9.77 ms: 1.02x slower |
| dulwich_log | 60.9 ms | 62.1 ms: 1.02x slower |
| logging_format | 5.47 us | 5.62 us: 1.03x slower |
| pickle_list | 3.06 us | 3.15 us: 1.03x slower |
| django_template | 30.2 ms | 31.2 ms: 1.03x slower |
| meteor_contest | 80.7 ms | 84.1 ms: 1.04x slower |
| pickle_dict | 21.9 us | 23.4 us: 1.07x slower |
| logging_simple | 4.84 us | 5.20 us: 1.07x slower |
| Geometric mean | (ref) | 1.01x faster |

    Benchmark hidden because not significant (9): unpack_sequence, go, raytrace, chameleon, xml_etree_process, fannkuch, sqlite_synth, regex_effbot, unpickle_list

    tiran commented 2 years ago

    I re-ran the benchmark of d6f5f010b586:

| Benchmark | 2022-02-08_11-54-master-69e10976b2e7 | 2022-02-08_11-57-master-d6f5f010b586 |
|---|---|---|
| pickle_pure_python | 312 us | 281 us: 1.11x faster |
| unpickle_pure_python | 238 us | 216 us: 1.10x faster |
| pyflate | 380 ms | 349 ms: 1.09x faster |
| hexiom | 6.04 ms | 5.55 ms: 1.09x faster |
| logging_silent | 97.5 ns | 89.8 ns: 1.09x faster |
| float | 63.1 ms | 59.3 ms: 1.07x faster |
| html5lib | 62.8 ms | 59.1 ms: 1.06x faster |
| crypto_pyaes | 70.1 ms | 66.1 ms: 1.06x faster |
| json_loads | 20.6 us | 19.4 us: 1.06x faster |
| tornado_http | 90.3 ms | 86.1 ms: 1.05x faster |
| mako | 8.85 ms | 8.45 ms: 1.05x faster |
| richards | 43.0 ms | 41.1 ms: 1.05x faster |
| xml_etree_parse | 126 ms | 120 ms: 1.05x faster |
| logging_format | 5.47 us | 5.25 us: 1.04x faster |
| sympy_integrate | 17.2 ms | 16.5 ms: 1.04x faster |
| sympy_str | 260 ms | 251 ms: 1.04x faster |
| fannkuch | 325 ms | 314 ms: 1.04x faster |
| regex_v8 | 21.1 ms | 20.4 ms: 1.04x faster |
| sympy_expand | 441 ms | 425 ms: 1.04x faster |
| regex_compile | 121 ms | 117 ms: 1.03x faster |
| sympy_sum | 138 ms | 134 ms: 1.03x faster |
| scimark_lu | 106 ms | 103 ms: 1.03x faster |
| go | 128 ms | 125 ms: 1.03x faster |
| pathlib | 14.1 ms | 13.7 ms: 1.02x faster |
| scimark_monte_carlo | 58.8 ms | 57.9 ms: 1.02x faster |
| nqueens | 70.9 ms | 69.9 ms: 1.02x faster |
| pidigits | 155 ms | 153 ms: 1.01x faster |
| pickle | 9.34 us | 9.22 us: 1.01x faster |
| raytrace | 278 ms | 275 ms: 1.01x faster |
| 2to3 | 216 ms | 213 ms: 1.01x faster |
| deltablue | 3.60 ms | 3.56 ms: 1.01x faster |
| logging_simple | 4.84 us | 4.78 us: 1.01x faster |
| xml_etree_iterparse | 82.5 ms | 81.7 ms: 1.01x faster |
| regex_dna | 164 ms | 162 ms: 1.01x faster |
| unpack_sequence | 32.7 ns | 32.4 ns: 1.01x faster |
| telco | 5.53 ms | 5.48 ms: 1.01x faster |
| python_startup | 5.56 ms | 5.58 ms: 1.00x slower |
| xml_etree_generate | 71.2 ms | 71.6 ms: 1.01x slower |
| unpickle_list | 4.08 us | 4.12 us: 1.01x slower |
| chameleon | 6.07 ms | 6.14 ms: 1.01x slower |
| chaos | 64.6 ms | 65.3 ms: 1.01x slower |
| json_dumps | 9.61 ms | 9.75 ms: 1.01x slower |
| xml_etree_process | 49.9 ms | 50.7 ms: 1.01x slower |
| meteor_contest | 80.7 ms | 82.0 ms: 1.02x slower |
| scimark_sparse_mat_mult | 4.05 ms | 4.12 ms: 1.02x slower |
| unpickle | 11.3 us | 11.5 us: 1.02x slower |
| django_template | 30.2 ms | 31.0 ms: 1.02x slower |
| scimark_sor | 103 ms | 106 ms: 1.02x slower |
| spectral_norm | 81.4 ms | 84.9 ms: 1.04x slower |
| pickle_dict | 21.9 us | 23.5 us: 1.08x slower |
| Geometric mean | (ref) | 1.02x faster |

    Benchmark hidden because not significant (7): scimark_fft, dulwich_log, python_startup_no_site, regex_effbot, sqlite_synth, nbody, pickle_list

    tiran commented 2 years ago

    ICC 2021 has full support for stdatomic.h and compiles mimalloc just fine:

    $ CC="icc" ./configure -C --with-pydebug
    $ make
    $ ./python
    Python 3.11.0a5+ (main, Feb  9 2022, 15:57:40) [GCC Intel(R) C++ gcc 7.5 mode] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import sys
    >>> sys._malloc_info
    sys._malloc_info(allocator='mimalloc_debug', with_pymalloc=True, with_mimalloc=True, mimalloc_secure=4, mimalloc_debug=2)

AIX xlc is still a problem. It does not support C11 stdatomic.h, but it comes with the older GCC __sync atomic memory access function family, https://www.ibm.com/docs/en/xl-c-and-cpp-aix/13.1.3?topic=cbif-gcc-atomic-memory-access-built-in-functions-extension . It might be possible to re-implement mimalloc's atomics with __sync functions (e.g. https://gist.github.com/nhatminhle/5181506). The implementation would be less efficient, though. The __sync functions don't take a memory-order argument, atomic_load_explicit(v) becomes __sync_fetch_and_add(v, 0), and atomic_store_explicit() requires two full memory barriers.
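A shim along those lines might look like the following. This is a hypothetical sketch using the GCC __sync builtins; the `my_atomic_*` names are illustrative and not from the PR:

```c
/* Hypothetical shim mapping a few C11-style atomic operations onto the
 * older GCC __sync builtins, as described above. */
#include <stdint.h>

typedef volatile intptr_t my_atomic_t;  /* illustrative type, not mimalloc's */

/* A load becomes an add of zero; __sync_fetch_and_add implies a full barrier. */
static inline intptr_t
my_atomic_load(my_atomic_t *p)
{
    return __sync_fetch_and_add(p, 0);
}

/* A store needs full memory barriers on both sides to approximate
 * memory_order_seq_cst. */
static inline void
my_atomic_store(my_atomic_t *p, intptr_t v)
{
    __sync_synchronize();
    *p = v;
    __sync_synchronize();
}

/* Compare-and-swap maps directly onto a __sync builtin. */
static inline int
my_atomic_cas(my_atomic_t *p, intptr_t expected, intptr_t desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}
```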

    ericsnowcurrently commented 2 years ago

    FYI, PR #31164 indicates the following 3 blockers:

    After a conversation between @tiran, @daanx (the mimalloc creator), and me, we have a good idea on how to move forward. I'm looking into the PGO issue (with some help).

    ericsnowcurrently commented 2 years ago

    The PGO-on-Windows issue may be resolved now (thanks to gh-94790). Here's what I did to check:

1. merge main into tiran:bpo-46657-mimalloc (PR gh-31164)
   a. run git merge main
   b. fix Makefile.pre.in
   c. run git checkout main -- Python/clinic/sysmodule.c.h
    2. run python3 Tools/clinic/clinic.py --make
    3. run python3 Tools/scripts/generate_global_objects.py
    4. change _PyStructSequence_InitType to _PyStructSequence_InitBuiltinWithFlags in Python/sysmodule.c
    5. run .\PCbuild\build.bat --pgo

    @neonene Could you verify that we should expect PGO builds to be working correctly now? Thanks!

    gvanrossum commented 2 years ago

    Do these steps point towards improvements for @tiran's mimalloc branch?

    ericsnowcurrently commented 2 years ago

    Only that the branch needs to merge in main.

    neonene commented 2 years ago

    @ericsnowcurrently Thanks for the check. I can also build the newest mimalloc branch successfully with PGO on Windows.

    As for the issue around releasing memory, my builds still show:

    >>> from test.support import Py_DEBUG
    
    >>> about32MiB = 1024*1024*32 - (32*Py_DEBUG)
    
    >>> b = [bytearray(about32MiB) for i in range(3)]
    >>> del b  # 100MB is freed to the OS
    
    >>> b = [bytearray(about32MiB - 1) for i in range(3)]
    >>> del b  # 100MB remains cached
    
    >>> sys._mi_collect(True)  # purge cached segments

    The threshold size was 1024*1024*32 - (32*Py_DEBUG) - 32 when I used bytes() instead.

    ericsnowcurrently commented 2 years ago

> @ericsnowcurrently Thanks for the check. I can also build the newest mimalloc branch successfully with PGO on Windows.

    Great! I'll cross it off my list. 😄

> As for the issue around releasing memory, my builds still show:

    Yeah, I'm going to follow up with @daanx about this.

    romange commented 2 years ago

Subscribing here to learn how the issue with mimalloc is being resolved. My project uses mimalloc as well, and I am curious to learn how to configure it to have a slimmer memory footprint.

    ericsnowcurrently commented 2 years ago

    @romange, take a look at https://github.com/python/cpython/pull/31164#issuecomment-1133583811. Other than that, there's only the existing mimalloc documentation, as well as any insight @daanx can offer.

    romange commented 2 years ago

@ericsnowcurrently thanks! Btw, I chose mimalloc because of its elegant design and its thread-local heap support. However, it seems that the project is not under active development anymore. Maybe there are good reasons and everything is perfect, but I can see how it could become a problem in the future (and daanx is a one-person army). What do you think about it?

    daanx commented 2 years ago

Hi @romange: just to clarify, mimalloc is actively maintained and I am dedicated to making the CPython integration work. However, I did take some needed time off this summer :-) but I am back now and hope to help resolve the outstanding issues. I have already started work on the ASAN support, so I hope that is coming soon (although I am at ICFP'23 next week).

    romange commented 2 years ago

Hi @daanx, I am psyched to hear that! :) Sorry for quickly jumping to conclusions 🤦🏼‍♂️

    daanx commented 2 years ago

> @ericsnowcurrently Thanks for the check. I can also build the newest mimalloc branch successfully with PGO on Windows.
>
> As for the issue around releasing memory, my builds still show:
>
>     >>> from test.support import Py_DEBUG
>
>     >>> about32MiB = 1024*1024*32 - (32*Py_DEBUG)
>
>     >>> b = [bytearray(about32MiB) for i in range(3)]
>     >>> del b  # 100MB is freed to the OS
>
>     >>> b = [bytearray(about32MiB - 1) for i in range(3)]
>     >>> del b  # 100MB remains cached
>
>     >>> sys._mi_collect(True)  # purge cached segments
>
> The threshold size was 1024*1024*32 - (32*Py_DEBUG) - 32 when I used bytes() instead.

    Thanks for testing this, just to clarify:

Generally, when an allocator requests memory from the OS (when allocating the bytearray, for example), it should not immediately return that memory to the OS when it is freed, just in case the program starts allocating again (which is usually the case). mimalloc does this as well and only releases unused OS memory after a while.

However, at an interpreter prompt the program is just waiting and may therefore not get an opportunity to do this. In such a case the CPython interpreter should call sys._mi_collect(True) to release outstanding OS memory just before blocking for user input.

    (In case you still see an increased usage after manually calling this as in your test above, something else may be going on which we should investigate in that case).
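At the C level, that would amount to something like the following. This is a hypothetical sketch; `repl_before_blocking_for_input` is an illustrative name, not an actual CPython hook:

```c
/* Hypothetical sketch: purge mimalloc's cached segments right before the
 * interactive interpreter blocks waiting for user input. */
#include <mimalloc.h>

static void
repl_before_blocking_for_input(void)
{
    /* mi_collect(true) forcefully returns unused segments to the OS.
     * This is cheap here because the interpreter is about to idle anyway. */
    mi_collect(true);
}
```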

    neonene commented 2 years ago

When I tried the following steps on Windows:

1. b = [bytearray(1024*1024*32 - 1) for i in range(32)]
2. del b
3. sys._mi_collect(True)

Task Manager showed the following memory usage for python.exe:

| 65af44c (PR) | 1 | 2. del | 3. collect | |
|---|---|---|---|---|
| v2.0.6 as-is (options) | 1 GB | 1 GB | 40,932 KB | holds 32MB? |
| v2.0.6 decommit_delay = 0 | 1 GB | 8,392 KB | 8,112 KB | |
| v2.0.6 page_reset = 1 | 1 GB | 6,480 KB | 8,060 KB | slower than above? |

| main | 1 | 2. del |
|---|---|---|
| pymalloc | 1 GB | 3,760 K |
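For context, decommit_delay and page_reset above are mimalloc options; under mimalloc v2.0.x they can be set via environment variables (MIMALLOC_DECOMMIT_DELAY, MIMALLOC_PAGE_RESET) or programmatically. A minimal sketch, assuming the v2.0.x option names used in the table:

```c
/* Sketch: setting the two mimalloc v2.0.x options tested in the table
 * above, before any allocations happen. */
#include <mimalloc.h>

int main(void)
{
    mi_option_set(mi_option_decommit_delay, 0);  /* decommit freed memory eagerly */
    mi_option_set(mi_option_page_reset, 1);      /* reset pages as soon as they are free */
    /* ... run the allocation workload ... */
    return 0;
}
```
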
    daanx commented 2 years ago

    Thanks for the quick test and explanation!

1. I think, generally, we should call mi_collect(true) in CPython when entering the interpreter prompt, so we always clean up as much as we can after each expression evaluation. Is this easy to do?
2. Thanks for testing decommit_delay=0 -- this shows that it is indeed the delay heuristic in mimalloc. This is a good thing in general, though, and we should not use decommit_delay=0 by default as it would slow down the allocator for large computations.
3. page_reset eagerly resets pages and is generally the best option for the lowest heap usage (but it also costs a bit as the OS call is slowish).

I am thinking that I should probably enable page_reset for large block pages, where the cost would be amortized and we would see the largest gains. This would fix your example, I think. However, 32MiB is exactly the "huge" block boundary; can you do the same test with other block sizes, like 64 16MiB arrays and 16 64MiB arrays?

    Thanks

    gvanrossum commented 2 years ago

> I think, generally, we should call mi_collect(true) in CPython when entering the interpreter prompt, so we always clean up as much as we can after each expression evaluation. Is this easy to do?

    Should be easy enough for the REPL, but there are any number of other situations where this cleanup would be expected -- before waiting for a socket operation, before reading from stdin (or any tty device), before sleeping (or select/poll/etc.), and so on.

    Maybe it should be done whenever we do a GC run too?

    daanx commented 2 years ago

> > I think, generally, we should call mi_collect(true) in CPython when entering the interpreter prompt, so we always clean up as much as we can after each expression evaluation. Is this easy to do?
>
> Should be easy enough for the REPL, but there are any number of other situations where this cleanup would be expected -- before waiting for a socket operation, before reading from stdin (or any tty device), before sleeping (or select/poll/etc.), and so on.
>
> Maybe it should be done whenever we do a GC run too?

It is possible to call this during a GC run, but I think it would be too aggressive. Generally the current mimalloc strategies work fine, and mi_collect is only to be used in select cases (I have only seen one reasonable case before, for a user-mode thread pool scheduler). For the interpreter we can make such a case: running each expression is a bit like running a program, and one would like to clean up after evaluation. But doing it generally for blocking calls is not needed; mimalloc runs big services that are full of blocking calls without problems. The benchmark here is also a bit atypical, with just some huge blocks instead of many smaller allocations, where the observed effects will be less. It may well be that we can even fix this by just doing eager decommit for large blocks only, without needing mi_collect at all -- I will look into this.

    (I am about to board a plane so won't respond for a while :-) )

    gvanrossum commented 2 years ago

So maybe this whole "mimalloc doesn't release memory to the system" issue is a non-issue? Calling mi_collect() before entering the REPL only "works" because, to demonstrate the problem, people will do things like start an interactive interpreter, run a bit of code that allocates a huge block, free it again, and then in a separate shell run "ps" to observe the process size, while keeping the python process waiting at the next >>> prompt.

What I think you're saying is: if they would instead just continue to run some other code (that doesn't reuse the huge block that was just freed), the excess memory will be returned to the system after some time (how is that measured? wall clock time? number of alloc/free ops?), and the observation that the huge block is not returned to the system is a mere artifact of the way people try to demonstrate the issue.

    neonene commented 2 years ago

> However, 32MiB is exactly the "huge" block boundary; can you do the same test with other block sizes, like 64 16MiB arrays and 16 64MiB arrays?

    @daanx This example holds 168 MiB, and the same goes for the reversed arrays:

    [bytearray(1024*1024 * [64,16][i%2]) for i in range(20)]  # 16MiB blocks are cached?

EDIT: Sorry, 64 16MiB arrays reserve 1GiB; 16 64MiB arrays do not.

    Also, the cases below return about 1GiB to the OS with mi_collect():

    [bytearray(1024*1024) for i in range(1024)]
    [bytearray(1024*512)  for i in range(1024*2)]
    [bytearray(1024*256)  for i in range(1024*4)]
    [bytearray(1024*128)  for i in range(1024*8)]
    [bytearray(1024*64)   for i in range(1024*16)]
    [bytearray(1024*32)   for i in range(1024*32)]
    [bytearray(1024*16)   for i in range(1024*64)]
    [bytearray(1024*8)    for i in range(1024*128)]

What I'm trying to do is simplify test_fstring.test_many_expressions. Thanks.

    daanx commented 2 years ago

    (back from ICFP)

    Guido wrote:

> So maybe this whole "mimalloc doesn't release memory to the system" issue is a non-issue? Calling mi_collect() before entering the REPL only "works" because, to demonstrate the problem, people will do things like start an interactive interpreter, run a bit of code that allocates a huge block, free it again, and then in a separate shell run "ps" to observe the process size, while keeping the python process waiting at the next >>> prompt.
>
> What I think you're saying is: if they would instead just continue to run some other code (that doesn't reuse the huge block that was just freed), the excess memory will be returned to the system after some time (how is that measured? wall clock time? number of alloc/free ops?), and the observation that the huge block is not returned to the system is a mere artifact of the way people try to demonstrate the issue.

    Yes, this is exactly right! Since reserving and releasing OS memory is so expensive, we delay the release a bit as the memory may be needed soon enough. The current delay is in wall clock time but fairly short (I think 500ms by default).

    When someone runs an expression from the REPL, it will actually always be the case that we no longer require any OS memory to be retained afterwards (it is like running a separate program). So, in this particular case it is actually beneficial to call mi_collect(true) after evaluation. (but again, calling mi_collect in general is not).

    daanx commented 2 years ago

> Also, the cases below return about 1GiB to the OS with mi_collect():

Thanks for testing! When I get back home I will set things up so I can test these directly myself as well. I also tried this directly and verified that it wasn't something specific to huge objects, so that's good.

    thesamesam commented 1 year ago

> [...] There is also the long-term maintenance issue. So far, mimalloc upstream has been responsive. The mimalloc code is not so huge or complicated that we couldn't maintain it (if for some reason it gets abandoned upstream). However, I think we would prefer to maintain obmalloc rather than mimalloc, all else being equal. Abandonment by the upstream seems fairly unlikely. So, I'm not too concerned about maintenance.

    I made a similar point wrt the bundling at https://github.com/python/cpython/pull/109914#issuecomment-1784315022.

    nascheme commented 10 months ago

I think this issue can be closed now that mimalloc has already gone into HEAD.

    erlend-aasland commented 10 months ago


    See follow-up issue: