python / cpython

The Python programming language
https://www.python.org
Other
62.84k stars 30.09k forks source link

fix for problems with test_longexp #36860

Closed d0e06e0e-7db7-4806-8b80-f93bcb4150bb closed 22 years ago

d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago
BPO 578297
Nosy @tim-one, @jackjansen
Files
  • test_longexp_fix.diff: experimental patch - solves OS/2+EMX problem with test_longexp
  • malloc_stats.txt: Python malloc()/realloc() statistics
  • pymalloc-parser.diff: Change Parser/[node.c
  • parsetok.c acceler.c] to PyMalloc
  • pymalloc-compile.diff: Change Python/compile.c to use PyMalloc
  • Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.

    Show more details

    GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core'] title = 'fix for problems with test_longexp ' updated_at = user = 'https://bugs.python.org/aimacintyre' ``` bugs.python.org fields: ```python activity = actor = 'tim.peters' assignee = 'aimacintyre' closed = True closed_date = None closer = None components = ['Interpreter Core'] creation = creator = 'aimacintyre' dependencies = [] files = ['4405', '4406', '4407', '4408'] hgrepos = [] issue_num = 578297 keywords = ['patch'] message_count = 14.0 messages = ['40499', '40500', '40501', '40502', '40503', '40504', '40505', '40506', '40507', '40508', '40509', '40510', '40511', '40512'] nosy_count = 4.0 nosy_names = ['tim.peters', 'jhylton', 'jackjansen', 'aimacintyre'] pr_nums = [] priority = 'normal' resolution = 'accepted' stage = None status = 'closed' superseder = None type = None url = 'https://bugs.python.org/issue578297' versions = ['Python 2.3'] ```

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    The OS/2 EMX port has long had problems with test_longexp, which triggers gross memory consumption on this platform as a result of platform malloc behaviour.

    More recently, this appears to have been identified in MacPython under certain circumstances, although the problem is apparently more a speed issue than a memory consumption issue.

    The core of the problem is the blizzard of small mallocs as the parser builds the parse tree and creates tokens.

    The attached patch takes advantage of PyMalloc (built in by default for 2.3) to insulate the parser from adverse behaviour in the platform malloc.

    The patch has been tested on OS/2 and FreeBSD:

    The patch in its current form is for experimental evaluation, and not intended for integration into the core.

    If there is interest in seeing this integrated, I'd like feedback on a more elegant way to implement the functional change.

    I've assigned this to Jack for review in the context of its performance on the Mac.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    Oops. On FreeBSD, test_longexp contributes 15% of the performance gain (not 25%) observed for the regression test with the patch applied.

    Also, I would expect to make this a platform specific change if its integrated, rather than a general change (unless that it is seen as more appropriate).

    jackjansen commented 22 years ago

    Logged In: YES user_id=45365

    Unfortunately on the Mac it doesn't help anything for the test_longexp problem, nor for the similar test_import problem.

    The problem with MacPython's malloc seems to be that large reallocs cause the slowdown. And the addchild() calls will continually realloc a block of memory to a slightly larger size (I gave up when it was about 800KB, after a minute or two, and growing at tens of KB per second). As soon as the block is larger than SMALL_REQUEST_TRESHOLD pymalloc will simply call the underlying system malloc/realloc.

    tim-one commented 22 years ago

    Logged In: YES user_id=31435

    Jack, please do a cvs update and try this again. I checked in changes to PyNode_AddChild() that I expect will cure your particular woes here.

    Andrew, PyMalloc was designed for oodles of small allocations. Feel encouraged to write a patch to change the compiler to use PyObject_{Malloc, Realloc, Free} instead.
    Then it will automatically exploit PyMalloc when the latter is enabled.

    Note that the regression test suite incorporates random numbers in several tests, and in ways that can affect runtime. Small differences in aggregate test suite runtime are meaningless because of this.

    jackjansen commented 22 years ago

    Logged In: YES user_id=45365

    With Tim's mods test_import and test_longexp now work fine in MacPython. This is both with and without Andrew's patch.

    Andrew, I'm assigning back to you, there's little more I can do with this patch. And you'll have to check if you still need it, or whether Tims change to node.c is goo enough for OS/2 as well.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    To my surprise, Tim's checkin also works for the EMX port.

    I can only conclude that EMX's realloc() has a corner case tickled by test_longexp, that isn't hit with either the aggressive overallocation change or the PyMalloc change applied.

    It is also interesting to note the performance impact of Tim's checkin, particularly on FreeBSD.

    Typical runtimes for "python -E -tt Lib/test/regrtest.py -l test_longexp" on my P5-166SMP test box (FreeBSD 4.4, gcc 2.95.3 -O3): total user sys baseline: 39.1s 32.7s 6.3s my patch: 37.1s 30.3 6.7s Tim's checkin: 8.4s 7.8s 0.6s my patch+Tim's checkin 5.5s 4.9s 0.5s

    These runs with Library modules already compiled.

    While Tim's comments about timing the regression test are noted, there are nonetheless consistent reductions in execution time of the regression test as well. Typical results on the same test box: total user sys baseline: 1386s 1097s 89s my patch: 1350s 1065s 93s Tim's checkin: 1265s 1003s 67s my patch+Tim's checkin 1230s 971s 65s

    With the EMX port, the difference in timing between Tim's checkin and my patch is small, both for test_longexp and the regression test. There are noticeable gains for both test_longexp and the whole regression test with both changes in place, although not as significant as the FreeBSD results.

    MAL's PyBench 1.0 exhibits negligible performance differences between the code states on both platforms, which is as I'd expect as it doesn't appear to test compile() or eval().

    From the above, I conclude that Tim's patch gets the most bang for the buck, and that my patch (or its intent) be rejected unless someone thinks pursuing the PyMalloc changes to the parser worthwhile.

    As an aside, I did a little research on the "XXX are those actually common?" question Tim posed in the comment associated with his change: In running Lib/compileall.py against the Lib directory, 89% of PyMem_RESIZE() calls in AddChild() are the n=1 case, and 9% are rounded up to n=4.

    tim-one commented 22 years ago

    Logged In: YES user_id=31435

    Thanks for the detailed followup, Andrew! I incorporated some of this info into XXXROUNDUP's comments.

    Without either patch, the system malloc has to do two miserable things: (1) find bigger and bigger memory areas very frequently; and, (2) interleaved with that, allocate gazillions of tiny blocks too. #2 makes it difficult for the platform malloc to find free space contiguous to the blocks allocated for #1, unless it arranges to move them to "the end" of memory, or into their own memory segments. As a result it's likely to do a copy on nearly every large-block realloc, and the code used to do a realloc on every 3rd new child.

    The XXXROUNDUP patch addressed #1 by asking to grow blocks much less frequently; PyMalloc addresses #2 by getting the tiny blocks out of the platform malloc's hair. If the platform malloc is saved from either one, it's job becomes much easier.

    It would still be nice to switch the parser to using pymalloc. There are still disasters lurking, because some platform malloc packages appear to take quadratic time when *free*ing gazillions of tiny blocks (they thrash trying to coalesce them into larger contiguous free blocks).
    pymalloc doesn't try to coalesce free blocks, so is reliably immune to this disease.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    Ok, I've prepared patches to convert the following files to use PyMalloc for memory allocation: Parser/[acceler.c|node.c|parsetok,c] (pymalloc-parser.diff) Python/compile.c (pymalloc-compile.diff)

    I didn't bother with the other files in Parser/ as my malloc logging shows that they only ever appear to make requests > 256 bytes.

    I have attached/will attach a summary from my malloc logging experiments for information.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    Tim,

    1. any objections to the "final" patches?

    2. do you see any reason not to backport your XXXROUNDUP change - it qualifies as a performance/behaviour bugfix IMO.

    tim-one commented 22 years ago

    Logged In: YES user_id=31435

    Reassigned to Jeremy because I'm "on vacation" this week, and Jeremy is most familiar w/ the parser code. Offhand the patches looked fine to me, provided that you no longer consider test_longexp_fix.diff to be part of the patch set.

    I backported the XXXROUNDUP changes to the 2.2 maintenance branch at the sane time I changed it in the HEAD, so nothing left to do there on that count.

    Thanks for the great work!

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    Yes, test_longexp_fix.diff is no longer part of the patch set. Should I delete it?

    I must have missed your 2.2 backport commit message. I might also look at whether it can be backported to 2.1 without significant side effects.

    Thanks for your feedback too.

    03bde425-37ce-4291-88bd-d6cecc46a30e commented 22 years ago

    Logged In: YES user_id=31392

    Looks good to me for 2.3 and 2.2. I see a 40% speedup on compilation of long source strings.

    d0e06e0e-7db7-4806-8b80-f93bcb4150bb commented 22 years ago

    Logged In: YES user_id=250749

    I've committed this to the head (2.3) branch.

    I looked at a 2.2 backport, but ran into problems with the Parser/acceler.c and Parser/parsetok.c changes - due to changes in memory APIs and changes in Include/pgenheaders.h, they don't compile under 2.2 with or without --with-pymalloc :-( I had a look at what it would take to resolve, and concluded that it involved opening Pandora's box.

    Parser/node.c and Python/compile.c would be OK. Unfortunately the Parser/parsetok.c changes are the most valuable after Parser/node.c, based on allocation sizes and frequency.

    While I believe these changes would enhance the performance and stability of the 2.2 branch, most of the gains have already been made by Tim's PyNode_AddChild() fix.

    Given the likely extent of the code disturbance, and that I'm not going to have any time to go further with this for at least a month, I don't plan to do the 2.2 backport and plan to close this patch in a couple of days.

    Thanks all!

    tim-one commented 22 years ago

    Logged In: YES user_id=31435

    Thank you, Andrew! You did plenty here. We can't recommend enabling pymalloc in 2.2 anyway (at least not without a ton of other backports first to make it work right in all cases), so don't even think twise about about skipping the backport of this.

    Marked Closed.