Py_XDECREF() expands its argument multiple times

davidmalcolm commented 11 years ago

BPO	17206
Nosy	@malemburg, @brettcannon, @jcea, @amauryfa, @mdickinson, @vstinner, @larryhastings, @benjaminp, @jkloth, @asvetlov, @davidmalcolm, @markshannon
PRs	python/cpython#30855
Files	17206.diff: Patch for 17206 perf.log: Benchmark comparison for 17206 patch perf-linux.log: Benchmark comparison for 17206 in Ubuntu + GCC asmdiff_17206.zip: Preprocessed, object and disassembled files of ceval module with and without patch. 17206-2.diff: Updated patch for 17206 17206-3.diff object.patch: Patch reusing _py_tmp object-2.patch project.diff

^{Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.}

Show more details

GitHub fields: ```python assignee = None closed_at = created_at = labels = ['interpreter-core', 'type-bug'] title = 'Py_XDECREF() expands its argument multiple times' updated_at = user = 'https://github.com/davidmalcolm' ``` bugs.python.org fields: ```python activity = actor = 'Mark.Shannon' assignee = 'none' closed = True closed_date = closer = 'SilentGhost' components = ['Interpreter Core'] creation = creator = 'dmalcolm' dependencies = [] files = ['29439', '29440', '29523', '29524', '29617', '29660', '30625', '30690', '30750'] hgrepos = [] issue_num = 17206 keywords = ['patch'] message_count = 48.0 messages = ['182107', '182111', '182209', '184430', '184431', '184432', '184433', '184488', '184492', '184495', '184833', '184834', '184855', '185396', '185614', '185623', '185625', '185627', '185643', '185689', '185889', '186386', '189362', '189379', '190156', '190623', '190628', '191364', '191731', '191752', '191780', '191785', '191786', '191791', '191792', '191877', '191878', '191879', '191880', '191881', '191883', '191888', '191919', '192167', '192204', '192205', '192206', '192507'] nosy_count = 15.0 nosy_names = ['lemburg', 'brett.cannon', 'jcea', 'amaury.forgeotdarc', 'mark.dickinson', 'vstinner', 'larry', 'benjamin.peterson', 'jkloth', 'Arfrever', 'asvetlov', 'dmalcolm', 'Mark.Shannon', 'python-dev', 'ilblackdragon'] pr_nums = ['30855'] priority = 'normal' resolution = 'fixed' stage = 'resolved' status = 'closed' superseder = None type = 'behavior' url = 'https://bugs.python.org/issue17206' versions = ['Python 3.4'] ```

davidmalcolm commented 11 years ago

When running my refcount static analyzer [1] on a large corpus of real-world C extension code for Python, I ran into the following construct in various places:

    Py_XDECREF(PyObject_CallObject(callable, args));

This is bogus code: Py_XDECREF expands its argument multiple times, so after this goes through the C preprocessor the callable is actually called invoked up to 4 times, leaking refs to 3 of the results - assuming that none of the calls fail, and a non pydebug build of CPython (which would expand the argument more times).

The raw preprocessor output for an optimized Python 2.7 looks like this: do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if ( --((PyObject)(PyObject_CallObject(callable, args)))->ob_refcnt != 0) ; else ( ((((PyObject)((PyObject *)(PyObject_CallObject(callable, args))))->ob_type)->tp_dealloc)((PyObject *)((PyObject \)(PyObject_CallObject(callable, args))))); } while (0); } while (0);

Cleaned up (and removing some of the casts for clarity) it looks like this:
  do { 
    if ((PyObject_CallObject(callable, args)) == ((void *)0))
      ; 
    else 
      do {
        if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0)
          ;
        else
          (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc)
            PyObject_CallObject(callable, args);
      } while (0);
  } while (0);
which is clearly not what the extension author was expecting.

Should we: (A) update the macro so that it expands its argument only once, or (B) warn people not to write code like the above?

Similar considerations apply to the other macros, I guess, but I've seen the above used "in the wild", I haven't yet seen it for the other macros).

Seen in the wild in: python-alsa-1.0.25-1.fc17: pyalsa/alsamixer.c:179 00174 | for (i = 0; i \< count; i++) { 00175 | t = PyTuple_New(2); 00176 | if (t) { 00177 | PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 00178 | PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 00179>| Py_XDECREF(PyObject_CallObject(reg, t)); 00180 | Py_DECREF(t); 00181 | } 00182 | }
00183 |
00184 | Py_XDECREF(reg);

  pyalsa/alsahcontrol.c:190
00185 |     for (i = 0; i < count; i++) { 
00186 |         t = PyTuple_New(2); 
00187 |         if (t) { 
00188 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
00189 |             PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
00190>|             Py_XDECREF(PyObject_CallObject(reg, t)); 
00191 |             Py_DECREF(t); 
00192 |         } 
00193 |     }    
00194 |  
00195 |     Py_XDECREF(reg); 

  pyalsa/alsaseq.c:3277
03272 |     for (i = 0; i < count; i++) { 
03273 |         t = PyTuple_New(2); 
03274 |         if (t) { 
03275 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 
03276 |             PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 
03277>|             Py_XDECREF(PyObject_CallObject(reg, t)); 
03278 |             Py_DECREF(t); 
03279 |         } 
03280 |     } 
03281 |  
03282 |     Py_XDECREF(reg);

python-4Suite-XML-1.0.2-14.fc17: Ft/Xml/src/domlette/refcounts.c:80 00075 | / refcount = this \/ 00076 | expected = 1; 00077 | } 00078 | else { 00079 | sprintf(buf, "Unexpected object type '%.200s'" ,node->ob_type->tp_name); 00080>| Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); 00081 | return 0; 00082 | } 00083 |
00084 | sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); 00085 | return do_test(tester, buf, expected, node->ob_refcnt);

[Note to self: I actually saw this because it uncovered a traceback in cpychecker, which I fixed as: http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a ]

[1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

brettcannon commented 11 years ago

On Thu, Feb 14, 2013 at 11:07 AM, Dave Malcolm \report@bugs.python.org\wrote:

New submission from Dave Malcolm:

When running my refcount static analyzer [1] on a large corpus of real-world C extension code for Python, I ran into the following construct in various places:
Py_XDECREF(PyObject_CallObject(callable, args));

Eww.

This is bogus code: Py_XDECREF expands its argument multiple times, so after this goes through the C preprocessor the callable is actually called invoked up to 4 times, leaking refs to 3 of the results - assuming that none of the calls fail, and a non pydebug build of CPython (which would expand the argument more times).

The raw preprocessor output for an optimized Python 2.7 looks like this: do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if ( --((PyObject)(PyObject_CallObject(callable, args)))->ob_refcnt != 0) ; else ( ((((PyObject)((PyObject *)(PyObject_CallObject(callable, args))))->ob_type)->tp_dealloc)((PyObject *)((PyObject \)(PyObject_CallObject(callable, args))))); } while (0); } while (0);

Cleaned up (and removing some of the casts for clarity) it looks like this: do { if ((PyObject_CallObject(callable, args)) == ((void *)0)) ; else do { if (--(PyObject_CallObject(callable, args)->ob_refcnt) != 0) ; else (*(PyObject_CallObject(callable, args)->ob_type)->tp_dealloc) PyObject_CallObject(callable, args); } while (0); } while (0); which is clearly not what the extension author was expecting.

Should we: (A) update the macro so that it expands its argument only once, or

How expensive is that going to be? Since there is a 'do' statement using a temp variable would be easy to introduce, but is a compiler going to be smart enough to inline it all so it doesn't have to waste an allocation, etc.?

(B) warn people not to write code like the above?

Only if (A) is too costly. I would modify a checkout for ALL refcount macros and run the benchmark suite to see if there is a noticeable difference. If there isn't then I say change all the macros (can't make one safe and the rest not as that is just asking for inconsistent usage and trouble).

-Brett

Similar considerations apply to the other macros, I guess, but I've seen the above used "in the wild", I haven't yet seen it for the other macros).

Seen in the wild in: python-alsa-1.0.25-1.fc17: pyalsa/alsamixer.c:179 00174 | for (i = 0; i \< count; i++) { 00175 | t = PyTuple_New(2); 00176 | if (t) { 00177 | PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd)); 00178 | PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 00179>| Py_XDECREF(PyObject_CallObject(reg, t)); 00180 | Py_DECREF(t); 00181 | } 00182 | } 00183 | 00184 | Py_XDECREF(reg);
  pyalsa/alsahcontrol.c:190
00185 |     for (i = 0; i \< count; i++) {
00186 |             t = PyTuple_New(2);
00187 |             if (t) {
00188 |                     PyTuple_SET_ITEM(t, 0,
PyInt_FromLong(pfd[i].fd)); 00189 | PyTuple_SET_ITEM(t, 1, PyInt_FromLong(pfd[i].events)); 00190>| Py_XDECREF(PyObject_CallObject(reg, t)); 00191 | Py_DECREF(t); 00192 | } 00193 | } 00194 | 00195 | Py_XDECREF(reg);
  pyalsa/alsaseq.c:3277
03272 |     for (i = 0; i \< count; i++) {
03273 |         t = PyTuple_New(2);
03274 |         if (t) {
03275 |             PyTuple_SET_ITEM(t, 0, PyInt_FromLong(pfd[i].fd));
03276 |             PyTuple_SET_ITEM(t, 1,
PyInt_FromLong(pfd[i].events)); 03277>| Py_XDECREF(PyObject_CallObject(reg, t)); 03278 | Py_DECREF(t); 03279 | } 03280 | } 03281 | 03282 | Py_XDECREF(reg);

python-4Suite-XML-1.0.2-14.fc17: Ft/Xml/src/domlette/refcounts.c:80 00075 | / refcount = this \/ 00076 | expected = 1; 00077 | } 00078 | else { 00079 | sprintf(buf, "Unexpected object type '%.200s'" ,node->ob_type->tp_name); 00080>| Py_XDECREF(PyObject_CallMethod(tester, "error", "s", buf)); 00081 | return 0; 00082 | } 00083 | 00084 | sprintf(buf, "%.200s refcounts", node->ob_type->tp_name); 00085 | return do_test(tester, buf, expected, node->ob_refcnt);

[Note to self: I actually saw this because it uncovered a traceback in cpychecker, which I fixed as:

http://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=99fa820487e380b74c2eda1d97facdf2ee6d2f3a]

[1] https://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

---------- components: Interpreter Core messages: 182107 nosy: dmalcolm priority: normal severity: normal status: open title: Py_XDECREF() expands its argument multiple times versions: Python 2.7, Python 3.1, Python 3.2, Python 3.3, Python 3.4, Python 3.5

Python tracker \report@bugs.python.org\ \http://bugs.python.org/issue17206\

New-bugs-announce mailing list New-bugs-announce@python.org http://mail.python.org/mailman/listinfo/new-bugs-announce

benjaminp commented 11 years ago

I wish we could use inline functions...

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

I've worked on this with Dave Malcolm @PyCon2013 sprints.

This patch is work in progress to make Py_XDECREF() and Py_XINCREF() expands their arguments once instead of multiple times.

Because patch is work in progress, it contains old version for ease of benchmarking. To enable the old version - define OLD_17206.

Tested so far only with: Clang on Mac OS X 10.7 x64.

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Additionally, in macros Py_XINCREF and Py_XDECREF we've took an opportunity to increase readability by changing expression:

if (item == NULL) ; else action(item); to more natural inverted condition: if (item != NULL) action(item);

There is a chance that there may be a reason for this form of expression, so let me know if this is an issue.

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Benchmark run on Clang Mac OS X 10.7 attached of comparison with and without patch 17206.diff.

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Command used for benchmarking was:

python perf.py -b 2n3 -f ../cpython/baseline-clang/python.exe ../cpython/experiment-clang/python.exe | tee perf.log

mdickinson commented 11 years ago

I'd suggest less generic names for the temporary variables, just to minimise the chance of accidental name collisions. _py_xincref_tmp and _py_xdecref_tmp, perhaps?

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

That names were my first idea - but then I saw Py_CLEAR uses _py_tmp variable I used it.

Should I replace in Py_CLEAR to _py_clear_tmp as well?

mdickinson commented 11 years ago

Ignore me; I don't know what I was thinking. _py_tmp is fine. There's no issue with name clashes.

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Compiled baseline and patched version with GCC in Ubuntu 12.10 (running in VMWare).

Benchmarking results are attached: ./perf.py -b 2n3 -f ../cpython/baseline/python ../cpython/experiment/python | tee perf-linux.log

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Additionally, disassembled ceval.o and compared baseline with experiment (with applied patch): no actual differences found.

Attached archive contains ceval from both baseline and experiment builds:

cevalb.cc - processed file of baseline build
ceval.cc - processed file of experiment build diff ceval.cc cevalb.cc - shows differences introduced by patch.
cevalb.o - object file of baseline build
ceval.o - object file of experiment build
cevalb.asm - disassembly of baseline object file
ceval.asm - disassembly of experiment object file diff ceval.asm cevalb.asm - doesn't show any differences in generated assembly code.

amauryfa commented 11 years ago

The patch contains a "#ifdef OLD_17206" that should be removed.
I know that these macros are already used everywhere, but a test for the new feature would be nice (in _testcapimodule.c) For example, Py_DECREF(PyLong_FromLong(0)) does not fail or leak when tests are run with -R.

vstinner commented 11 years ago

I'm not sure that it is a good idea to patch such major function (macro) in a minor version (ex: Python 2.7.x). I changed the Versions field to only select Python 3.4.

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Updated patch - removed old code and addded tests to _testcapimoudule.c as Amaury suggested.

Amaury, why do you mention -R flag (from what I see it does hash randomization)? I would expect some flag, that would enforce memory leakage issues to fail test.

mdickinson commented 11 years ago

Amaury, why do you mention -R flag

He's talking about the -R argument to regrtest:

""" -R runs each test several times and examines sys.gettotalrefcount() to see if the test appears to be leaking references. The argument should be of the form stab:run:fname where 'stab' is the number of times the test is run to let gettotalrefcount settle down, 'run' is the number of times further it is run and 'fname' is the name of the file the reports are written to. These parameters all have defaults (5, 4 and "reflog.txt" respectively), and the minimal invocation is '-R :'. """

mdickinson commented 11 years ago

Thanks for the updated patch. Some comments:

In the tests, your functions should have return type PyObject* rather than void; you can use Py_RETURN_NONE as a convenient way to return something of the correct type. E.g.:

static PyObject *
test_decref_doesnt_leak(PyObject* self)
{
    Py_DECREF(PyLong_FromLong(0));
    Py_RETURN_NONE;
}

Unfortunately, the test above *will* leak: even though your patch fixes Py_XDECREF, Py_DECREF will still evaluate the argument multiple times. So we get, for example:

iwasawa:cpython mdickinson$ ./python.exe -m test -R:: test_capi [1/1] test_capi beginning 9 repetitions 123456789 ......... test_capi leaked [1, 1, 1, 1] references, sum=4 1 test failed: test_capi

It would be great to have tests for Py_INCREF and Py_XINCREF, too.
The patch introduces two extra blank lines before Py_CLEAR.
Not specifically about the patch, but: there's a comment in object.h that reads:

""" WARNING The Py_DECREF macro must have a side-effect-free argument since it may evaluate its argument multiple times. (The alternative would be to mace it a proper function or assign it to a global temporary variable first, both of which are slower; and in a multi-threaded environment the global variable trick is not safe.) """

I don't understand why the author of that comment ruled out the possibility of a local temporary variable. Anyone have any ideas? A constraint of some flavours of pre-ANSI C, perhaps?

markshannon commented 11 years ago

Mark, it was written 20 years ago. Who knows (or cares) why it was written that way? Let's just write it correctly this time.

Py_INCREF, Py_DECREF, Py_XDECREF and Py_XINCREF should all expand their argument exactly once.

Py_CLEAR should expand its argument exactly twice, once as an rvalue, then once as an lvalue.

So could we fix Py_DECREF and Py_CLEAR as well, please?

larryhastings commented 11 years ago

For the record: I care. Generally speaking CPython is a lovingly crafted source tree, and the choices its architects made are nearly always sensible and well-reasoned. When I see things like this, things that seem kind of dumb on first glance, I worry that we're missing something.

It could be that it did have a sensible reason fifteen years ago, like some bone-headed compiler generated awful code--and the macros *are* used everywhere, so maybe performance genuinely suffered. Or maybe the macros were originally intended to be usable as rvalues; it looks like Py_INCREF still permits that, and changing it to be a statement could be considered a breaking API change.

But we can't let ourselves become hidebound and superstitious. I see no reason why a local temporary variable shouldn't be fine here, and perhaps in all the macros Mark cites. So don't get me wrong, I'm +1.

(The documentation on Py_INCREF makes no mention of whether it's legal as an rvalue, and all the examples use it as a statement. Perhaps we should document it explicitly so for 3.4.)

malemburg commented 11 years ago

On 31.03.2013 21:29, Larry Hastings wrote:

Larry Hastings added the comment:

For the record: I care. Generally speaking CPython is a lovingly crafted source tree, and the choices its architects made are nearly always sensible and well-reasoned. When I see things like this, things that seem kind of dumb on first glance, I worry that we're missing something.

Did someone test the patch on a use case where the performance of this macro matters a lot ? E.g. deallocating large lists or dictionaries.

Todays CPUs have plenty of registers available, so it's possible that the patch actually speeds up things, since the decref'ed variable may well be an indirect reference. At the time Py_DECREF was written, registers were precious golden flasks, only to be filled with the most valuable of things ;-)

Just want to be sure we're not missing something...

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Fixed Py_DECREF and Py_CLEAR as well.

Added tests for Py_INCREF and Py_XINCREF (if somebody has a better idea how to tests that INCREF doesn't leak - please, let me know).

Removed comment that Py_DECREF evaluate it's argument multiple times as not relevant anymore.

About considerations from performance point of view - I've made toy example (only this defines and main function) to test how gcc optimizer behaves in different cases - from what I see, if expression is like this (which is majority of cases in the code):
 PyObject* obj = Foo();
 Py_XDECREF(obj)

assembly code that will be produced (with -O3) is the same before and after patch.

amauryfa commented 11 years ago

As I wrote in bpo-17589, there are some extension modules (pytables) that which assume that Py_INCREF is an expression: return Py_INCREF(x), x; and Py_RETURN_NONE is also defined with a comma expression.

Oh, and Cython:
#define __Pyx_PyBool_FromLong(b) ((b) ? (Py_INCREF(Py_True), Py_True) : (Py_INCREF(Py_False), Py_False))

a2d4ba29-aed4-49f1-9357-07b008f7d717 commented 11 years ago

Amaury,

I didn't update Py_INCREF macro in this patch (because it doesn't expand it's argument multiple times) - so the examples you are showing will be working fine.

I've updated Py_XINCREF, but it can't be used as an expression anyway.

amauryfa commented 11 years ago

The last patch (17206-3.diff) has tests for the 4 macros, and looks good to me.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

New changeset aff41a6421c2 by Benjamin Peterson in branch 'default': don't expand the operand to Py_XINCREF/XDECREF/CLEAR/DECREF multiple times (closes bpo-17206) http://hg.python.org/cpython/rev/aff41a6421c2

vstinner commented 11 years ago

"AMD64 Windows7 SP1 3.x" buildbot is crashing in test_marshal. It looks like a regression introduced by the changeset aff41a6421c2:

[ 88/375] test_marshal
Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1611, in <module>
    main_in_temp_cwd()
  File "../lib/test/regrtest.py", line 1586, in main_in_temp_cwd
    main()
  File "../lib/test/regrtest.py", line 774, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_marshal: Exit code 3221225725

http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio

amauryfa commented 11 years ago

Hum, a stack overflow?

jkloth commented 11 years ago

Indeed, after debugging, it is a stack overflow caused by the introduction of the different temporary variables in the Py_XINCREF, Py_XDECREF, and Py_DECREF macros.

I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down.

vstinner commented 11 years ago

I've attached an updated patch that reuses the _py_tmp variable for those temporary assignments thus keeping the required stack size down.

I don't understand how it would change the size of the C stack, could you please explain? object.patch looks like refactoring to simplify the work of the C preprocessor.

jkloth commented 11 years ago

At least in a debug build, the MSVC 64-bit compiler seems to allocate space for each unique variable declared in the function body. Therefore, by changing the temporary variables to be named identically, the amount of required space is minimized.

The refactoring of Py_DECREF is needed to prevent an error for local variable defined before use when Py_DECREF is nested within another macro already defining _py_tmp (it would expand to roughly PyObject *_py_tmp = _py_tmp).

benjaminp commented 11 years ago

We've "fixed" this debug-problem in other cases simply by skipping the test in question on debug builds.

jkloth commented 11 years ago

Except, in this case, it actually crashes the interpreter. I would hope to think that it isn't common practice to just hide crashers especially when there is a fix available.

benjaminp commented 11 years ago

Could you please wrap your new macros in the do .. while boilerplate?

jkloth commented 11 years ago

FYI, however, the new macro is designed as an internal implementation detail along the lines of the other _Py_* macros. That is, just a tiny piece of a larger function to be used at your own risk.

Either way, I've uploaded another version with the do { } while (0) wrapping.

benjaminp commented 11 years ago

Yes, but in case someone else mindlessly starts using it elsewhere, let's spare them the macro bugz.

markshannon commented 11 years ago

I think blaming the crash in test_marshall http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x/builds/2009/steps/test/logs/stdio on the new macros (change aff41a6421c2) is correct.

It may be the proximate cause, but that doesn't make it the underlying one. The interpreter should be more resilient against minor changes to the amount of C stack used per call.

Can we get a full stack trace (for the C stack not the Python one) for the failure?

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

New changeset c31bec42e411 by Victor Stinner in branch 'default': Issue bpo-17206: test.regrtest and test.script_helper enable faulthandler module http://hg.python.org/cpython/rev/c31bec42e411

jkloth commented 11 years ago

I'm unsure as to how to get a stack trace from Visual Studio, but I can assure you that the changes introduced by the change aff41a6421c2 *is* the cause of the failure in test_marshal.

In a nutshell, the overflow happens in a recursion on the marhsal.c:r_object() function for the test_marshal test of BugsTestCase.test_loads_recursion().

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

New changeset 1a9367d5aabc by Victor Stinner in branch 'default': Issue bpo-17206: Fix test_cmd_line and test_faulthandler for my previous change http://hg.python.org/cpython/rev/1a9367d5aabc

vstinner commented 11 years ago

Can we get a full stack trace (for the C stack not the Python one) for the failure?

Yes, using a debugger on Windows x64. I'm waiting for a renewal of my MSDN account to get Visual Studio (full version, not the Express version limited to 32-bit).

Until this, I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper because it may help this issue, but also others future crashes.

vstinner commented 11 years ago

"I enabled faulthandler in subprocesses created by test.regrtest and test.script_helper..."

Oh, it doesn't help for this issue. On Windows, faulthandler is unable to dump the Python traceback on a stack overflow. On UNIX, it allocates a diffrent stack for its signal handler, and so the signal handler can run even on a stack overflow.

The Windows exit code 0xC00000FD (3221225725) means "Stack overflow / exhaustion".

[ 81/375] test_marshal
Traceback (most recent call last):
  File "../lib/test/regrtest.py", line 1618, in <module>
    main_in_temp_cwd()
  File "../lib/test/regrtest.py", line 1593, in main_in_temp_cwd
    main()
  File "../lib/test/regrtest.py", line 775, in main
    raise Exception("Child error on {}: {}".format(test, result[1]))
Exception: Child error on test_marshal: Exit code 3221225725

jkloth commented 11 years ago

Here are some results after a detailed investigation:

in debug, each variable declaration in a block is allocated on the stack
in debug, r_object() uses 1416 bytes of stack
in release, r_object() uses 512 bytes of stack
default stack size is 1MB
stack for python_d.exe has been already up'ed to 2100000

So it seems that the best option would be to increase the stack size used when linking (/STACK:). I would suggest increasing it to 3MB using /STACK:3145728.

jkloth commented 11 years ago

Here is some additional analysis of recursive functions in the 'pythoncore' MSVC project:

Ratio Release Debug Filename:Function Name Stack Stack ----------------------------------------------- 1.000 32 32 _sre.asm:_validate_inner 1.159 352 408 ast.asm:set_context 1.167 48 56 compile.asm:stackdepth_walk 1.188 128 152 memoryobject.asm:copy_rec 1.250 160 200 ast.asm:num_stmts 1.308 104 136 firstsets.asm:calcfirstset 1.312 128 168 posixmodule.asm:win32_xstat_impl_w 1.400 40 56 node.asm:freechildren 1.667 72 120 memoryobject.asm:tolist_rec 1.750 32 56 listnode.asm:list1node 1.750 32 56 parsermodule.asm:parser_compare_nodes 1.750 32 56 parsermodule.asm:validate_factor 1.750 32 56 parsermodule.asm:validate_not_test 1.750 32 56 rotatingtree.asm:RotatingTree_Enum 1.750 32 56 typeobject.asm:solid_base 1.786 112 200 bytearrayobject.asm:bytearray_setslice 1.900 80 152 compile.asm:compiler_comprehension_generator 2.143 56 120 pythonrun.asm:print_exception_recursive 2.167 48 104 arraymodule.asm:array_ass_slice 2.250 32 72 ast.asm:validate_slice 2.250 32 72 getargs.asm:skipitem 2.250 32 72 node.asm:sizeofchildren 2.250 32 72 parsermodule.asm:validate_test 2.250 32 72 setobject.asm:set_issuperset 2.278 144 328 listobject.asm:list_ass_slice 2.417 96 232 arraymodule.asm:array_ass_subscr 2.429 56 136 longobject.asm:PyLong_AsLongLong 2.429 56 136 parsermodule.asm:node2tuple 2.429 56 136 typeobject.asm:mro_subclasses 2.500 80 200 bytearrayobject.asm:bytearray_ass_subscript 2.600 40 104 errors.asm:PyErr_GivenExceptionMatches 2.750 32 88 import.asm:update_code_filenames 2.750 32 88 setobject.asm:set_richcompare 2.750 32 88 symtable.asm:symtable_visit_slice 2.750 32 88 typeobject.asm:PyType_Modified 2.766 512 1416 marshal.asm:r_object 3.125 64 200 longobject.asm:long_rshift 3.250 32 104 compile.asm:compiler_with 3.250 32 104 setobject.asm:set_issubset 3.250 32 104 typeobject.asm:assign_version_tag 3.750 32 120 abstract.asm:abstract_issubclass 3.750 32 120 typeobject.asm:PyType_Ready 3.833 48 184 ast.asm:ast_for_expr 4.250 32 136 compile.asm:compiler_visit_expr 4.250 32 136 mathmodule.asm:factorial_partial_product 4.500 80 360 genobject.asm:gen_throw 4.692 312 1464 symtable.asm:symtable_visit_stmt 5.250 32 168 _collectionsmodule.asm:deque_extend 5.250 32 168 _collectionsmodule.asm:deque_extendleft 5.250 32 168 ast.asm:alias_for_import_name 5.750 32 184 typeobject.asm:merge_class_dict 6.250 32 200 abstract.asm:PyObject_IsInstance 6.250 32 200 abstract.asm:PyObject_IsSubclass 6.250 32 200 ast.asm:validate_expr 6.500 48 312 Python-ast.asm:obj2ast_slice 7.182 88 632 parsermodule.asm:build_node_children 7.250 32 232 errors.asm:PyErr_NormalizeException 9.167 48 440 symtable.asm:symtable_visit_expr 10.250 32 328 _json.asm:encoder_listencode_obj 10.344 256 2648 Python-ast.asm:obj2ast_expr 15.955 176 2808 Python-ast.asm:obj2ast_stmt 31.750 32 1016 Python-ast.asm:ast2obj_expr

vstinner commented 11 years ago

"So it seems that the best option would be to increase the stack size used when linking (/STACK:). I would suggest increasing it to 3MB using /STACK:3145728."

Does Python allocate the whole stack (physical pages) at startup, or is it done on demand as on Linux?

jkloth commented 11 years ago

The /STACK:reserve[,commit] linker option uses 'reserve' as the upper limit of bytes to allow the executable to allocate. The optional 'commit' value (default of 4096) is the amount of physical memory allocated as needed.

Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable. Maybe we should consider increasing the reserve to match that of Linux (8MB)? From the list I posted above, the huge increase in stack use in Python-ast.c could lead to other overflows for certain expressions.

Attached is the patch to the project file to increase the stack reserve to 4MB. The actual value doesn't impact the memory requirements of the debug build except in edge cases that would otherwise result in stack overflow exceptions.

jkloth commented 11 years ago

I forgot to note that with the increased stack reserve my previous patches to Py_DECREF() and related macros are no longer needed.

vstinner commented 11 years ago

Jeremy: "stack for python_d.exe has been already up'ed to 2100000"

Indeed, but not recently (changeset 97361d917d22):

\<\< bpo-2286: bump up the stack size of the 64-bit debug python_d.exe to 2100000. The default value of 200000 causes a stack overflow at 1965 iterations of r_object() in marshal.c, 35 iterations before the 2000 limit enforced by MAX_MARSHAL_STACK_DEPTH. >>

Oh it looks like it was the same issue: "Stack overflow exception caused by test_marshal on Windows x64". See issue bpo-2286.

Amaury wrote in this issue (in 2008, msg63540): "Can you try to raise the stack size on x64 builds? If 2Mb is enough for 32bit, 4Mb should be good in your case."

Jeremy: "Therefore the reserve could be as large as needed to prevent stack overflows without impacting the actually memory footprint of the executable. Maybe we should consider increasing the reserve to match that of Linux (8MB)?"

Ah ok, if the memory footprint is unchanged, yes please increase the arbitrary limit. I would suggest 8 MB.

1762cc99-3127-4a62-9baf-30c3d0f51ef7 commented 11 years ago

New changeset fcf079dece0d by Victor Stinner in branch 'default': Issue bpo-17206: On Windows, increase the stack size from 2 MB to 4.2 MB to fix http://hg.python.org/cpython/rev/fcf079dece0d

python / cpython

Py_XDECREF() expands its argument multiple times #61408