Open brandtbucher opened 4 days ago
For what it's worth, there seem to be a number of tests which cause this failure. Below is a list of tests, and the final UOp that fails to be added, with the message No room for (UOP) (need X, got Y)
.
Note that a few of the tests don't seem to show this kind of error, but instead underflow the trace stack.
Test | UOp Failure |
---|---|
test.test_multiprocessing_fork.test_processes | LOAD_FAST (81) |
test.test_multiprocessing_forkserver.test_processes | [track stack underflow] |
test.test_multiprocessing_spawn.test_processes | _EXIT_TRACE (300) |
test_buffer | _LOAD_CONST_IMMORTAL (207) |
test_complex | COMPARE_OP_FLOAT(185) |
test_configparser | LOAD_ATTR_METHOD_WITH_VALUES (200) |
test_datetime | STORE_FAST (107) |
test_decimal | [track stack underflow] |
test_itertools | _POP_TOP (29) |
test_long | BINARY_OP (42) |
test_math | CALL_KW_BOUND_METHOD (170) |
test_plistlib | LOAD_FAST (81) |
test_pow | LOAD_ATTR_METHOD_NO_DICT (199) |
test_slice | _TO_BOOL_BOOL (220) |
test_struct | CALL_METHOD_DESCRIPTOR_FAST (175) |
test_tarfile | _LOAD_CONST (79) |
test_venv | [trace stack underflow] |
test_zipfile | POP_JUMP_IF_TRUE (98) |
test_zipimport | [trace stack underflow] |
test_zipimport_support | CALL_KW_BOUND_METHOD (170) |
test_zoneinfo | CALL_PY_EXACT_ARGS (180) |
Here's a simpler (if slightly cursed) reproducer:
# foo.py
def foo():
# Padding to increase number of UOps in trace
_ = reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
reversed(list(reversed(list(reversed(list(reversed(list(reversed(list(
[0]
))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
))))))))))))))))))))))))))))))))))))))))
for _ in range(4096): #Minimum value
foo()
./configure --enable-experimental-jit --with-pydebug
make
PYTHON_UOPS_OPTIMIZE=0 PYTHON_LLTRACE=1 ./python foo.py
Created a proto-trace for <module> (/home/jglass/Documents/cpython/foo.py:1) at byte offset 44 -- length 590
python: Python/optimizer.c:1287: uop_optimize: Assertion `length <= UOP_MAX_TRACE_LENGTH' failed.
Aborted (core dumped)
Thanks @JeffersGlass!
I talked with @markshannon, and we think it's probably worth reworking translate_bytecode_to_trace
to be less error-prone. The problem is that we need to reserve enough space for the next bytecode's uop expansion before translating it, which is difficult to get right and difficult to check. Rather, we should be able to gracefully handle the case where we run out of space in the middle of translating an instruction, by reverting to a known safepoint (the last complete expansion) and ending the trace there.
Here's some pseudocode to capture the general idea:
buffer = [None] * MAX_TRACE_LENGTH
space_remaining = MAX_TRACE_LENGTH
last_complete = 0
index = 0
def add_uop(uop):
if space_remaining <= 0:
raise OutOfSpace
buffer[index] = uop
space_remaining -= 1
index += 1
# Save room for final _CHECK_VALIDITY_AND_SET_IP + _EXIT_TRACE:
space_remaining -= 2
add_uop(_START_EXECUTOR)
add_uop(_MAKE_WARM)
try:
for instruction in bytecode:
if has_error(instruction):
# Save room for error stub:
space_remaining -= 1
if has_exit(instruction):
# Save room for exit stub:
space_remaining -= 1
add_uop(_CHECK_VALIDITY_AND_SET_IP)
for uop in uop_expansion(instruction):
add_uop(uop)
# Successfully translated the entire instruction:
last_complete = index - 1
except OutOfSpace:
index = last_complete
add_uop(_CHECK_VALIDITY_AND_SET_IP)
add_uop(_EXIT_TRACE)
This makes sense to me!
I think the current crash is related to prepare_for_execution
, where 'spare' positions at the end of the trace are used to make implicit deopts into explicit ones, without verifying that there's actually enough spare space at the end of the buffer. For instance, foo.py
has 590 ops in its buffer before preparation, but all the possible DEOPTs mean it has 839 ops prepare_for_execution
, and the assertion fails/buffer overflows.
/// optimizer.c, L1297
printf(1, "Trace length before: %d\n", length);
length = prepare_for_execution(buffer, length);
printf(1, "Trace length after : %d\n", length);
Trace length before: 590
Trace length after : 839
So probably some bounds checking is necessary inside the loop... what would the desired behavior be if we run out of room at the end of the buffer during this step?
I think it makes sense to check during trace projection, rather than after. During trace projection, we know whether or not a given instruction is only partly-translated when we run out of space (which we need to avoid). If we run out of space, too, we can still successfully create a trace.
By the time we’re inserting exit/error stubs, in my opinion it’s too late. There’s a calculation that’s off during projection, and the fact that it doesn’t manifest until we insert stubs is a symptom, not a cause.
Crash report
When running:
On a debug build, we get an assertion failure:
On a non-debug build, we get a buffer overflow:
It looks like the optimizer just happens to do enough useful work that adding error and exit stubs doesn't put us over
UOP_MAX_TRACE_LENGTH
anymore. My hunch is that the real bug is intranslate_bytecode_to_trace
, where we're not reserving enough space for some instruction(s).CC @Fidget-Spinner and @markshannon.