python / cpython

The Python programming language
https://www.python.org

function fails in exec when locals is given #90311

Closed 524b48e8-68da-4cd2-91d6-5a77cc237cbf closed 2 years ago

524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago
BPO 46153
Nosy @stevendaprano, @eryksun, @impact27

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:

```python
assignee = None
closed_at =
created_at =
labels = ['interpreter-core', 'invalid', 'type-bug', '3.8']
title = 'function fails in exec when locals is given'
updated_at =
user = 'https://github.com/impact27'
```

bugs.python.org fields:

```python
activity =
actor = 'eryksun'
assignee = 'none'
closed = True
closed_date =
closer = 'steven.daprano'
components = ['Interpreter Core']
creation =
creator = 'qpeter'
dependencies = []
files = []
hgrepos = []
issue_num = 46153
keywords = []
message_count = 14.0
messages = ['409038', '409039', '409040', '409042', '409043', '409044', '409045', '409046', '409055', '409063', '409066', '409089', '409090', '409093']
nosy_count = 3.0
nosy_names = ['steven.daprano', 'eryksun', 'qpeter']
pr_nums = []
priority = 'normal'
resolution = 'not a bug'
stage = 'resolved'
status = 'closed'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue46153'
versions = ['Python 3.8']
```

524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago

When both namespace arguments are given to exec, function definitions fail to capture closure. See below:

Python 3.8.6 (default, Oct  8 2020, 14:06:32) 
[Clang 12.0.0 (clang-1200.0.32.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> exec("a = 1\ndef f(): return a\nprint(f())")
1
>>> exec("a = 1\ndef f(): return a\nprint(f())", {})
1
>>> exec("a = 1\ndef f(): return a\nprint(f())", {}, {})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 3, in <module>
  File "<string>", line 2, in f
NameError: name 'a' is not defined
>>> 
524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago

This might be related to https://bugs.python.org/issue41918

stevendaprano commented 2 years ago

The function you use in exec is not a closure. The function:

    def f():
        return a

does not capture the top-level variable "a", it does a normal name lookup for a. You can check this yourself by looking at f.__closure__ which you will see is None. Or you can use the dis module to look at the disassembled bytecode.
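A minimal sketch of that check, assuming the code is run through exec with a single namespace dict. The names `ns` and `ops` are illustrative and do not come from the thread.

```python
import dis

ns = {}
exec("a = 1\ndef f(): return a", ns)
f = ns["f"]

# No closure cells: 'a' was not captured when f was defined.
print(f.__closure__)  # None

# The function body resolves 'a' with a global-name lookup instruction.
ops = [ins.opname for ins in dis.get_instructions(f)]
print("LOAD_GLOBAL" in ops)  # True
```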

To be a closure, you have to insert both the "a" and the def f() inside another function, and then run that:

code = """
def outer():
    a = 1
    def f():
        return a
    return f

f = outer()
print(f())
"""
exec(code, {}, {})

prints 1 as expected.
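For contrast with the non-closure case, here is a sketch that inspects the function produced by the nested version. `local_ns` is an illustrative name, and the print call is dropped from the source so that only the inspection output appears.

```python
code = """
def outer():
    a = 1
    def f():
        return a
    return f

f = outer()
"""
local_ns = {}
exec(code, {}, local_ns)
f = local_ns["f"]

# 'a' is a free variable of f, stored in a closure cell.
print(f.__code__.co_freevars)          # ('a',)
print(f.__closure__[0].cell_contents)  # 1
```

Because `a` lives in a cell created by the call to outer(), the lookup no longer depends on which dicts were passed to exec.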

524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago

The reason I am asking is that I am working on a debugger. The debugger stops on a frame which is inside a function. Let's say the locals is:

    locals() == {"a": 1}

I now want to define a closure with exec. I might want to do something like:

    exec("def f(): return a", globals(), locals())

But this doesn't work because of the issue I describe. I would expect f() to look for a in the locals().

Even more surprising is that if I use the second argument of exec, the code in the above comment starts to fail.

stevendaprano commented 2 years ago

Here is the key phrase in the docs:

"If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition."

https://docs.python.org/3/library/functions.html#exec

And sure enough:

>>> class C:
...     a = 1
...     def f():
...             return a  # This looks for global a, not C.a
...     print(f())
... 
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 5, in C
  File "<stdin>", line 4, in f
NameError: name 'a' is not defined

which is intentional behaviour. Functions defined inside a class do not have direct access to the variables inside the class. I thought there was a FAQ about this but I can't find it now.

So there is no bug here. By passing two distinct dicts as the globals and locals to exec, the interpreter treats the code as if it were being executed inside the body of a class statement. Both the a and the f get created in the locals dict, not the globals dict:

>>> g = {'__builtins__': None}
>>> l = {}
>>> exec("""a = 1
... def f():
...     return a
... """, g, l)
>>> g
{'__builtins__': None}
>>> l
{'a': 1, 'f': <function f at 0x7fa07b83e0e0>}

But when you call f(), it is looking for a in the globals dict.
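One way to see this directly is to look at the function's `__globals__` attribute, which is the dict used for its global lookups. This is a sketch; `g` and `loc` are illustrative names.

```python
g = {}
loc = {}
exec("a = 1\ndef f():\n    return a\n", g, loc)
f = loc["f"]

# The function remembers the globals dict, not the locals dict.
print(f.__globals__ is g)    # True
print("a" in g, "a" in loc)  # False True

try:
    f()
except NameError as exc:
    print("NameError:", exc)

# Once 'a' exists in the globals dict, the same function object works.
g["a"] = 1
print(f())  # 1
```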

stevendaprano commented 2 years ago

> I now want to define a closure with exec. I might want to do something like: exec("def f(): return a", globals(), locals())

That doesn't create a closure.

> I would expect f() to look for a in the locals().

I'm sorry, but your expectation that f() will look for a in the locals dict is not correct. That's not how name resolution in Python works. a is looked up as a global. You can't turn it into a local variable just by providing locals.

The names of the parameters are unfortunately confusing. The globals parameter is always the global namespace. But locals is *never* the function's local namespace. Nor is it a surrounding scope (nested functions), but it may be treated as a surrounding *class* scope.

I agree that the behaviour is surprising and complex, but if you work through the documentation carefully, it is behaving as designed.

What we need to realise is that locals describes the namespace where the *def statement* runs, not the namespace used by the body of the function. The function body's locals is always created when the function is called, it is inaccessible from outside the function, and it most certainly does not use the so-called "locals" parameter given to exec().
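A small sketch of that distinction, with illustrative names (`defining_ns`, `body_locals`): the def statement binds f in the mapping passed as locals, while the body gets a fresh local namespace on each call.

```python
g = {}
defining_ns = {"a": 1}
exec("def f():\n    b = 2\n    return locals()", g, defining_ns)

# The def statement stored f in the mapping passed as "locals".
f = defining_ns["f"]

# The body's own locals are created at call time and hold only 'b'.
body_locals = f()
print(body_locals)           # {'b': 2}
print("a" in body_locals)    # False
print("b" in defining_ns)    # False
```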

524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago

Thank you for your explanation. Just to be sure, it is expected that:

exec("a = 1\ndef f(): return a\nprint(f())", {})

Runs successfully but

exec("a = 1\ndef f(): return a\nprint(f())", {}, {})

Doesn't?

stevendaprano commented 2 years ago

"Expected" is a strong word. It took me a lot of careful reading of the documentation and experimentation to decide that, yes, I expect the second case to fail when the first case succeeds.

Which reminds me of a common anecdote from mathematics:

https://hsm.stackexchange.com/questions/7247/in-a-popular-anecdote-who-took-20-minutes-to-decide-that-a-thing-was-obvious

eryksun commented 2 years ago

> If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

That's a misleading comparison because a class definition intentionally supports nonlocal closures, which exec() doesn't support and shouldn't support. For example:

    a = 1

    def f():
        a = 2
        class C:
            print(a)

    def g():
        a = 2
        class C:
            nonlocal a
            a = 3
        print(a)
    >>> f()
    2
    >>> g()
    3

exec() executes as module code. Using separate globals and locals mappings doesn't magically change how the code is compiled and executed to make it equivalent to a class definition. To understand the case of separate globals and locals, just remember that assigning to a variable by default makes it a local variable, unless it's declared as a global. Also, class and function definitions are implicitly an assignment, which by default will be local.
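The binding rule described here can be observed by checking which dict each name lands in. In this sketch `K`, `g`, and `loc` are illustrative names; only `a` is declared global.

```python
src = "global a\na = 1\nclass K: pass\ndef f(): return a\n"
g, loc = {}, {}
exec(src, g, loc)

# 'a' was declared global, so the assignment went to the globals dict.
print("a" in g)     # True

# The class and function definitions are plain assignments, so they
# were bound in the locals dict.
print(sorted(loc))  # ['K', 'f']

# f looks 'a' up in the globals dict, where it now exists.
print(loc["f"]())   # 1
```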

stevendaprano commented 2 years ago

On Thu, Dec 23, 2021 at 12:15:29AM +0000, Eryk Sun wrote:

> Eryk Sun <eryksun@gmail.com> added the comment:

> > If exec gets two separate objects as globals and locals, the code will be executed as if it were embedded in a class definition.

> That's a misleading comparison

That's taken straight out of the documentation.

I don't think it is misleading, it is the opposite of misleading. Until I understood that exec with two different mapping objects as globals and locals behaves as if the code were embedded inside a class, I found the reported behaviour totally perplexing.

If you think it is wrong, how would you explain the observed behaviour, and how would you word the documentation?

> because a class definition intentionally supports nonlocal closures,

I don't know what you mean by that. Classes are never closures. Only functions can be closures. (Be closures? *Have* a closure? The terminology is ambiguous.)

>>> def f():
...     a = 1
...     class C:
...             nonlocal a
...             a = 999
...     print(a)
...     return C
... 
>>> C = f()
999
>>> C.__closure__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: type object 'C' has no attribute '__closure__'. Did you mean: '__module__'?

I don't know what terminology is appropriate here, but "closure" is surely not it.

> which exec() doesn't support and shouldn't support. For example: [snip examples]

Neither of those cases is relevant to the example here.

> exec() executes as module code. Using separate globals and locals mappings doesn't magically change how the code is compiled and executed to make it equivalent to a class definition.

Neither I nor the documentation said it was equivalent to a class definition. It is equivalent to code executed inside a class scope.

> To understand the case of separate globals and locals, just remember that assigning to a variable by default makes it a local variable, unless it's declared as a global. Also, class and function definitions are implicitly an assignment, which by default will be local.

Neither of those facts explains why the example code

"""a = 1
def f():
    return a
print(f())
"""

behaves differently when given two distinct dicts as the globals and locals parameters, versus all the other cases (no arguments provided, or one argument, or the same dict repeated twice).

Only the case where the provided globals and locals dicts are distinct behaves differently, and it behaves exactly the same as if you embedded that chunk of code inside a class definition and then executed it.

eryksun commented 2 years ago

> That's taken straight out of the documentation.

Yes, but it's still a misleading comparison.

> Until I understood that exec with two different mapping objects as globals and locals behaves as if the code were embedded inside a class, I found the reported behaviour totally perplexing.

The basic execution model of Python is that a frame that executes with non-optimized locals -- in module and class definitions -- can use the same mapping for globals and locals. Indeed, that's how the interpreter executes modules. However, exec() is generalized to allow executing module code with separate globals and locals.

Saying that code will be "executed as if it were embedded in a class definition" is correct only so far as the fact that globals and locals are different in this case. But it's also misleading because the code gets compiled as module-level code, not as class code.

It should be pretty obvious why the following fails:

exec("a = 1\ndef f(): return a\nprint(f())", {}, {})

Assignment is local by default, unless otherwise declared. Function f() has no access to the local scope where a is defined because Python doesn't support closures over non-optimized locals, particularly because we emphatically do not want that behavior for class definitions.

It should be equally clear why the following succeeds:

exec("global a\na = 1\ndef f(): return a\nprint(f())", {}, {})

> > because a class definition intentionally supports nonlocal closures,

> I don't know what you mean by that. Classes are never closures. Only functions can be closures.

I didn't say that a class can be a closure. That's never the case because a class uses non-optimized locals. But a class definition does support free variables that are bound to an enclosing scope. exec() does not support this, so the exact same code can execute differently in the context of a class definition.

> It is equivalent to code executed inside a class scope.

That depends on the code and the context. Please refer to my first example in comparison to the following:

    a = 1
    def f():
        a = 2
        exec('print(a)', globals(), {})
    >>> f()
    1

It's different behavior for print(a) because both exec() and compile(source, filename, 'exec') produce module code, not class code. The free variable a gets bound to the global scope for the exec() example, while for the class definition free variable a is bound to the local a in the frame of the function call.
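A sketch of the same point using explicit dicts instead of globals(), so that it does not depend on the surrounding module. `run_in_function` and `out` are illustrative names.

```python
import dis

# Code compiled in 'exec' mode resolves bare names with LOAD_NAME:
# the locals mapping first, then globals, then builtins.
module_code = compile("print(a)", "<demo>", "exec")
ops = [ins.opname for ins in dis.get_instructions(module_code)]
print("LOAD_NAME" in ops)  # True

def run_in_function():
    a = 2  # local of the calling function; not visible to the exec'd code
    out = {}
    exec("value = a", {"a": 1}, out)
    return out["value"]

# The exec'd code sees the 'a' from the globals dict it was given.
print(run_in_function())  # 1
```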

To implement this different behavior, the code object for a class definition uses bytecode operations such as COPY_FREE_VARS and LOAD_CLASSDEREF, which are never used for module-level code. For example, from the original example, here's the class definition code:

    >>> dis.dis(f.__code__.co_consts[2])
                  0 COPY_FREE_VARS           1
                  2 LOAD_NAME                0 (__name__)
                  4 STORE_NAME               1 (__module__)
                  6 LOAD_CONST               0 ('f.<locals>.C')
                  8 STORE_NAME               2 (__qualname__)

      4          10 LOAD_NAME                3 (print)
                 12 LOAD_CLASSDEREF          0 (a)
                 14 CALL_FUNCTION            1
                 16 POP_TOP
                 18 LOAD_CONST               1 (None)
                 20 RETURN_VALUE
stevendaprano commented 2 years ago

On Thu, Dec 23, 2021 at 05:47:33AM +0000, Eryk Sun wrote:

> Eryk Sun <eryksun@gmail.com> added the comment:

> > That's taken straight out of the documentation.

> Yes, but it's still a misleading comparison.

I asked how you would re-word the docs, but you haven't responded.

The description given in the docs exactly explains the observed behaviour. Without recognising that, the observed behaviour is perplexing to the point that it suggested to at least one person that it was a bug in the language.

If you're not prepared to suggest an improvement to the documentation, then I don't think that this conversation is going anywhere and maybe we should just let the discussion die.

But for the record, in case you, or anyone else, does want to continue the discussion in the hope of reaching additional insight to the problem, my further comments are below.

[...]

> Saying that code will be "executed as if it were embedded in a class definition" is correct only so far as the fact that globals and locals are different in this case.

So it's correct in all the ways that matter:

and incorrect in no ways at all (see below). I don't think that supports a charge of "misleading".

The bottom line here is that the description in the docs that you call "misleading" did not mislead me, but led me directly to the solution of why the code behaved as it did, and why that was the intentional behaviour rather than a bug.

So un-misleading, if you will.

> But it's also misleading because the code gets compiled as module-level code, not as class code.

Obviously there is no actual "class code" involved. That is why the description says that it is executed *as if* it were embedded inside a class statement, rather than by having an actual class statement added to your source string.

I don't understand your comment about "compiled as module-level ... not as class code". What's class code? (Aside from the actual class statement itself, which is a red herring.)

If you look at the disassembly of the following two snippets:

dis.dis("""
a = 1
def f():
    return a
print(f())
""")

and

dis.dis("""
class C:
    a = 1
    def f():
        return a
    print(f())
""")

the generated bytecode for the lines a = 1 etc is the same, putting aside the code for the actual class statement part. You get the same code for a = 1

LOAD_CONST                (1)
STORE_NAME                (a)

the same code for both the body of the function:

LOAD_GLOBAL               (a)
RETURN_VALUE

and the def f() statement:

LOAD_CONST                (<code object ...>)
LOAD_CONST                ('f')
MAKE_FUNCTION
STORE_NAME

and the same code for the call to print:

 LOAD_NAME                (print)
 LOAD_NAME                (f)
 CALL_FUNCTION
 CALL_FUNCTION
 POP_TOP
 LOAD_CONST               (None)
 RETURN_VALUE

Obviously the offsets and locations of constants will be different, but aside from those incidental details, the code generated for the block is the same whether it is inside a class statement or not.

So I don't understand what you consider to be the difference between code compiled at module-level and code compiled at class-level. They seem to me to be identical (aside from the incidentals).

The visible difference in behaviour relates to the *execution* of the code, not to whether (quote):

"the code gets compiled as module-level code [or] as class code".

There is no distinct "class code". The difference in behaviour is in the execution, not in the compilation.
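The comparison of the function body can be checked programmatically rather than by eye. This sketch only compares the code object named f in each snippet; it says nothing about class bodies that use nonlocal names, which is the case raised elsewhere in the thread. `f_codes`, `module_ops`, and `class_ops` are illustrative names.

```python
import dis
import types

module_src = "a = 1\ndef f():\n    return a\nprint(f())\n"
class_src = "class C:\n    a = 1\n    def f():\n        return a\n    print(f())\n"

# Walk the nested code objects of each snippet and collect the one named 'f'.
f_codes = []
for src in (module_src, class_src):
    pending = [compile(src, "<demo>", "exec")]
    while pending:
        code = pending.pop()
        if code.co_name == "f":
            f_codes.append(code)
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                pending.append(const)

module_ops = [ins.opname for ins in dis.get_instructions(f_codes[0])]
class_ops = [ins.opname for ins in dis.get_instructions(f_codes[1])]

# Both bodies use the same instructions, including a global lookup of 'a'.
print(module_ops == class_ops)
print("LOAD_GLOBAL" in module_ops)
```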

> It should be pretty obvious why the following fails:

>     exec("a = 1\ndef f(): return a\nprint(f())", {}, {})

Yes, it is obvious why it fails, in the same sense as the maths joke about the professor who stares at the equations on the blackboard for twenty minutes before exclaiming "Yes, it is obvious!".

It takes a sophisticated understanding of Python's scoping rules to understand why that fails when the other cases succeed.

> Assignment is local by default, unless otherwise declared. Function f() has no access to the local scope where a is defined

With the same dict used for globals and locals, execution runs the statements a = 1, the def f and the print in the same scope, which is both global and local. This is what happens when you run code at the module level: locals is globals.

Consequently, the statement a = 1 assigns a to the local namespace, which is the global namespace. And the call to f() retrieves a from the global namespace, which is the local namespace.

This is what happens when you execute the code at module-level.

With different dicts, the three statements still run in the same scope, the local scope, but the call to f() attempts to retrieve a from the global namespace, which is distinct from local namespace.

This is what happens when you execute code inside a class body, just as the docs suggest.
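The two cases can be put side by side. In this sketch the source stores the call result in `result` instead of printing it, and `shared`, `g`, `loc`, and `outcome` are illustrative names.

```python
src = "a = 1\ndef f(): return a\nresult = f()\n"

# Same dict for both: locals is globals, as at module level.
shared = {}
exec(src, shared, shared)
print(shared["result"])  # 1

# Distinct dicts: 'a' is stored in loc, but f() searches g.
g, loc = {}, {}
try:
    exec(src, g, loc)
    outcome = "ok"
except NameError as exc:
    outcome = f"NameError: {exc}"
print(outcome)
```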

> > > because a class definition intentionally supports nonlocal closures,

> > I don't know what you mean by that. Classes are never closures. Only functions can be closures.

> I didn't say that a class can be a closure. That's never the case because a class uses non-optimized locals. But a class definition does support free variables that are bound to an enclosing scope.

Right -- but that's not the same as a closure.

A class with free variables bound to an enclosing scope is not a closure, nor is it a class with a closure. I don't think we have terminology for it, other than the mouthful "a class with free variables bound to an enclosing scope", or perhaps "a class with nonlocal variables".

In any case, whatever we want to call it, it has nothing to do with this bug report. It's a distraction.

> To implement this different behavior, the code object for a class definition uses bytecode operations such as COPY_FREE_VARS and LOAD_CLASSDEREF, which are never used for module-level code. For example, from the original example, here's the class definition code:

None of this is relevant to the original examples in this bug report, which does not involve a class statement, let alone a class statement involving nonlocals.

You seem to be arguing that a description in the docs is "misleading", not because it misleads, but because it doesn't describe a situation which has nothing to do with the situation that the docs are describing.

Anyway, if anyone is still reading this far, I think that the documentation is correct, but if anyone wants to suggest an improvement which doesn't over-complicate the description by involving scenarios which are irrelevant to exec(), please do so.

524b48e8-68da-4cd2-91d6-5a77cc237cbf commented 2 years ago

Maybe a note could be added to https://docs.python.org/3/library/functions.html#exec

Something along the lines of:

Note: If exec gets two separate objects as globals and locals, the code will not be executed as if it were embedded in a function definition. For example, any function or comprehension defined at the top level will not have access to the locals scope.

PS: It would be nice for my use case to have a way around this, maybe a flag in compile or exec that would produce "function code" instead of "module code". My workaround for this problem consists of wrapping my code in a function definition.
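The workaround is not shown in the thread, so the following is only a sketch of what wrapping the code in a function definition might look like. `exec_with_locals` and `_wrapper` are hypothetical names. It assumes every key in the locals mapping is a valid identifier, and it does not handle source containing multi-line string literals or write changes back to the original frame.

```python
def exec_with_locals(source, globals_ns, locals_ns):
    """Hypothetical helper: run source as the body of a synthetic function
    whose parameters are the given locals, so nested defs can close over them."""
    params = ", ".join(locals_ns)
    body = "".join(f"    {line}\n" for line in source.splitlines())
    wrapper_src = f"def _wrapper({params}):\n{body}    return locals()\n"
    scratch = {}
    exec(wrapper_src, globals_ns, scratch)
    # Call the wrapper with the locals as arguments; return its final locals.
    return scratch["_wrapper"](**locals_ns)

result = exec_with_locals("def f(): return a\nvalue = f()", {}, {"a": 1})
print(result["value"])                      # 1
print(result["f"].__closure__ is not None)  # True
```

Because `a` is a parameter of the wrapper, the inner f is compiled with `a` as a free variable and gets a real closure cell, which is the behaviour the original exec call could not provide.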

I think this means https://bugs.python.org/issue41918 should be closed as well?

eryksun commented 2 years ago

> You seem to be arguing that a description in the docs is "misleading", not because it misleads, but because it doesn't describe a situation which has nothing to do with the situation that the docs are describing.

To me it's misleading to say "the code will be executed as if it were embedded in a class definition" because that is not always the case. The example with print(a) shows that. One can take it another level to compare function definitions in a class definition compared to exec(). A function defined in an exec() is not compiled to bind its free variables to the outer lexical scope in the context of the exec() call, while a function defined in a class definition does. For example:

class:

    def f():
       a = 2
       class C:
           def g(): print(a)
       return C.g
    >>> a = 1
    >>> g = f()
    >>> g()
    2

exec():

    def f():
       a = 2
       l = {}
       exec('def g(): print(a)', globals(), l)
       return l['g']
    >>> a = 1
    >>> g = f()
    >>> g()
    1

You asked what I would say in its place, but I don't have a simple answer that can take the place of the one-liner in the docs. Here's something, but I'm sure you won't be happy with it:

The code will be executed in a manner that's similar to a class definition with regard to the use of separate locals and globals scopes. However, there can be significant differences in certain contexts with regard to how the same code is compiled for an exec() call compared to a class definition. In particular, code in a class definition is compiled to bind its free variables to the lexical scopes of outer function calls in the defining context, which isn't possible with exec(). Also, the top-level code in a class definition supports nonlocal declarations, which is a syntax error with exec().
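The last point in that proposed wording can be checked with a short sketch. `outcome` is an illustrative name, and the function g mirrors the earlier example in this thread.

```python
# A top-level nonlocal declaration is rejected when compiled for exec().
try:
    exec("nonlocal a\na = 3", {}, {})
    outcome = "no error"
except SyntaxError as exc:
    outcome = f"SyntaxError: {exc.msg}"
print(outcome)

# The same two statements are accepted in a class body nested in a function.
def g():
    a = 2
    class C:
        nonlocal a
        a = 3
    return a

print(g())  # 3
```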