Open CNSeniorious000 opened 3 months ago
The docs say (emphasis mine):
If only globals is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If globals and locals are given, they are used for the global and local variables, respectively. If provided, locals can be any mapping object
I don't think this one should count as a bug. But maybe it could count as a bug in the sense that it raises a NameError (like, it doesn't seem to check the exact type)
Thank you! I found what you quoted in the docs here. Looks like raising a TypeError
in exec
and eval
when type(globals) is not dict
is reasonable.
That's a bit sad. I've been misusing this for a year in several packages. I used to think using a mixined ChainMap
as globals is the best practise to implement layered context, for usages that we need to make the input dict readonly, or lazy-loaded, until I encountered this issue. I think it would be great to loosen this constraint maybe in the future, I mean, allow any dict
subclass for globals
. In fact, it worked in most cases now.
On the other hand, may I ask how can I implement layered context for exec
without misusing globals
? My only thought is hacking builtins like this:
{..., '__builtins__': ChainMap(..., builtins.__dict__)}
We need to address this issue for whether we should allow a dict subclass to be passed in as globals
. Currently, even we have some tests (test_alt_framework_macos
, test_exec_with_unusual_global
etc.) that do this - something explicity disallowed in the documentation.
This will never fully work in the current interpreter as LOAD_NAME
assumes globals()
to be an exact dict.
Also the documentation is not completely correct - even if the user passes both globals
and locals
, using a non-exact-dict globals
will have issues.
So, we have a couple of options:
TypeError
in exec()
and eval()
.
LOAD_NAME
so it can deal with a mapping globals()
.
@markshannon any suggestion here as you are the owner of the interpreter now.
Change
LOAD_NAME
so it can deal with a mappingglobals()
.
- The performance will be impacted
- No backwards compatibility issue and we can remove the restriction (as far as I can tell)
Great analysis. I vote for option 2 because I believe there are more people who care more about versatility than performance when they choose Python over other programming languages for a task. Note that there's a significant amount of code that calls PyDict_*
functions besides the LOAD_NAME
handler code in bytecodes.c
that need to be modified to their PyMapping_*
equivalents in order to make globals()
as a mapping work.
I have a branch with the exact check and I needed to fix a lot of tests... honestly I'd prefer amending the docs. But making LOAD_NAME works for nonexact dict might be a not so good idea if you lose performance. Maybe we can have two implementations for that when you assume that it's a dictionary and otherwise a (mutable) mapping?
I only looked at LOAD_GLOBAL
so I was under the impression that LOAD_NAME
is the only bytecode that needs the change. Obviously at least STORE_GLOBAL
and DELETE_GLOBAL
use PyDict_*
. Then I don't believe this is an option, too many code changes. We have been treating globals as a dict so I think we should enforce that.
I believe there are more people who care more about versatility than performance when they choose Python over other programming languages for a task.
We had a lot of great features when all local variables are in a real dict and we moved forward to the current fast locals implementation. I don't think we should sacrifice such a common bytecode just for some super dark corner.
The reason I said it was a super dark corner is because - STORE_GLOBAL
will break a non-dict globals
and no one had that issue yet.
I'm leaning forward to enforcing globals to be an exact dict, and that's basically our assumption now.
So maybe 1 is the best. I already have something on that direction so I'll push it by the end of the day, hopefully.
After investigating a bit more tests (e.g., in test_getpath
), I'm wondering whether we could just accept subtypes of dict
that are just subtypes but not mixed in with other classes. Otherwise a lot of tests need to be rewritten.
I have an important question. Why does it assume that it's a dict (and it still works when it's not a dict but a subclass of a dict without a mixin?) Is it only because of PyDict_*
functions it uses or is it because of some other assumptions? if it's only because of PyDict_*
and that it would still work for subclasses of dict without mixin, I may suggest the following:
I really want to known which invariants are expected for it to work (at least, document it internally if it's not already the case (I don't know where to look at for that)). Why would it work for class frozendict(dict)
when frozendict
is read-only (I mean we have tests assessing that the exception is thrown...)? and there are some tests where you have some pre-defined dict values and where you override __missing__
so that you can use attributes instead, so I would say those are fine, isn't it?
As the one who posted this issue, I want to say something. Although I am not familiar with CPython, I do not want to hurt its performance. If allowing globals
to be any subclass of dict
or any mapping-like instance is possible with impact on performance, I don't want it to happen. My use cases are not common and I think (2) may be not worthy.
However, using a subclass does work in most cases. The easiest way is to just subclass a dict
and insert a print(key)
inside its __getitem__
and __setitem__
, and any load name happens inside an exec
got printed. So I believe currently __getitem__
and __setitem__
are called during name resolution.
The only problem I encountered is that when accessing variables in a class.
I've tried using __missing__
and subclass without mixins, like this:
class D(D):
def __missing__(self, k):
import builtins
return builtins.__dict__.get(k, k)
Everytime I access a missing key, it should return the key as-is:
eval('a', D()) # -> 'a'
But in a class scope, I still can't access the value:
exec('''
def f():
print(a)
f()
class A:
def f(self):
print(b)
A.f(0)
A().f()
class B:
print(c)
''', D())
Only the last print
raises:
Traceback (most recent call last):
...
File "<string>", line 15, in B
NameError: name 'c' is not defined
I don't know what happens when the interpreter enters a class context. I believe some copy happens here but the type doesn't remains, so accessing 'c' is done on a pure dict
instead of our D
.
So, my thought is: although the docs say that it can't be a subclass of dict
, it actually can. LOAD_NAME
is using __getitem__
somewhere. If this is real, then (2) wont impact on performance to a certain degree. And I do understand this may influence many tests so maybe it still not worth it. Then I personally prefer (3) over (1).
I found that wrapping the code in a function definition can make it support ChainMap
as globals. This may be a userland fix for this issue.
```py from collections import ChainMap class ChainMap(ChainMap, dict): # globals must be a real dict pass source = """ def _(): a = 1 class A: print(a) _() """ exec(source, ChainMap()) # 1 ```
Basically, there's a giant hole here - it's about LOAD_NAME
, STORE_NAME
vs LOAD_GLOBAL
, STORE_GLOBAL
. They have different assumptions about the global dict.
LOAD_NAME
uses the PyDict_
calls which will not go through the __getitem__
process.STORE_NAME
checks whether globals is an exact dict, if not, go through the __setitem__
process.LOAD_GLOBAL
always go through the __getitem__
process.STORE_GLOBAL
never go through the __setitem__
process.Your access a "global variable" inside a class will become a LOAD_NAME
, while inside a function, it's a LOAD_GLOBAL
. A global assignment in a module translates to a STORE_GLOBAL
, however, when it is passed to exec
, it's a STORE_NAME
, unless you use global a
which can force it to be a STORE_GLOBAL
.
Therefore, the whole thing is a mess now and we should probably set up a rule for this - do we allow a dict-like global? If so, how alike? PyDict_*
APIs require the underlying C data structure for dict, basically that means the new
method of the real dict
has to happen - that is not really the guarantee for derived class right? We can have a "pure subclass" of a dict that does not instantiate the object as dict
and potentially cause some trouble.
In any case, we need extra input here. Let's wait for @markshannon 's opinion, and possibly other core devs.
Basically, there's a giant hole here - it's about
LOAD_NAME
,STORE_NAME
vsLOAD_GLOBAL
,STORE_GLOBAL
. They have different assumptions about the global dict.
While I totally agree with you on the need to standardize the assumptions about the globals dict from the implementations of different instructions involving globals, the lack of such a standardization is not really an obstacle to allowing a mapping as the globals argument to exec
and eval
while maintaining full backwards compatibility.
This can be done by adding a PyDict_Check
-guarded fast path to existing PyDict_*
calls with a fallback to a slow path to equivalent PyMapping_*
/PyObject_*
calls. Note that nothing needs to be done for LOAD_GLOBAL
since there is already such a slow path after failing PyDict_CheckExact
:
https://github.com/python/cpython/blob/59be79ae60073f7b6bdf6ce921560c279937e4ab/Python/bytecodes.c#L1612
The performance impact with this approach should be minimal since PyDict_Check
is cheap.
Please feel free to poke holes at my PR #121389.
Allowing an arbitrary mapping for PyDict_*
APIs is an even more backwards incompatible change. I don't believe that would be accepted. Also even for this specific scenario, this change basically determines what's the standard behavior, without explicity discussing and documenting.
Allowing an arbitrary mapping for
PyDict_*
APIs is an even more backwards incompatible change. I don't believe that would be accepted. Also even for this specific scenario, this change basically determines what's the standard behavior, without explicity discussing and documenting.
Sorry, but can you elaborate on exactly in which ways allowing a mapping to the existing API for exec
and eval
would cause backwards incompatibility? My PR passes all existing tests with the only modifications being that some tests that expected SystemError
or TypeError
no longer do.
To quote your assessment above:
So, we have a couple of options: ...
- Change
LOAD_NAME
so it can deal with a mappingglobals()
. The performance will be impacted No backwards compatibility issue and we can remove the restriction (as far as I can tell)
I thought you are changing PyDict_*
APIs. It seems like you are changing the calls to those APIs, then that's a much smaller issue. Like I said, I don't know if it's desired to accept arbitrary mappings as globals. That would add some extra burden to maintainability. I'm not saying it's not the right way to go, it's definitely an option. I'm just saying that we need more input from core devs who are more familiar with the context.
I thought you are changing
PyDict_*
APIs. It seems like you are changing the calls to those APIs, then that's a much smaller issue.
Ah I see. Yeah my modifications are strictly about adding calls to the mapping API as a fallback so what works right now should continue to work.
Like I said, I don't know if it's desired to accept arbitrary mappings as globals. That would add some extra burden to maintainability. I'm not saying it's not the right way to go, it's definitely an option. I'm just saying that we need more input from core devs who are more familiar with the context.
I think the OP's use case of a layered context is a good one, as I also had such needs for dynamic injection of namespace before and had to resort to uglier workarounds. Python is known for its customizability, and a generalization of globals as a mapping like locals and builtins already are enables users to tailor namespace behaviors closer to the mental model of such projects.
For my PR though I've limited the scope of change to the execution path of exec
and eval
. __globals__
of a function and __dict__
of a module continue to be dict-only so the change should be relatively manageable.
@markshannon can you help with some feedbacks when you get a chance? Thanks!
I don't see a good reason why the globals must be a dictionary in some places, but not in others that appear very similar.
If we are going to be consistent, it must be to allow general mappings, as restricting globals to a dictionary in all cases would be a breaking change.
Since our stats show that the only instruction under discussion that is performance critical is LOAD_GLOBAL
and we already specialize that, there should be no noticeable performance impact.
I don't see a good reason why the globals must be a dictionary in some places, but not in others that appear very similar.
If we are going to be consistent, it must be to allow general mappings, as restricting globals to a dictionary in all cases would be a breaking change.
Agreed. I was hesitant to allow a module's __dict__
to be a mapping because by the same reasoning of consistency it would necessarily mean that a mapping must be allowed to be the __dict__
of any object, since _Py_module_getattro_impl
gets a module's attribute by calling _PyObject_GenericGetAttrWithDict
. That sounds like a change bigger than I anticipated, but I now think the consistency makes sense. Will go forward with such a modification then. Thanks for the feedback!
I've updated the PR to allow a mapping to be the __dict__
attribute of any object. Can you help review the PR, @markshannon? Thanks!
Bug report
Bug description:
I need to implement layered context running python code, so I choose
ChainMap
. But Python needs everyglobals
passed toexec
to be adict
, so I mixins it intoChainMap
.This patch runs well as far, but in a class it can't get the value from the outside.
I think this may because in the class context CPython uses some function like
PyDict_Get
instead of__getitem__
, so the values didn't get copied into the class's globals context.This bug caused this code fails:
Other reproduction approaches
This is reproduceable in [`pyodide`](https://github.com/pyodide/pyodide) (a wasm port of CPython 3.12.1), so you can run the code above by just clicking one of the following link: - [Stackblitz](https://stackblitz.com/edit/chainmap-class-locals-issue?file=main.py) - you can edit code and the result will sync instantly - [Python Online](https://py3.online/console#ZnJvbSBjb2xsZWN0aW9ucyBpbXBvcnQgQ2hhaW5NYXAKCgpjbGFzcyBDaGFpbk1hcChDaGFpbk1hcCwgZGljdCk6ICAjIGdsb2JhbHMgbXVzdCBiZSBhIHJlYWwgZGljdAogICAgcGFzcwoKCnNvdXJjZSA9ICIiIgoKY2xhc3MgQToKICAgIHByaW50KGEpCgoiIiIKCgpleGVjKHNvdXJjZSwgeyJhIjogMX0pCgoKZXhlYyhzb3VyY2UsIENoYWluTWFwKHsiYSI6IDJ9KSkgICMgdGhpcyBsaW5lIHJhaXNlcyBlcnJvcgo=) - this is faster but just has a console instead of an IDE
CPython versions tested on:
3.12
Operating systems tested on:
Linux, Windows
Linked PRs