larryhastings / co_annotations

Other
9 stars 5 forks source link

add AnnotationBlock #5

Closed JelleZijlstra closed 3 years ago

JelleZijlstra commented 3 years ago

Fixes #4

JelleZijlstra commented 3 years ago

Thanks for pointing that out! I took a quick look and the references to FunctionBlock in compile.c seem to fall into two groups:

larryhastings commented 3 years ago
  • Making sure that things like yield, yield from, await and return only appear in functions. You can currently use those in annotations [...]
  • LOAD_FAST would only be relevant to annotations if you use the walrus operator in your annotations, which is another thing we could plausibly just prohibit.

There's already code attempting to prohibit yield expressions and walrus operators inside annotations, see compiler_check_co_annotations_is_legal() in compile.c. I'm not sure what an await expression looks like at that point, but I assume we need to prohibit that too.

larryhastings commented 3 years ago

I'm kinda itchin' to merge this. I had some minor feedback (re "// TODO and AnnotationBlock?" above), but if I don't hear back in the next, oh, nine hours--yes, overnight--I may just stomp on the Merge button and get to hacking.

larryhastings commented 3 years ago

With that push you just made, GitHub is now warning me:

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

JelleZijlstra commented 3 years ago

Just pushed a new commit to handle the AnnotationBlock in a couple more places. I'll call it a night for now, but the next things I'll want to look at are:

JelleZijlstra commented 3 years ago

And I think that warning about the commit is harmless? I didn't do anything different here from #6 which you were able to merge.

larryhastings commented 3 years ago

Would you be terribly uncomfortable if I just merged it and hacked on it some myself?

JelleZijlstra commented 3 years ago

Go ahead. I can just send new PRs if I have something to contribute.

larryhastings commented 3 years ago

So, I have mostly convinced myself of how this needs to work. I'm still figuring it out.

My understanding of how closures work in CPython is as follows: when the compiler notices that a nested function refers to names defined in an outer function, it changes how the variable is stored. Instead of storing the variable in the defining outer function's local array (fastlocals), it'll be stored in a different array (freevars). The scope of the variable also changes, from LOCAL to CELL. So the opcodes change, from e.g. LOAD_FAST to LOAD_DEREF. And now all the references to the variable use this mechanism. All the functions that ever refer to this variable keep a reference to the freevars created by the outermost function.

AnnotationBlock means something analogous will now happen inside classes. If you have a class, and a function (method) defined inside that class, and an annotation on a parameter / return code of that function, the resulting AnnotationBlock could have a reference to a name defined locally in the enclosing ClassBlock.

If this was a function defined inside another function, this would just mean changing the scope from LOCAL to CELL. But it's a function defined inside a class. Normally functions defined inside classes can't see class locals--that's on purpose in Python. We're breaking that rule, which is why we created the AnnotationBlock in the first place.

But I'm convinced we can't simply promote the variable from NAME to CELL, because that will break class creation. When the class body is running, it uses NAME access to modify the locals, which looks up the name in a dict. Classes can safely assume that there are no CELL variables in the class body, because that isn't allowed. So I assume they live in blissful ignorance of the idea of CELL variables.

The cheapest way to fix this seems to be: let the annotation function keep a reference to the class's locals dict, then wire it up so that dict is set as f_locals in the frame when executing the annotation function. That means the NAME opcodes would simply work.

Except! So, literally, I was gearing up to add some opcodes, for which I assumed the correct names would be e.g. LOAD_CLASSDEREF. It turns out there was a LOAD_CLASSDEREF added in 3.4. I'm still puzzling over this, and I'm not sure it fixes the problem for us. You can read more about it at https://bugs.python.org/issue17853 .

larryhastings commented 3 years ago

Just so we're on the same page, here's my test case:

from __future__ import co_annotations

def foo():
    class C:
        a1 = "a1"
        a2 = "a2"
        a3 = "a3"
        def method1(self, p1:a1): pass
        def method2(self, p2:a2): pass
        def method3(self, p3:a3): pass

        print(method1.__annotations__)
        print(method2.__annotations__)
    return C

import dis
dis.dis(foo)
C = foo()
print(C.method2.__annotations__)
print(C.method3.__annotations__)

If you comment out the "from future", all three annotations dicts on the three methods are set just fine. But with the "from future" turned on, none of them work. They all fail at evaluation time (when we call the annotation function), on looking up a1 | a2 | a3.

larryhastings commented 3 years ago

(I'm writing these entries as a sort of stream-of-consciousness blog on solving the problem. Not sure it will be helpful to anybody but me.)

Solving this does mean more resource use, similarly to the way I added support for closures.

The original concept for 649, and the working title for the PEP for a long time, was "Deferred Evaluation Of Annotations Using Code Objects". The original idea--which was Mark Shannon's brilliant insight--was to store the code that generated the annotation dict as an unbound code object. PEP 649 still supports that. But at the last minute I amended the user-visible interface so it supported setting a (bound) function object too, as that's a much better API for the Python programmer. I'm certainly glad I did, because closures require runtime data--namely the freevars array--and therefore bound functions. Adding support for closures was actually pretty easy; there was already a function in compile.c that created the bound function for you, and since setting __co_annotation__ to a bound function already worked, it all worked first try.

Supporting access to class variables also requires runtime data, in this case the class dict. So again I'm going to have to early-bind these annotation functions. Also, I'll have to add another pointer to the function object: a reference to the enclosing class's namespace dict. Then the frame object will need to load this dict into f_locals before executing the function.

(If making the function object larger is a big concern, we could probably hide the reference to the class dict inside the function's __dict__. That trades "making every function object one pointer larger" with "methods with annotations that refer to class variables add a small dict". For now I'll do it the straightforward way, and we can optimize for space down the road if need be.)

I do want to preserve the the small space & cpu optimization of lazy-binding the function where possible. Either by a) setting a flag every time there's a LOADNAME operation (annotation functions won't ever STORE or DELETE_), or by b) iterating over the opcodes after assembly looking for LOAD_NAME. If no LOAD_NAME opcodes were emitted, this annotation function doesn't refer to any class variables, and we can lazy-bind this function.

larryhastings commented 3 years ago

LOAD_CLASSDEREF doesn't solve our problem here, it's doing something else. Here's what it does.

LOAD_CLASSDEREF takes a crazy double-duty oparg, computed as follows:

oparg = len(co_cellvars) + name_index

name_index is the index into the co_freevars array for the name of the variable we want to load. At runtime it computes the index then uses that to find the name as follows:

idx = oparg - PyTuple_GET_SIZE(co->co_freevars);
name = PyTuple_GET_ITEM(co->co_freevars, idx);

Then it attempts to look up "name" in f_locals. If it's finds a mapping for name, it increfs that value and returns it.

If no such mapping exists, it examines freevars[oparg] (not idx!), and if that's set it increfs that value and returns it.

I don't claim to understand what problem it's solving, or how this is a solution for that problem. Of course, I'm sure that's all fine. It's just that that opcode doesn't help solve our problem. Our problem is getting a reference to that class dict as I've expounded about above. LOAD_NAME will work fine for us once we have that dict available.

larryhastings commented 3 years ago

I have an answer! AnnotationBlock should not have ste_nested set, because we want the compiler to generate NAME access for GLOBAL_IMPLICIT, not GLOBAL. That's how accessing class variables is gonna work. I'm pretty sure!

larryhastings commented 3 years ago

As of ddd974a669791690034fa9cba4c9a335823b0c1f, annotations on methods (functions defined inside a class) can now see class-local names.