Evaluation of forward references using if TYPE_CHECKING?

python-injector / injector

Python dependency injection framework, inspired by Guice

BSD 3-Clause "New" or "Revised" License

1.29k stars 81 forks source link

Evaluation of forward references using if TYPE_CHECKING? #145

Open davebirch opened 4 years ago

davebirch commented 4 years ago

Hey there, giving injector a go with a project and seeing a bit of an issue when trying to inject types which are defined as a forward reference using the if TYPE_CHECKING block, supported by MyPy help prevent circular dependency issues based on the need to import stuff just for typing.

I suspect it might not be possible and injector will require those modules to be imported at runtime, but usage of this functionality would be incredibly handy if it is possible as otherwise I find that MyPy without using TYPE_CHECKING imports tends to make more complicated OO code a mess of circular references caused by the need for importing types for type checking (or in this case, injector if I can't use the if TYPE_CHECKING work around)

I was wondering if some kind of approach of Injector importing all of the modules defined in any binding within a TYPE_CHECKING blocks so it has a reference to them that it can later use when trying to evaluate forward references?

Suspect this might cause some issues with circular references in injector itself, so perhaps it's not possible, or another approach could be taken?

Current error when I try this is:


def _infer_injected_bindings(callable: Callable, only_explicit_bindings: bool) -> Dict[str, type]:
        spec = inspect.getfullargspec(callable)
        try:
            bindings = get_type_hints(callable, include_extras=True)
        except NameError as e:
>           raise _BindingNotYetAvailable(e)
E           injector._BindingNotYetAvailable: name 'InjectMe' is not defined```

jstasiak commented 4 years ago

Hi, I don't fully grasp what scenario exactly are we talking about here – can you post a minimal piece of code that illustrates the issue and reproduces the problem with Injector?

PS. Please provide full stack trace and versions of Python and Injector you're using, the error itself is not really sufficient.

davebirch commented 4 years ago

No worries,

Python: 3.6 Injector: 0.18.3

Here's a link to the documentation of the feature in the MyPy docs, as well as a small reproducer.

https://mypy.readthedocs.io/en/stable/common_issues.html#import-cycles

Full output of the failing test below is in this gist: https://gist.github.com/davebirch/219663927271c4514738f960a8dae531

Reproducer: Simple class to inject (with incredibly contrived example that gives another example of an import that can use this feature to avoid a potential circular reference):

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from tests.inject.reproducer import InjectTo

class InjectMe:

    def do_something(self, to_me: "InjectTo") -> None:
        to_me.value = "bar"

Simple Binding:

def bindings(binder: Binder) -> None:
    binder.bind(InjectMe)

Class we're injecting into, wanting to use if TYPE_CHECKING on the constructor

from typing import TYPE_CHECKING
from injector import inject

if TYPE_CHECKING:
    from tests.inject.to_inject import InjectMe

class InjectTo:
    @inject
    def __init__(self, inject_me: "InjectMe") -> None:
        self.inject_me = inject_me
        self.value = "foo"

One passing pytest manually instantiating the objects, one failing test doing the same with Injector:

def test_without_injection() -> None:
    inject_me = InjectMe()
    inject_to = InjectTo(inject_me)

    inject_me.do_something(inject_to)

    assert inject_to.value == "bar"

def test_injection_of_type_checking_forward_ref() -> None:
    injector = Injector(bindings)

    inject_to = injector.get(InjectTo)

    inject_to.inject_me.do_something(inject_to)

    assert inject_to.value == "bar"

Obviously your module names may vary, but I think this is a reasonable reproducer, I'm aware there's several ways of "working around" the issue in this particular example, but in my codebase, there are a couple of places where needing everything on our initialisers to be imported a runtime will require some fairly large changes to support.

I fully understand that injector requires these modules imported at runtime, but I was wondering if there might be some way for injector to identify and then individually import those modules imported in the TYPE_CHECKING blocks of any module that is included in a binding. at runtime either in bulk as part of the init, or selectively whilst analyzing forward references during intiialisation to allow it to import these modules safely out of the way of the usual import cycles?

I am also aware that this is likely non-trivial, against best-practices in and likely to cause side-effects, but thought I'd ask on the off chance it might be possible!

jstasiak commented 4 years ago

Ah, I see now, thanks for the explanation. For Injector to import the missing things it'd have to get in the business of parsing the code (or otherwise getting the AST) because that's the only way to get access to those conditional imports and then it'd have to actually execute the imports and mutate the module that has them. Unless there's some other way to do this (that I'm not seeing right now) it won't happen I'm afraid. Injector needs to be able to resolve the forward references at injection time at the latest and for that it needs the types to be present in the relevant scope.

Now, with that out of the way, there are some workarounds I see:

The most obvious is to make the from tests.inject.to_inject import InjectMe import in reproducer.py unconditional – since the reverse import in to_inject.py is protected by the TYPE_CHECKING flag there's no import cycle.
If the above is not applicable because you still have a cycle that's not demonstrated here, you can move the unconditional import to the bottom of reproducer.py and it'll work (my vague interpretation of this is that you can import from partially initialized modules if what you're importing is initialized before you (re)enter an import cycle; I made all imports conditional to force import cycle for illustration purposes):

% tree
.
└── package
    ├── __init__.py
    ├── reproducer.py
    ├── tests.py
    └── to_inject.py

1 directory, 4 files

% cat package/reproducer.py 
from typing import TYPE_CHECKING
from injector import inject

class InjectTo:
    @inject
    def __init__(self, inject_me: "InjectMe") -> None:
        self.inject_me = inject_me
        self.value = "foo"

from package.to_inject import InjectMe

% cat package/to_inject.py 
from typing import TYPE_CHECKING

from package.reproducer import InjectTo

class InjectMe:

    def do_something(self, to_me: "InjectTo") -> None:
        to_me.value = "bar"

% cat package/tests.py    
from injector import Injector

from package.reproducer import InjectTo

def fun():
    injector = Injector()

    inject_to = injector.get(InjectTo)

    inject_to.inject_me.do_something(inject_to)

    assert inject_to.value == "bar"
    print('ok')

% python -c 'from package.tests import fun; fun()'
ok

You can somewhat manually import at a later point in time:

% cat package/reproducer.py
from typing import TYPE_CHECKING
from injector import inject

if TYPE_CHECKING:
    from package.to_inject import InjectMe

def do_imports():
    global InjectMe
    from package.to_inject import InjectMe

class InjectTo:
    @inject
    def __init__(self, inject_me: "InjectMe") -> None:
        self.inject_me = inject_me
        self.value = "foo"

% cat package/to_inject.py 
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from package.reproducer import InjectTo

class InjectMe:

    def do_something(self, to_me: "InjectTo") -> None:
        to_me.value = "bar"

% cat package/tests.py    
from injector import Injector

from package.reproducer import do_imports, InjectTo

do_imports()

def fun():
    injector = Injector()

    inject_to = injector.get(InjectTo)

    inject_to.inject_me.do_something(inject_to)

    assert inject_to.value == "bar"
    print('ok')

% python -c 'from package.tests import fun; fun()'
ok

davebirch commented 4 years ago

Thanks for the detailed suggestions, yeah 1) isn't going to be possible, I'd simplified the example for the reproducer.

2) Looks like it might help, I will digest properly and see if that helps me out of the issue I'm in to workaround for now.

Some thoughts on the issue as raised:

I think you might be able to import those modules without parsing or AST being involved by simply setting TYPE_CHECKING to True before importing. Looking at the test output from my gist, I see you have that variable listed in your globalns and localns in get_type_hints.

I'm assuming all you actually need here is a reference to the type you are injecting so that you can tie it back to a binding that provides the object.

Looking at _ForwardRef, is there some way the if the eval fails based on what's already imported at runtime, this code could either: a) import the module we're trying to inject with TYPE_CHECKING = True (seems like it would be likely to fail, but might provide some conditionally better behaviour if the import could run cleanly and silently ignore if not) - seems hacky and non-optimal.

b) find some way to re-run the eval of the forward reference string in the scope of somewhere that has the injector's bound types imported already to try and tie it back? Not looked too much further in the injector code than the bits that are directly going wrong, might there be any mileage to this?

jstasiak commented 4 years ago

I can't see it working reasonably well and I don't believe it's worth the effort to do all this work just to handle code with import cycles, sorry.

I think you might be able to import those modules without parsing or AST being involved by simply setting TYPE_CHECKING to True before importing.

There are many details that are left out of this picture:

Injector is not importing anything right now
What would it import? What module exactly would it start with? It's not necessarily the class' module – what if the class is created by a function in module a and stored as a reference in module b?
It'd have to reimport a module that's already imported (the module where the class with forward references in the constructor parameters is defined)
- Since TYPE_CHECKING is now set, the newly imported module would presumably also import other modules. Those would trigger others etc. – basically you have a whole separate copy of a large part of the module/import tree.
- Now that you have the new, freshly imported modules they contain a completely new objects/classes/etc., so InjectMe from the original import has different identity from InjectMe from the new import, isinstance checks will fail, __eq__ will fail etc.
- Since the "TYPE_CHECKING-enabled import" would provide a separate new copy of InjectMe with a different identity to provide instead of the "InjectMe" reference, so any bindings of the "original" InjectMe would silently stop working in this case

I don't believe it's feasible, not without a working proof of concept presented. :)

davebirch commented 4 years ago

No worries, yeah I figured the TYPE_CHECKING approach was probably a non-starter. I'll do some more digging around the second idea and see if I can string together some kind of POC for you to look at as I've probably not explained it well!

jstasiak commented 4 years ago

That'd be cool, I'm curious now. :)

davebirch commented 4 years ago

So, just trying to figure out how to piece the POC together, and wanted to confirm a few assumptions if you don't mind?

1) get_type_hints has the two globalns and localns dicts, that as far as I can tell it uses in the case of a forward reference to directly lookup a type from these dicts. These dicts, if not passed (as is the case currently) are automatically filled by the typing module using sensible defaults like __annotations__ or __dict__ entries.

2) The binder stores an _bindings attribute which is keyed by type, a dict could be created from the keys of this of type.__name__ to type which would basically define a binding type namespace for the binder.

3) If that dict of bindings could be in scope for the call to _infer_injected_bindings, the except block could try to rerun the call to get_type_hints, using the bindings as a namespace for the lookup and follow similar handling, this would cover my use case, hopefully without causing any issues to normal operation. (if this isn't possible, it looks like these are already deferred, I shall look into where that gets picked up and look into adding the code there)

4) I think this would remove the runtime dependency that injector currently enforces on types specified in an injectable initializer so long as they are defined in a binding used by the injector that is trying to create the instance of the injectable class.

I'm just getting more familiar with how the bits of injector hang together to see if 3 is a possibility, but it otherwise looks at least theoretically possible that get_type_hints could be used in this way to be able to lookup the types in the injectable initializer from these keys.

You might be able to tell me off the bat if that makes sense or if I'm making a logical error somewhere?

jstasiak commented 4 years ago

Sure, no problem. So:

That's almost correct to my understanding. If the dictionaries are not provided there's a heuristic to try to guess the right global and local namespaces based on the object that's the "owner" of the forward references. Module objects' __dict__ properties are used there. __annotations__ dictionaries are not used as globalns nor localns, forward references are fetched from there in string form..
The first part is accurate, but there can be no name -> type mapping, as types can have non-unique names. Even their fully-qualified names can be non-unique. They don't even need to have a meaningful fully-qualified names – how do you define a fully qualified name when a class is returned from a function? Consider this:

% cat test.py
def give_me_class() -> type:
    class Class:
        pass
    return Class

MainService = give_me_class()
SecondaryService = give_me_class()
print(MainService.__name__)
print(SecondaryService.__name__)
% python test.py
Class
Class

So this is the showstopper – not really a logical error but an unappreciated detail that makes points 3 and 4 not matter in the end (but yes, if point 2 was correct then you could do what you describe in 3 and 4 quite easily).

davebirch commented 4 years ago

So I did figure that aliased classes would be a problem, but then had discounted having to worry about it because I assumed it would be caught by the first check and likely not be a forward reference that isn't in scope at runtime. I then noticed that _infer_injected_bindings is getting the type hints for the whole callable at once thus all the bindings would need to be deferred..

Bearing this in mind, in _infer_injected_bindings it looks like I could iterate spec.annotation attribute and then hand each argument's type to get_type_hints individually. This would have the advantage of allowing individual types to be deferred if get_type_hints raises, rather than the whole signature at once.

I believe this would allow for the vast majority of existing, working use cases to be entirely unaffected and work just as they do now.

As far as I can see this should ensure that the only cases going through this deferred approach either just flat out don't exist, or may be inferred from the bindings namespace?

If they can't be easily inferred then an exception gets thrown as normal and nothing changes.

I get the approach of mapping via name -> type isn't perfect and won't catch all cases, but it does feel like it should work in the vast majority of cases where someone would be using a forward reference in this way.

To cover the specific issue you mentioned, removing any non-unique names defined in the bindings and just allowing those cases to fail as normal would prevent any issue, but I don't see any reason why if we can match a unique name from the binding types to a forward reference string, it wouldn't be the type that we need?

No doubt there's more stuff I've not considered (and apologies if I really am barking up the wrong tree and wasting your time here), but it does really feel like this should be possible and really want to solve it!

jstasiak commented 4 years ago

(...) Bearing this in mind, in _infer_injected_bindings it looks like I could iterate spec.annotation attribute of this and then hand each argument's type to get_type_hints individually. This would have the advantage of allowing individual types to be deferred if they raise, rather than the whole signature at once.

In order to do that you'd have to reimplement get_type_hints basically which is slightly involved, strip PEP 593 Annotated from the returned value etc. That's doable, sure, but not trivial to implement from scratch to match what get_type_hints does.

I believe this would allow for the vast majority of working use cases to be entirely unaffected and work just as they do now.

As far as I can see this should ensure that the only cases going through this deferred approach either just flat out don't exist, or may be inferred from the bindings namespace?

No, unfortunately. There can be multiple hierarchical bindings namespaces since there can be an Injector hierarchy, but that's not really an issue. The issue is the bindings namespaces will in most cases contain only subset of type that can/are/will be injected anywhere – in extreme cases the namespaces will be totally empty, when you don't use explicit bindings (auto_bind set to True, no binder.bind calls, no @provider-decorated Module methods etc.).

If they can't be easily inferred then an exception gets thrown as normal and nothing changes.

I get the approach of mapping via name -> type isn't perfect and won't catch all cases, but it does feel like it should work in the vast majority of cases where someone would be using a forward reference in this way.

To cover the specific issue you mentioned, removing any non-unique names defined in the bindings and just allowing those cases to fail as normal would prevent any issue, but I don't see any reason why if we can match a unique name from the binding types to a forward reference string, it wouldn't be the type that we need?

Because see above – the bindings namespace can be totally empty (at the start of injecting into things at least) even in a large and complex application with many types. And if you have two classes named Service in your application but only one of them is present in the bindings namespace and if you have a code depending on the other one, with a forward reference, it'd resolve to the first one which is a bad thing.

davebirch commented 4 years ago

Ok, think I have a new plan that doesn't involve the need for reimplementing get_type_hints and gives the same result, just need to do some checking between python versions to ensure that the behaviour I'm seeing remains consistent :)

Gonna ponder on what you've said and see if I can find any solutions, I'm pretty convinced this is still possible, I shall ensure I have unit tests to confirm that the cases you've mentioned still work if I do think I can make this work.

jstasiak commented 4 years ago

It's really tricky, I'm looking forward to seeing what you come up with. :)

davebirch commented 4 years ago

Think I'm making some progress, bit delayed with anything to be able to show for it as I need to run through some internal box-ticking in order to get permission to commit anything back.

I have a POC based on my original approach which, whilst not 100% appears not to cause any failures in any existing unit tests. Am just extending the tests to cover some of the other examples you called out to identify the behaviour in these cases.

In the meantime, I seem to be hitting a bug in the pycharm debugger that means I can't stop at breakpoints anywhere within the injector codebase, so having to debug stuff in my other app using an imported injector, which is less than ideal and is slowing me down a little!

One actual question, what versions of python does injector officially support? I don't see it documented anywhere and it makes it hard working out what I can and can't use!

jstasiak commented 4 years ago

Python 3.5+ right now, but I'm considering dropping 3.5 support since its end-of-life is near.