scientific-python / lazy-loader

Populate library namespace without incurring immediate import costs
BSD 3-Clause "New" or "Revised" License
132 stars 20 forks source link

Add context manager functionality #70

Open vnmabus opened 1 year ago

vnmabus commented 1 year ago

This experimental PR adds basic functionality to define lazy imports inside a context manager. It is heavily based on issue #12 by @tlambert03 but with a nicer syntax and much less magic. By using this approach, typing should work without the need of repeating code or using typing stubs.

The syntax used to define the lazy imports would be:

import lazy_loader as lazy

with lazy.lazy_imports():
    from .some_func import some_func
    from . import some_mod, nested_pkg

This is currently a draft PR because it still needs a lot of testing, but I wanted to receive feedback before proceeding further. It would also be easy to extend it to the absolute import case when support for that is made in lazy_loader.

Finally, I also added a "fake" test package using this syntax, for using it with the tests in the future.

tlambert03 commented 1 year ago

certainly looks nice! elegant implementation and syntax 👍

I think my only curiosity would be whether there are any edge cases for which builtins.__import__ = self._my_import fails. That is, do all of the various ways to import things (e.g. custom finders and specs, etc...) invariably go through builtins.__import__? I have no particular reason to suspect that's not the case, but that would be the thing I'd want to know for sure.

vnmabus commented 12 months ago

That is, do all of the various ways to import things (e.g. custom finders and specs, etc...) invariably go through builtins.__import__?

Apparently not. In the official documentation (https://docs.python.org/3/reference/import.html) it says:

When an import statement is executed, the standard builtin __import__() function is called. Other mechanisms for invoking the import system (such as importlib.import_module()) may choose to bypass __import__() and use their own solutions to implement import semantics.

Thus, this solution would work only when the import sentence is used.

Another thing to take into account is the official documentation of that builtin (https://docs.python.org/3/library/functions.html#import__), which says:

This function is invoked by the import statement. It can be replaced (by importing the builtins module and assigning to builtins.__import__) in order to change semantics of the import statement, but doing so is strongly discouraged as it is usually simpler to use import hooks (see PEP 302) to attain the same goals and does not cause issues with code which assumes the default import implementation is in use. Direct use of __import__() is also discouraged in favor of importlib.import_module().

However I do not know if/how it is possible to do this same thing using import hooks. Given that the proposed syntax illustrated in this PR is much clearer than existing approaches, I wonder if it would be possible to give this topic a broader discussion. Would it be possible to ping the SPEC 1 authors for their opinions?

tlambert03 commented 12 months ago

Would it be possible to ping the SPEC 1 authors for their opinions?

I suspect they're getting these messages. But cc @stefanv, @dschult, @jarrodmillman just in case


for what it's worth, though I wrote the attach_stub implementation, I have to say that I'm not really a fan of SPEC1 in general and I don't use it in my own projects. (I wrote attach_stub after finding that scikit-image was no longer working as expected for me in my IDE)

And while the final syntax achieved by this PR is quite nice, I fear that the whole lazy import thing is going deeper and deeper into possibly fragile magic in order to recover all of the "normal" things that we usually expect with python imports in both static and dynamic environments. I personally would encourage folks to step back and consider whether it's really necessary for their project before adopting it.

as a case in point... see discussion in https://github.com/scikit-image/scikit-image/pull/7073#issuecomment-1750523336, which continues to try to make lazy imports work in scikit-image without losing autocomplete in IDEs (something I think is much more important than a slight delay during import). but at the cost of making it difficult for dependents to strictly type their own projects

I saw a similar discussion in mne-python which recently endorsed spec1, where https://github.com/mne-tools/mne-python/pull/12059 was opened to deal with the same problem of static hinting in IDEs. I tend to agree with @cbrnr's comment here https://github.com/mne-tools/mne-python/pull/12059#issuecomment-1746232358 that the real culprit is lazy loading. All of these workarounds have a real cost... and posting ecosystem adoption kinda makes it look like "the right thing to do", where in fact I think it's probably a very niche case.

stefanv commented 12 months ago

@tlambert03 There is a real problem to address, though, and that is making interactive exploration possible without incurring large import costs. What we were seeing is that libraries started using the module level getattr, and each library was jury rigging its own solution. So, I don't think it's such a niche use case, and better we sort out a proper solution, as best we can, until Python starts providing a standard mechanism.

stefanv commented 12 months ago

(I can also mention that, as library authors, we kept getting bug reports about slow library imports. That's why external imports started moving inside of functions. But that doesn't solve the submodule visibility / import time problem.)

Of course, you can just ditch lazy importing and eagerly import everything from the start. But several libraries, in the past, have considered that cost too high, unlike the MNE authors.

Ultimately, we don't have to advocate for or against lazy loading but, given the expressed interest, provide a working mechanism and document visibly all related caveats.

tlambert03 commented 12 months ago

There is a real problem to address, though

I don't mean to suggest that it was implemented without reason! I definitely get the motivations behind it... and I know there's a cost incurred by eager imports. but I also see many smaller libraries that don't have the same massive architectures or import burdens adopt the same practices that the big/famous libraries are adopting. No offense meant to be sure, but I also think that the typing and IDE issues here are also important considerations in the total equation

tlambert03 commented 12 months ago

(in any case, probably shouldn't hijack this thread of @vnmabus's work... let's redirect attention back to his proposal)

stefanv commented 12 months ago

@tlambert03 I agree with you. I wonder if we could add some language to the SPEC to provide clearer guidance on when it should be used?

tlambert03 commented 12 months ago

👍... I'll think about a small addendum to add to the existing caveat and try to open a pr soon to get your thoughts.

cbrnr commented 12 months ago

FWIW (and sorry for adding yet another unrelated comment to this PR), I think the main issue with lazy loading is how it is being presented to the community. You have been creating a number of SPECs, which all have valid use cases (including lazy loading), and you're giving out badges to projects who have adopted a specific SPEC (those projects are also listed on your website). This makes it seem like a "proper" scientific package has to support all SPECs, otherwise it is lacking something that those listed projects have already achieved. For most SPECs, this is actually OK IMO (such as defined minimum supported versions), but not for lazy loading. Not all scientific projects will benefit from that extra complexity introduced by lazy loading. I'd even argue that it is useful for only a handful of packages, whereas the rest would be better off sticking with regular imports (maybe nested for very expensive imports). We've all seen the various caveats and reasons why it will probably never be officially integrated into Python, and we can never be sure that new corner cases will pop up. Therefore, I'd suggest to put lazy loading into a category separate from the other SPECs, so that you are not "forcing" projects that would like to be part of the Python scientific ecosystem as best as possible to adopt lazy loading.

jarrodmillman commented 11 months ago

This makes it seem like a "proper" scientific package has to support all SPECs,

That is certainly not the intention of the SPECs and we should make it clear that SPECs are intended to be light-weight recommendations that projects may adopt. For example, I don't think any of the existing SPECs should be adopted by all the projects. I drafted the initial version of SPEC 0 and I don't I think SPEC 0 should be followed by all the packages in the ecosystem (e.g., this package doesn't follow SPEC 0 and we don't intend for it to).

Could you open an issue on the SPECs to explain why you think people are misunderstanding what they are? If you could find any text that we can revise to make it less likely that people will misunderstand what SPECs are about, that would be greatly appreciated.

For example, the idea behind the lazy loading SPEC is that if a project wants to use lazy loading, then they can look at SPEC 1 for guidance about how to do it. For that SPEC, maybe someone could explain the reasons a project may or may not benefit from adopting lazy loading in this section: https://scientific-python.org/specs/spec-0001/#ecosystem-adoption

Also, please take a close look at the document explaining what SPECs are and see if you can soften any language that makes it seem like projects should adopt all the guidance: https://scientific-python.org/specs/purpose-and-process/

We certainly would like to be able to have SPECs that aren't even relevant to every package in the ecosystem.

stefanv commented 8 months ago

Returning to the PR, this syntax and implementation looks quite reasonable?

Do you know if this helps mypy figure out what's going on @vnmabus? If so, that's a strong case for including it. If not, then I suppose we should think about whether to include it, given that we have the .pyi parser.

vnmabus commented 8 months ago

It seems to work. Mypy just thinks that is a normal (non-lazy) import, as it is not able to notice that the with statement changes how imports are done (note that Mypy even typechecks with imports inside an if False block). That said, it would be useful to have a broad set of unit-tests for both functionality and type-checking behavior.

stefanv commented 7 months ago

It looks like this overrides __all__, so does it allow for mixed imports? I'm very tempted to get this in, but we need to make sure it doesn't mess with other parts of a user's code.

vnmabus commented 7 months ago

What do you mean by "mixed imports"?

stefanv commented 7 months ago

I meant using regular imports + lazy loading. Or a situation where you may want to augment __all__.

vnmabus commented 7 months ago

I think it would be the same as the current implementation in that regard, wouldn't it?

stefanv commented 7 months ago

No, usually you have access to __all__ and __dir__ and can manipulate them / use them as necessary.

__getattr__, __dir__, __all__ = lazy.attach_stub(__name__, __file__)

The context manager overrides __all__, so my assumption is that you need to import everything inside of it.

vnmabus commented 7 months ago

Well, that is based in the proposal in #12. I think it is still possible to manipulate __all__ and __dir__ afterwards, although this may confuse some syntax analysis tools.

Another option is to store the captured values as attributes and then having the option of not attaching to these variables automatically, maybe with a syntax similar to:

import lazy_loader as lazy

lazy_imports = lazy.lazy_imports(attach=False)

with lazy_imports:
    from .some_func import some_func
    from . import some_mod, nested_pkg

__getattr__, __dir__, __all__ = lazy_imports.attach()

This way, if you wanted to do something special with these variables it is still possible. What do you think?

stefanv commented 6 months ago

Sorry for the delay in responding, @vnmabus. I like the option you present of not attaching by default; that would solve the problem.

We'd like to make a release soon, so maybe we can add this as an experimental interface during that release?

vnmabus commented 6 months ago

I think this still needs plenty of discussion and testing before shipping (that is why it is a draft PR). In particular I just noticed that without a global import lock this code is probably not thread-safe, as another thread could be affected by the temporary change of builtins.__import__. I don't know if this library is considering thread safety in other places, but in case that it does, we would need to slightly change the implementation: change builtins.__import__ globally (or use an import hook if possible) and only skip imports if a thread-local boolean variable is set.

So, I would like for Python experts in the import machinery to participate in the conversation to evaluate if that plan is reasonable. Maybe it would be possible to achieve a greater audience opening a thread in the Python Discourse?