scientific-python / lazy-loader

Populate library namespace without incurring immediate import costs
BSD 3-Clause "New" or "Revised" License
132 stars 20 forks source link

Possible issues with partial lazy loading #32

Closed hoedt closed 1 year ago

hoedt commented 2 years ago

I am pretty new to writing packages, but I was recently looking for exactly this kind of functionality. However, I noticed that this does not exactly does what I had in mind (or maybe it does not quite cover my use case).

E.g. Consider a package with multiple files (e.g. err.py, core.py, bells.py and whistles.py) implementing different aspects of the code. Assuming that I would like to make everything accessible, the __init__.py file could look something like this:

from .core import *
from .err import *
from .bells import feature1 as bell_feature, feature2 as extra_feature
from .whistles import feature as whistle_feature

__version__ = "0.0.1"
__all__ = core.__all__ + err.__all__ + ["bell_feature", "whistle_feature"]

Now, if I wanted to load the bells and whistles modules lazily (and understood everything correctly), I could do:

from .core import *
from .err import *

__version__ = "0.0.1"

import lazy_loader as lazy
__getattr__, __lazy_dir__, __lazy_all__ = lazy.attach(__package__, [], {
    "bells": ["feature1", "feature2"],
    "whistles": ["feature"]
})
__all__ = __lazy_all_ + core.__all__ + err.__all__  # add non-lazily loaded names

def __dir__():
    yield "__version__"
    yield from core.__all__
    yield from err.__all__
    yield from __lazy_dir__()

Technically this should give (almost) the same result as the eager version (I didn't test it), but I do see a few issues:

  1. I don't see how aliasing (import ... as ...) could be made to work here. I believe this also makes it impossible to import features from different submodules that have the same name.
  2. Having to override __all__ and __dir__ does feel quite clumsy and I would argue that a better user experience should be possible here. This might not be the biggest of issues, but as soon as you wish to have something like __version__ in your package, an override like this seems to be necessary.
  3. There is no user control over the wildcard import (__all__), since the attach simply dumps every feature in there even if it should not be available for wildcard imports.

Apart from these issues specific to lazy loading, I also noticed that lazy and eager loading are not so equivalent after all. The main differences that I noticed are:

  1. __dir__ does not list the special dunder variables (e.g. __package__, __name__, ...)
  2. sub-modules are not accessible (i.e. pkg.bells from the example above would raise an AttributeError in the lazy setting, but return the sub-module when imported eagerly)

A possible solution

To circumvent these issues, I took the liberty to mess with the code a bit and I arrived at something that could be the beginning of a solution:

def attach( pkg_name, submodules=None, names=(), aliases=None):
    """
    Atta
    Parameters
    ----------
    pkg_name : str
        Typically use ``__name__``.
    submodules : dict, optional
        Dictionary mapping module names to a list of attributes / functions.
    names : iterable of str, optional
        Typically use ``dir()``.
    aliases : dict, optional
        Aliases for imported attributes or functions.

    Returns
    -------
    __getattr__,  __dir__
    """
    if submodules is None:
        submodules = {}
    if aliases is None:
        aliases = {}

    attr_to_alias = {old: new for new, old in aliases}
    attr_to_module = {
        attr_to_alias.get(f"{mod}.{attr}", attr): 
        mod for mod, attrs in submodules.items() for attr in attrs
    }

    def __getattr__(name: str):
        name = aliases.get(name, name)
        if name in attr_to_module.keys():
            submod_path = f"{pkg_name}.{attr_to_module[name]}"
            submod = importlib.import_module(submod_path)
            return getattr(submod, name)
        elif name in submodules.keys():
            return importlib.import_module(f"{pkg_name}.{name}")

        raise AttributeError(f"module '{__name__}' has no attribute '{name}'")

    def __dir__():
        yield from attr_to_modules.keys()
        yield from base_dir
        yield from submodules.keys()

    return __getattr__, __dir__

The main difference is that we use the default names from dir() and use an additional dict to allow duplicate names (and aliasing). I also made sure that all sub-modules are available through __getattr__. This also made it possible to merge submodules and submod_attrs because you can always pass an empty list of attributes / functions to load only the module.

Now, this could be used as follows (using the same example as above):

from .core import *
from .err import *

__version__ = "0.0.1"
__all__ = core.__all__ + err.__all__ + ["bell_feature", "whistle_feature"]

import lazy_loader as lazy
__getattr__, __dir__ lazy.attach(__name__, {
    "bells": ["feature1", "feature2"],
    "whistles": ["feature"]
}, dir(), aliases={
    "bell_feature": "bells.feature1", 
    "extra_feature": "bells.feature2", 
    "whistle_feature": "whistles.feature"
})

I hope I have been clear. Just let me know if you need more information (or counterarguments).

stefanv commented 1 year ago

@hoedt Thank you very much for your comments, and sorry for missing the issue here initially.

Can I ask you to please file a single pull request per feature you want to add, then we can decide what to do. If you first want to discuss, we can start with an issue per item.

I'll give you my gut feel on each item: