scientific-python / lazy-loader

Populate library namespace without incurring immediate import costs
BSD 3-Clause "New" or "Revised" License

Check which attributes have been loaded without triggering imports #51

Open lagru opened 1 year ago

lagru commented 1 year ago

Is there a mechanism to check which objects have been loaded and which ones are still in the "lazy" state? Looking through the code and inspecting the objects returned by lazy.attach_stub, I didn't see an obvious way to do so.

Maybe this could be addressed by making attach not return simple functions but objects. E.g.

__getattr__, __dir__, _ = lazy.attach_stub(__name__, __file__)
__getattr__.loaded_names  # return names which were already loaded

I think this would be very helpful in debugging and testing that lazy loading actually works as intended.

I'd be happy to work on this if there is interest!

lagru commented 1 year ago

Inspecting sys.modules might help with some cases. Though, it is less explicit than checking an object which remembers which attributes were accessed.
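A minimal sketch of the sys.modules check, for illustration only. The helper name loaded_submodules is hypothetical and not part of lazy_loader. It only reports that a submodule was imported by someone, not that a particular lazy attribute access caused it.

```python
import sys


def loaded_submodules(package_name):
    """Return the sorted names of already-imported submodules of a package."""
    prefix = package_name + "."
    # Copy the keys first so a concurrent import cannot change the dict
    # while we iterate over it.
    return sorted(name for name in list(sys.modules) if name.startswith(prefix))
```

After a plain "import json", for example, loaded_submodules("json") includes "json.decoder", because json imports that submodule eagerly.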

stefanv commented 1 year ago

You could also just attach that dict to the function itself.

In [3]: def __get__(x):
   ...:     return x
   ...:

In [4]: __get__.__loaded = ['foo']

In [5]: __get__.__loaded
Out[5]: ['foo']

The question is who will be looking at that. Perhaps it can be done only when a debug flag is present in the environment.

lagru commented 1 year ago

Hmm, attach creates a new __getattr__() as a closure for each call, so I guess this would work. Though, it feels very hacky to me. We do have classes to combine logic and state. A simple class with __call__ and __repr__ seems like the saner and more flexible architectural choice to me. :sweat_smile:
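As a rough sketch of the class-based idea, assuming a simplified loader that only handles submodules: the class name LazyGetattr and its loaded_names attribute are hypothetical and do not exist in lazy_loader. Python looks up a module-level __getattr__ in the module namespace and calls it, so a callable instance should be usable in place of a plain function.

```python
import importlib


class LazyGetattr:
    """Hypothetical module-level __getattr__ that remembers what it loaded."""

    def __init__(self, package_name, submodules):
        self.package_name = package_name
        self.submodules = frozenset(submodules)
        self.loaded_names = []  # names in the order they were first loaded

    def __call__(self, name):
        if name not in self.submodules:
            raise AttributeError(f"No {self.package_name} attribute {name}")
        module = importlib.import_module(f"{self.package_name}.{name}")
        if name not in self.loaded_names:
            self.loaded_names.append(name)
        return module

    def __repr__(self):
        return (
            f"<LazyGetattr for {self.package_name!r}, "
            f"loaded={self.loaded_names}>"
        )
```

A package would then write something like __getattr__ = LazyGetattr(__name__, {"restoration"}) and a test could inspect __getattr__.loaded_names afterwards.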

Good point about the debug flag. I guess the bigger question is how to reliably and automatically test that accidental imports don't invalidate lazy loading. Basically I'd like something to turn red and notice if a PR triggers an import in scikit-image.

stefanv commented 1 year ago

If that's all you want to do, then you can just add a check to getattr that raises if a certain env variable is set, or if you're inside a certain context manager.
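One way this could look, as a sketch under stated assumptions rather than how lazy_loader does it: the environment variable LAZY_DEBUG_RAISE, the context manager forbid_lazy_imports, and the factory make_getattr are all made-up names for illustration.

```python
import contextlib
import importlib
import os

_forbidden = False  # toggled by forbid_lazy_imports()


@contextlib.contextmanager
def forbid_lazy_imports():
    """Make every lazy load inside the with-block raise."""
    global _forbidden
    previous, _forbidden = _forbidden, True
    try:
        yield
    finally:
        _forbidden = previous


def make_getattr(package_name, submodules):
    """Build a module-level __getattr__ for the given lazy submodules."""

    def __getattr__(name):
        if name not in submodules:
            raise AttributeError(f"No {package_name} attribute {name}")
        # LAZY_DEBUG_RAISE is a hypothetical variable; any non-empty value
        # turns the check on.
        if _forbidden or os.environ.get("LAZY_DEBUG_RAISE"):
            raise RuntimeError(f"lazy load of {package_name}.{name} triggered")
        return importlib.import_module(f"{package_name}.{name}")

    return __getattr__
```

Raising stops at the first offending access, which is enough for a test that should turn red but does not list every access.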

stefanv commented 5 months ago

I'm closing for now, since there's no obvious action to take.

lagru commented 5 months ago

Could we re-open, since there's your suggestion

"you can just add a check to getattr that raises if a certain env variable is set, or if you're inside a certain context manager."

which I'd be happy to take on some time?

stefanv commented 4 months ago

@lagru I'm getting back to this issue; could you help me understand what kind of debugging you need to do? Is it sufficient to raise on getattr? That will only catch the first instance. Would logging be better?

lagru commented 4 months ago

Basically, I'd like to be able to test the assumption that nothing is loaded for a given import. I like your environment variable idea. What do you think about an API like this?

LAZY_LOADER_RAISE_ON=".*" python -c "import skimage"
LAZY_LOADER_RAISE_ON="skimage.restoration" python -c "import skimage.restoration"

LAZY_LOADER_RAISE_ON would tell lazy_loader to raise an error if it is requested to load something whose __qualname__ matches the regex. This approach seems reasonably simple to implement. If necessary, we could even add LAZY_LOADER_ALLOW for a very flexible blacklist / whitelist approach.
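The matching logic for this proposal might be sketched as follows. Neither variable exists in lazy_loader today; the function name check_lazy_import is hypothetical, and using re.fullmatch against the dotted name (rather than re.search) is an assumption of this sketch.

```python
import os
import re


def check_lazy_import(fullname, environ=os.environ):
    """Raise ImportError if loading fullname is forbidden by the environment.

    LAZY_LOADER_RAISE_ON and LAZY_LOADER_ALLOW are the proposed variable
    names from this thread, not an existing lazy_loader feature.
    """
    raise_on = environ.get("LAZY_LOADER_RAISE_ON")
    if not raise_on:
        return  # checking is off
    allow = environ.get("LAZY_LOADER_ALLOW")
    if allow and re.fullmatch(allow, fullname):
        return  # explicitly allowed, even if RAISE_ON also matches
    if re.fullmatch(raise_on, fullname):
        raise ImportError(f"lazy load of {fullname!r} matches {raise_on!r}")
```

A loader would call check_lazy_import(f"{package_name}.{name}") just before importing. With fullmatch, the pattern "skimage.restoration" does not cover names below that package; a pattern such as r"skimage\.restoration(\..*)?" would.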

Ideally, I'd like to have an approach that could be called from within Python, but that would require a clean import slate for the current Python process. Could be done with subprocess but then it's no better than the approach above using env variables. What do you think?
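For the subprocess variant, a sketch of getting a clean import slate from within Python: it runs a statement in a fresh interpreter and reports which modules under a given prefix ended up in sys.modules. The helper name modules_loaded_by is hypothetical.

```python
import subprocess
import sys


def modules_loaded_by(statement, prefix):
    """Run statement in a fresh interpreter; list loaded modules under prefix."""
    code = (
        "import sys\n"
        f"{statement}\n"
        "print('\\n'.join(sorted("
        f"m for m in sys.modules if m.startswith({prefix!r}))))"
    )
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        check=True,  # surface import errors from the child process
    )
    return result.stdout.split()
```

A test for scikit-image could then assert that modules_loaded_by("import skimage", "skimage.") contains only the submodules that are expected to load eagerly.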

lagru commented 4 months ago

Being able to check in the current console whether "X" has been imported would be a bonus that might help with debugging.