xarray-contrib / pint-xarray

Interface for using pint with xarray, providing convenience accessors
https://pint-xarray.readthedocs.io/en/latest/
Apache License 2.0

Proposing `xarray` upstream `entry-points` #271

Open LecrisUT opened 1 month ago

LecrisUT commented 1 month ago

Having to write `import pint_xarray` is a bit clunky, especially since the import has no explicit usage and can be deleted by some linters. How about proposing that xarray upstream expose a new entry-point group that pint-xarray can hook into? They already have `xarray.backends`, but this feels like it doesn't fit there.

I am opening an issue here because I am not sure what naming convention to propose, or how to give an example of what the hook should look like, e.g. at which stage these entry points should be imported.

keewis commented 1 month ago

This idea has come up before, see pydata/xarray#7348. You could imagine loading the entry-point library whenever this particular attribute is accessed, but `__getattr__` on `Dataset` and `DataArray` is already complicated enough. So I'm not sure whether removing that one line is worth the effort.

For now, I'm simply adding `# noqa: F401` comments to the import, which makes sure tools like ruff don't auto-remove it.

LecrisUT commented 1 month ago

> You could imagine loading the entrypoint library whenever this particular attribute is accessed, but __getattr__ on Dataset and DataArray is already complicated enough

I was considering a different interface: whenever the host module (e.g. xarray) is loaded, loop through the registered entry points and load each one. E.g. define a function to load entry points (example) and then run it early when the relevant module is loaded (example). The entry point just points to the package/module file, so loading it effectively just does the import.

keewis commented 1 month ago

that could work, with the downside that now the import time has increased simply by the presence of the library. Given that people have repeatedly complained about long import times (with pint also being pretty slow), I don't think this would be accepted.
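One rough way to put a number on that cost is to time a fresh interpreter with and without the import. The helper names below are made up for this sketch, and the result includes process-startup noise, so treat it as an estimate only; `python -X importtime -c "import xarray"` gives a more detailed per-module breakdown.

```python
import subprocess
import sys
import time


def fresh_interpreter_seconds(code):
    """Wall-clock time to run ``code`` in a brand-new Python process."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


def import_overhead(module_name):
    """Rough extra time spent importing ``module_name``, minus bare startup."""
    baseline = fresh_interpreter_seconds("pass")
    with_import = fresh_interpreter_seconds(f"import {module_name}")
    return with_import - baseline
```

Comparing `import_overhead("xarray")` in environments with and without the plugin installed would show how much an automatically loaded extension adds.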

TomNicholas commented 1 month ago

If I understand this correctly it basically involves the new entry point silently running completely arbitrary code at import time. This doesn't seem like a good idea to me.

Our existing entry points in Xarray plug into some well-defined interface, and only run in the context of some specific ABC. What you're suggesting here seems a lot more general and prone to abuse.

dopplershift commented 1 month ago

What about exposing the DatasetAccessor and DataArrayAccessor subclasses as entry points, avoiding the need for the `@xr.register_dataset_accessor` decorator? I agree, I've always found it a little weird that I need to import one library just to then be able to do:

    import xarray as xr
    import mylibrary  # imported only for its side effect of registering the accessor

    nc = xr.open_dataset('foo.nc')
    nc.mylibrary.myfunc()

It's not so much saving a 1-line import as it is avoiding the oddity that you need to do an import but then avoid using the thing you imported directly.
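The reason the otherwise unused import matters is that decorator-based registration happens as a side effect of importing the module that defines the accessor. The toy below imitates that pattern with a stand-in `Dataset` class and a simplified decorator; it is not xarray's actual implementation, and `MyAccessor` is a hypothetical plugin class.

```python
class Dataset:
    """Toy stand-in for xarray.Dataset."""


def register_dataset_accessor(name):
    """Simplified imitation of a registration decorator (not xarray's code)."""

    def decorator(accessor_cls):
        # Attaching the property is a side effect of running the decorator,
        # which only happens when the defining module is imported.
        setattr(Dataset, name, property(accessor_cls))
        return accessor_cls

    return decorator


# --- this part would live in mylibrary and run on `import mylibrary` ---
@register_dataset_accessor("mylibrary")
class MyAccessor:
    def __init__(self, dataset):
        self._dataset = dataset

    def myfunc(self):
        return "called myfunc"


print(Dataset().mylibrary.myfunc())
```

An entry-point based design would move the name-to-class mapping into package metadata, so the host library could find `MyAccessor` without the user ever importing `mylibrary` explicitly.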

TomNicholas commented 1 month ago

Rewriting the accessors to use entrypoints instead is an interesting idea... I'm still not quite sure I understand what this would look like but perhaps @dopplershift you could raise this upstream in Xarray for further discussion?

LecrisUT commented 1 month ago

> Our existing entry points in Xarray plug into some well-defined interface, and only run in the context of some specific ABC. What you're suggesting here seems a lot more general and prone to abuse.

Abuse-wise, it is equivalent whether the entry point points to a module or to an attribute, since the same import is executed either way. The only difference is whether the extension is loaded automatically at import, or whether that automatic import is disabled and xarray controls how the extensions are registered.

But in the end it does not matter, as long as some process for automatically loading the extensions is in place. Sure, the import time would be affected on every `import xarray` call, but if the user installed the packages, don't they want them always loaded? At least with entry points the import can be deferred, and you have control, e.g. disabling one or all plugins via an environment variable or by altering a global variable.