Closed JWCS closed 1 month ago
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! If you have an idea for a solution, we would really welcome a Pull Request with proposed changes. See the Contributing Guide for more. It may take us a while to respond here, but we really value your contribution. Contributors like you help make xarray better. Thank you!
Do you have suggestions for how to avoid this? Happy to take a PR which removes some of these if it the proposed code still works. Generally this is done to avoid circular or optional imports. (I don't think "spurious" is an accurate description)
(note #9561 as a case for more of this...)
Edit:
`Mapping[Hashable, "xarray.core.dataarray.DataArray"], # <-- this is all @beartype asks`
OK, I don't think we have any objection to this, would be good to see a PR with a couple of examples.
I looked at https://github.com/beartype/beartype/issues/456, seems like Beartype's normative views are stronger than I expected. Is there any direction from the python typing community on using TYPE_CHECKING
— I think we would update our approach if the mypy folks said it was bad practice.
To get ahead of it: I'm less looking to get into an object-level discussion on the merits, just pointing out that my mental model for whether TYPE_CHECKING
is reasonable seems quite different to yours + asking whether there are objective sources.
...heh. @leycec has been summoned. Thanks so much to @JWCS for getting this typing
ball rolling. You rock! :rock:
TYPE_CHECKING
: After All These Years, Still You Plague @beartypeThis is an incredibly fun discussion from my perspective – but probably an incredibly tedious and non-fun discussion from everybody else's. I acknowledge this. I shall aim for concision and fail. :face_exhaling:
Is there any direction from the python typing community on using
TYPE_CHECKING
— I think we would update our approach if the mypy folks said it was bad practice.
It's not so much that TYPE_CHECKING
is unilaterally bad per say; it's more that TYPE_CHECKING
is really hard to get right under real-world conditions and constraints. Since it's hard to get right, it's easy to get wrong. More formally, TYPE_CHECKING
promotes fragility at runtime – especially desynchronization between static type-checkers like mypy
and pyright
on the one hand and hybrid runtime-static type-checkers like @beartype and typeguard
on the other.
TYPE_CHECKING
is thus the Python equivalent of Dijkstra's infamous "Go To Considered Harmful". Like TYPE_CHECKING
, goto
statements are hard to get right. @JWCS was omniscient when he chose this issue title.
mypy
Is Right About Everything for Once!?The way to get TYPE_CHECKING
right is basically to:
Specifically, this is what mypy
advises to resolve import cycles and circular imports:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
import bar
def listify(arg: 'bar.BarClass') -> 'list[bar.BarClass]': # <-- absolute forward references for victory!
return [arg]
That's almost what xarray
is already doing – but not quite. xarray
currently prefers relative to absolute forward references. The xarray
equivalent of the above code would look something like:
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from bar import BarClass
def listify(arg: 'BarClass') -> 'list[BarClass]': # <-- relative forward references for great loss!
return [arg]
Unfortunately for xarray
, your seemingly sensible change to the stock mypy
advice renders the resulting type hints unresolvable at runtime. Why? Because TYPE_CHECKING
is False
at runtime, which is when type-checkers like @beartype and typeguard
run. The xarray
equivalent of the above code thus reduces at runtime to simply:
from typing import TYPE_CHECKING
def listify(arg: 'BarClass') -> 'list[BarClass]': # <-- relative forward references for great loss!
return [arg]
We see the issue. BarClass
is a relative forward reference. By definition, relative forward references are relative to the current submodule. @beartype thus expects to find BarClass
in the current submodule. But BarClass
doesn't live in the current submodule! BarClass
lives in the external bar
module. @beartype no longer knows where BarClass
lives, because all contextual metadata required to find out where BarClass
lives has already been destroyed by the Python interpreter at runtime. :grimacing:
Shockingly, mypy
is actually right about everything for once. Absolute forward references are the way. They preserve the context that runtime type-checkers need to resolve circular imports.
Yeah. I wouldn't want to do that either. Clearly, refactoring all circular imports across your entire codebase from relative to absolute forward references is a non-trivial, tedious, and boring endeavour. I'd run from that too. :runner: :dash:
The alternative is incompatibility with runtime type-checkers. That's... not great either. Both options kinda suck, huh?
But fear not! From @beartype's wildly different perspective, this is sorta resolved. We've internally "blacklisted" both xarray
and Pydantic, which employ a similar if TYPE_CHECKING:
strategy throughout their codebases.
By "blacklisted," I mean that @beartype now merely shallowly type-checks just the types of objects that originate from xarray
and Pydantic. Previously, @beartype tried to deeply type-check the items in containers that originate from xarray
and Pydantic. For example, @beartype tried to type-check that xarray.Dataset
containers map from Hashable
objects to xarray.DataArray
containers. Oh, well. Can't do that anymore. </sigh>
It could be worse. @beartype now officially "supports" xarray
and Pydantic by mostly ignoring everything from xarray
and Pydantic. We'd looooove to fully support both xarray
and Pydantic, though. If you feel like tackling this in a year or two, just ping us when the resolution lands and we'll immediately "unblacklist" xarray
in @beartype.
Yeah. Me too. Let's pass out, everybody. Open-source volunteerism is a done thing tonight. :sleeping:
OK — keeping this brief — do you see a reasonable way to convert the imports to work? We don't want xarray.core.dataarray.DataArray
every time we use a DataArray
, but we're probably OK with that in the type definitions...
I have the same question as @max-sixty.
How would you actually replace
if TYPE_CHECKING:
...
from xarray.core.dataarray import DataArray
def x(da: DataArray) ...
Do you really have to do:
if TYPE_CHECKING:
...
import xarray
def x(da: xarray.core.dataarray.DataArray) ...
?
Edit: maybe this is actually ok to do?:
if TYPE_CHECKING:
...
import xarray
def x(da: xarray.DataArray) ...
That would not be too bad though.
Surely that last suggestion wouldn't work either, for the same reason? When TYPE_CHECKING
is False
, xarray
doesn't get imported, so the reference won't be resolvable by the runtime type checker.
I have to say this seems like the sort of thing that should be resolved at the level of linter conventions and standards rather than by going around to individual codebases...
One idea — but maybe beartype have already considered this — is for beartype to do some heuristics. So for example, DataArray
is exposed at the root of the package xarray.DataArray
, so that could be a reasonable place to check.
(Yes agree not sure we're going to solve it all here)
I think from xarray's perspective, we are happy to make changes which are:
Having xarray.core.dataarray.DataArray
in type signatures would meet (1) & (2). Having xarray.core.dataarray.DataArray
in every reference likely wouldn't. And we'd need some help on (3).
Assuming we can meet those, we're happy to make the change and do our bit for the ecosystem. We're supporters-in-spirit of projects like beartype!
Is that reasonable?
...heh. @headtr1ck has the right of it on both accounts. That's an @airbus scientist for you, huh? What won't those guys come up with next!? Basically, you have two sane choices:
This suffices for @beartype. This should suffice for everyone else, too:
def x(da: 'xarray.DataArray'): ... # <-- @beartype loves this
If you hate quotes, ...no shame this is also totally fine:
from __future__ import annotations
def x(da: xarray.DataArray): ... # <-- @beartype even loves this
Note the distinct lack of TYPE_CHECKING
anywhere. If you really want to use TYPE_CHECKING
, you can. No worries. "You do you," as Gen Z says. But there's not much point in doing so, in this case. TYPE_CHECKING
is usually a last resort when mypy
or pyright
just won't shut up about something – not logic you necessarily throw around in every submodule.
If you hate reiterating xarray.core.dataarray.DataArray
everywhere, just prefer the short-form xarray.DataArray
everywhere like above. That's fine. All runtime type-checkers (including @beartype) know how to dynamically import attributes from third-party modules. You don't need to tell runtime type-checkers how to import stuff. Trivial, right?
In fact, I'm unclear why you even need to tell static type-checkers how to import attributes. What's the point of redundant boilerplate like this, anyway?
if TYPE_CHECKING:
import xarray
No idea, honestly. xarray.DataArray
is already enough metadata to unambiguously resolve that forward reference. Importing xarray
on top of that is just meaningless fluff, really. You shouldn't need TYPE_CHECKING
at all to declare forward references. I shrug non-committally. </shrug>
If you hate even xarray.DataArray
, define a new private submodule – say, xarray._forwardrefs
– that collects all of your forward references together. Then import and reuse them throughout your codebase as type hints: e.g.,
# In "xarray._forwardrefs":
DataArray = 'xarray.DataArray'
# In every other "xarray" submodule:
from xarray._forwardrefs import DataArray
def x(da: DataArray): ... # <-- @beartype actually even loves this
Literally anything other than what you're currently doing is something that @beartype loves. :smiling_face_with_tear:
One idea — but maybe @beartype have already considered this — is for @beartype to do some heuristics.
...heh. Heuristics, huh? So, hacks. Actually, I usually adore hacks. @beartype is one giant hack resting on top of a pile of kludges cobbled together with vicious one-liners.
So for example,
DataArray
is exposed at the root of the packagexarray.DataArray
, so that could be a reasonable place to check.
Sure. In this specific case, ad-hoc heuristics like that definitely work. In the general case, they... don't. Runtime type-checking is really all about definitive knowledge. "Intuitively guessing" where ambiguous types reside inevitably breaks down in common edge cases, invites non-portable fragility, and basically destroys the Python ecosystem.
@beartype has been integrated into PyTorch. You know? I used to be cavalier about this sort of thing. Then I repeatedly broke PyTorch. Microsoft was understandably unhappy. Now, I exercise caution in all things. I grit my teeth. :grimacing:
Explicit is better than implicit. Encouraging ambiguity just makes other people's code blow up.
Given that the plan to close
label was just applied, it's unclear whether this is worth additional effort. I kinda assumed going in that this was a lost cause. Predictably, it's distinctly smelling like a lost cause. I hate wasting everyone's time – especially my own.
If "your type hints are busted at runtime, so your package fundamentally violates CPython semantics including upcoming standards like PEP 649" isn't a compelling argument, it's probably best I bow out and just hack on open @beartype issues.
I sigh. I sigh and collapse into bed. :face_exhaling:
This suffices for @beartype. This should suffice for everyone else, too:
def x(da: 'xarray.DataArray'): ... # <-- @beartype loves this
Note the distinct lack of TYPE_CHECKING anywhere.
The reason we have the if TYPE_CHECKING
is because of circular imports — does this solution get around that? When I try the change locally, it raises errors.
Given that the
plan to close
label was just applied, it's unclear whether this is worth additional effort.
FWIW I'm marking it plan-to-close because I'm not sure we're going to make progress on getting something that meets the three criteria above, and I also don't want to lose anyone's time, especially yours. If there is a way of making progress on those three criteria then let's make a plan and fix it. Do you think there is a way of meeting those criteria?
Is that reasonable?
By now I didn't have time to play around with that.
But mainly the issue is that for us xarray
is not a third party package, we are xarray
.
So we cannot simply import xarray
.
The proposal of using forward refs as strings is not too much work to implement. Probably we can use our types
module and replace the if TYPE_CHECKING
imports with these forward refs.
But I agree that without some automated way of checking this it is not very useful.
Let's reopen when we find a way that meets the three criteria in https://github.com/pydata/xarray/issues/9581#issuecomment-2397604939.
@leycec we want to be good citizens in the python ecosystem, and we want to support projects like beartype — so please feel free to reopen / comment if there's some solution, thank you for commenting so far (& for building beartype!).
Also note that the TCH rules of ruff might not do the right thing, at least not by default:
Unused imports add a performance overhead at runtime, and risk creating import cycles. If an import is only used in typing-only contexts, it can instead be imported conditionally under an if
TYPE_CHECKING:
block to minimize runtime overhead.
@max-sixty Would adding pytest-beartype to xarray CI tests qualify as a kind of "automated way of checking"?
@max-sixty Would adding pytest-beartype to xarray CI tests qualify as a kind of "automated way of checking"?
Yes that would!
What happened?
This is a cross-post from the @beartype issue tracker, in which a downstream user tried to apply runtime type checking while using xarray... only to discover all the types, at runtime, were non-existent. The chief cause of this is spurious usage of
if TYPE_CHECKING
used throughout the xarray repo, which is undefined at runtime, like the types it provides. https://github.com/beartype/beartype/issues/456 The below bug report is merely copied from that post.The relevant issue (with respect to the MCVE below) seems to be the below example from the code base:
What did you expect to happen?
For the types to be visible at runtime, for runtime type checking.
From @leycec, the only real change needed should be:
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
Anything else we need to know?
MVCE Versions beartype: 0.19.0 xarray: 2024.9.0 Python: 3.11.10
Environment
Omitted the environment, because not relevant.