Open AustinScola opened 4 years ago
This right here!
Came across this post, and wanted to see if I could do something similar using python`s types, without having to subclass, or similar.
This would be very complicated to implement in full generality. Consider this code:
def f(lst: NonEmpty[list[float]]) -> None:
lst.pop()
average(lst) # is it still nonempty?
@JelleZijlstra I see your point. Maybe this shouldn't be allowed for a mutable container. But it still makes sense for, say a str
, or what do you think?
EDIT: Realizing tuple
was a bad example.
An immutable container is easier, but it would still be a huge project. If you do str.strip()
, does that give you another nonempty string?
An immutable container is easier, but it would still be a huge project. If you do
str.strip()
, does that give you another nonempty string?
This is not too different a case from how typing.NewType
works today.
# 'output' is of type 'int', not 'UserId'` `output = UserId(23413) + UserId(54341)
You should be pretty safe by letting the return type be the superclass, I'd imagine.
@JelleZijlstra:
Having slept on it, I realize this has a lot more in common with NewType
than I first though. In PEP484, the motivating for said type is the following (I`m quoting):
There are also situations where a programmer might want to avoid logical errors by creating simple classes. For example:
class UserId(int):
pass
get_by_user_id(user_id: UserId):
...
This is quite close to what NonEmpty
could look like, if breaking slightly from the pattern in the OP. Imagine a class NonEmptyStr(str)
. What would you want to change in such a subclass? Two things, essentially:
NonEmptyStr("")
, which should give greater run time safety than NewType
, provided the object is immutable.NonEmptyStr("foo").title()
could be annotated to return a NonEmptyStr
, for instance, but if it isn't a default to just str
(inherited) is still safe.The first point is perhaps the most important, and is the main reason I don't reach for NewType
more often today.
Generalizing this pattern, we see it pop up in other settings too:
PositiveInt(int)
, Prime(PositiveInt)
, and so on.
Except for the need for validation, these are integers, and should behave as such in all cases. If I could annotate functions as needing a strict subset of ints, in a way that provides a runtime guarantee, I'd be all over it, provided it doesn't cost me performance outside of the validation.
Not sure there is a term for this kind of subclassing. Often, when doing subclassing, one does so in order to equip the sub-class with a more powerful set of functionality. In this case we are doing sub-classing without adding more functionality, but rather to provide stricter guarantees for method and function types. Do you know if anybody makes such a distinction within type theory?
I'm realising that what I might want is a NewType
equipped with validation, and tooling for overloading method singatures. At runtime, NonEmptyStr(some_str)
could apply the validation function, without actually modifying the some_str
object, such that you still have your plain old str
, which means this is closer to a zero-cost abstraction. While at static analysis time, the type checker can safely treat this (actually a str
) as of the narrower NonEmptyStr
type.
Another perspective on this, is providing a concept similar to Literal
, but usable for larger, finite, subsets and infinite subsets of a type. I mean in the sense that the type Literal["foo"]
is a strictly analysis time subtype of str
, and more useful than NewType
because some checking is being done that you aren't making invalid claims, like foo: Literal["foo"] = "bar"
. What I want is being able to do the same for subsets I can't type out by hand. NonEmptyStr
is just a special case of CompliantWithRegEx
, in this sense.
I guess what I'm wondering is whether or not any of these ideas/perspectives provide a path to some or all of this functionality that isn't crazy costly to implement. If some of this does not sound like the ramblings of a madman, and you're kind of seeing where I'm going with this, I'd love some feedback on the feasibility of it, @JelleZijlstra. Do I need to go back to the drawing board?
I can definitely see the use case for this! A related problem I'm thinking about is int types. In our codebase we often use ints to represent timestamps. We use a NewType for that (Utime), but if you do math on those (my_timestamp + 3 * DAY), the typechecker will just infer an int. A way to say that Utime + Utime = Utime would understand the program more precisely and catch issues where we confuse the Utime with another kind of int. I don't know what that would look like, though.
A related problem I'm thinking about is int types.
Highly relevant for us as well. Often want to annotate something as a StrictlyPositiveInt
, and so on. Then, StrictlyPositiveInt + StrictlyPositiveInt -> StrictlyPositiveInt
makes sense in the same way.
Ada has a related concept: https://learn.adacore.com/courses/intro-to-ada/chapters/strongly_typed_language.html#integers
Not sure I know exactly how this would be implemented either, but I think there is potential here for something that is both useful and a lot easier to implement than the originally proposed NonEmpty
type.
It seems that some form of specialization is required to express this. I could imagine some unified NewType
and Protocol
that specializes a type could be used to express these ideas.
It seems that some form of specialization is required to express this. I could imagine some unified
NewType
andProtocol
that specializes a type could be used to express these ideas.
I think you are on to something there! Would probably have to be a narrower kind of Protocol
with no defaults or implementation allowed.
Just to mix in another already existing type: I imagine the runtime check part I'm missing with NewType
could be achieved by providing a mechanism for optionally pairing your NewType
with a TypeGuard
that would result in a TypeError
when trying NonEmptyStr("")
, for instance.
A NotEmpty
type could be quite helpful to reduce the number of false-positives for the new possibly-undefined
check and the pyright possibly unbound
check.
/CC: @ilinum
def func(l: list[int]) -> None:
for item in l:
var = item + 4
print(var) # Name "var" may be undefined [possibly-undefined]
This would be very complicated to implement in full generality. Consider this code:
def f(lst: NonEmpty[list[float]]) -> None: lst.pop() average(lst) # is it still nonempty?
I'm not sure it really matters. We could apply a strict interpretation where a container is only marked as NotEmpty
if we know it contains at least
one element. That might not cover all cases, but those could still use cast
.
I.e.
def func(lst: NotEmpty[list[int]], lst2: list[int]) -> None:
lst.pop() # list[int]
lst + lst2 # list[int]
lst.append(2) # NotEmpty[list[int]]
lst + lst2 # NotEmpty[list[int]]
/CC: @erictraut I think this could also help pyright
.
I just experienced a bug that NonEmpty[str]
support could have avoided. Simplified example:
def stringify(value: int | None) -> str:
return str(value) or ""
int.__str__
and NoneType.__str__
can never return an empty string. If they were typed as returning NonEmpty[str]
, then mypy could deduce that the or ""
part of the conditional is never executed (dead code) and raise an error.
Because what was actually intended was str(value or "")
Another category of bugs this could help is arithmetics between naive and tz-aware datetime
s.
I've tried modifying datetime.pyi to add this behavior and quickly ran into the limitation of NewType
requiring a concrete class, whereas datetime
(as of Python 3.8) tries to preserve subclasses, so while I can do
+NaiveDatetime = NewType('NaiveDatetime', datetime)
+TzAwareDatetime = NewType('TzAwareDatetime', datetime)
class datetime(date):
...
- def replace(self: Self, ..., tz_info=_TzInfo | None = ..., ...) -> Self: ...
+ @overload
+ def replace(self, ..., tz_info=_TzInfo, ...) -> TzAwareDatetime: ...
+ @overload
+ def replace(self, ..., tz_info=None, ...) -> NaiveDatetime: ...
this would be a regression in subclass-preservation.
Back when PEP-593's Annotated
was introduced, I thought it would enable us to overload-match on arbitrary traits (e.g. "tz-aware", "non-empty")
@overload
def replace(self: Self, ..., tz_info=_TzInfo, ...) -> Annotated[Self, TzAware]: ...
but at least the way the PEP intended and the way mypy implemented it, annotations appear to be stripped before any checking occurs.
@JelleZijlstra I realize I'm 2 years late, but are you aware of units-of-measure in F# (https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/units-of-measure)? "Types for Units-of-Measure: Theory and Practice" from Central European Functional Programming School 2009 might provide some ideas for how to implement that.
I have a simple NonemptyString that I also use as a newtype base class. Eg, URL. https://github.com/public-law/open-gov-crawlers/blob/master/public_law/text.py
In Ruby, I have a nonempty Array which doesn't need to throw an exception: it initializes with a list head and tail. https://github.com/dogweather/non_empty_array
I believe that this general area is called Dependent Types. Lots of info in that topic.
Feature
It would be very helpful to have a way of specifying that a container type must be non-empty.
Pitch
As an example of why this feature would be helpful, consider the following function,
average
:Rather than relying on exceptions for handling the case when the number of numbers is zero, is seems like something along the lines of the following would be more convenient and more explicit:
I don't know anything about the way mypy is implemented so I'm not sure if this is a possible.