python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.55k stars 2.84k forks source link

Non-empty container types #9745

Open AustinScola opened 4 years ago

AustinScola commented 4 years ago

Feature

It would be very helpful to have a way of specifying that a container type must be non-empty.

Pitch

As an example of why this feature would be helpful, consider the following function, average:

from typing import List

def average(numbers: List[float]) -> float:
    """Return the average of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

Rather than relying on exceptions for handling the case when the number of numbers is zero, is seems like something along the lines of the following would be more convenient and more explicit:

from typing import List, NonEmpty

def average(numbers: NonEmpty[List[float]]) -> float:
    """Return the average of a non-empty list of numbers."""
    return sum(numbers) / len(numbers)

I don't know anything about the way mypy is implemented so I'm not sure if this is a possible.

harahu commented 3 years ago

This right here!

Came across this post, and wanted to see if I could do something similar using python`s types, without having to subclass, or similar.

JelleZijlstra commented 3 years ago

This would be very complicated to implement in full generality. Consider this code:

def f(lst: NonEmpty[list[float]]) -> None:
    lst.pop()
    average(lst) # is it still nonempty?
harahu commented 3 years ago

@JelleZijlstra I see your point. Maybe this shouldn't be allowed for a mutable container. But it still makes sense for, say a str, or what do you think?

EDIT: Realizing tuple was a bad example.

JelleZijlstra commented 3 years ago

An immutable container is easier, but it would still be a huge project. If you do str.strip(), does that give you another nonempty string?

harahu commented 3 years ago

An immutable container is easier, but it would still be a huge project. If you do str.strip(), does that give you another nonempty string?

This is not too different a case from how typing.NewType works today.

# 'output' is of type 'int', not 'UserId'`
`output = UserId(23413) + UserId(54341)

You should be pretty safe by letting the return type be the superclass, I'd imagine.

harahu commented 3 years ago

@JelleZijlstra:

Having slept on it, I realize this has a lot more in common with NewType than I first though. In PEP484, the motivating for said type is the following (I`m quoting):

There are also situations where a programmer might want to avoid logical errors by creating simple classes. For example:

class UserId(int):
    pass

get_by_user_id(user_id: UserId):
    ...

This is quite close to what NonEmpty could look like, if breaking slightly from the pattern in the OP. Imagine a class NonEmptyStr(str). What would you want to change in such a subclass? Two things, essentially:

The first point is perhaps the most important, and is the main reason I don't reach for NewType more often today.

Generalizing this pattern, we see it pop up in other settings too: PositiveInt(int), Prime(PositiveInt), and so on.

Except for the need for validation, these are integers, and should behave as such in all cases. If I could annotate functions as needing a strict subset of ints, in a way that provides a runtime guarantee, I'd be all over it, provided it doesn't cost me performance outside of the validation.

harahu commented 3 years ago

Not sure there is a term for this kind of subclassing. Often, when doing subclassing, one does so in order to equip the sub-class with a more powerful set of functionality. In this case we are doing sub-classing without adding more functionality, but rather to provide stricter guarantees for method and function types. Do you know if anybody makes such a distinction within type theory?

I'm realising that what I might want is a NewType equipped with validation, and tooling for overloading method singatures. At runtime, NonEmptyStr(some_str) could apply the validation function, without actually modifying the some_str object, such that you still have your plain old str, which means this is closer to a zero-cost abstraction. While at static analysis time, the type checker can safely treat this (actually a str) as of the narrower NonEmptyStr type.

Another perspective on this, is providing a concept similar to Literal, but usable for larger, finite, subsets and infinite subsets of a type. I mean in the sense that the type Literal["foo"] is a strictly analysis time subtype of str, and more useful than NewType because some checking is being done that you aren't making invalid claims, like foo: Literal["foo"] = "bar". What I want is being able to do the same for subsets I can't type out by hand. NonEmptyStr is just a special case of CompliantWithRegEx, in this sense.

I guess what I'm wondering is whether or not any of these ideas/perspectives provide a path to some or all of this functionality that isn't crazy costly to implement. If some of this does not sound like the ramblings of a madman, and you're kind of seeing where I'm going with this, I'd love some feedback on the feasibility of it, @JelleZijlstra. Do I need to go back to the drawing board?

JelleZijlstra commented 3 years ago

I can definitely see the use case for this! A related problem I'm thinking about is int types. In our codebase we often use ints to represent timestamps. We use a NewType for that (Utime), but if you do math on those (my_timestamp + 3 * DAY), the typechecker will just infer an int. A way to say that Utime + Utime = Utime would understand the program more precisely and catch issues where we confuse the Utime with another kind of int. I don't know what that would look like, though.

harahu commented 3 years ago

A related problem I'm thinking about is int types.

Highly relevant for us as well. Often want to annotate something as a StrictlyPositiveInt, and so on. Then, StrictlyPositiveInt + StrictlyPositiveInt -> StrictlyPositiveInt makes sense in the same way.

Ada has a related concept: https://learn.adacore.com/courses/intro-to-ada/chapters/strongly_typed_language.html#integers

Not sure I know exactly how this would be implemented either, but I think there is potential here for something that is both useful and a lot easier to implement than the originally proposed NonEmpty type.

emmatyping commented 3 years ago

It seems that some form of specialization is required to express this. I could imagine some unified NewType and Protocol that specializes a type could be used to express these ideas.

harahu commented 3 years ago

It seems that some form of specialization is required to express this. I could imagine some unified NewType and Protocol that specializes a type could be used to express these ideas.

I think you are on to something there! Would probably have to be a narrower kind of Protocol with no defaults or implementation allowed.

Just to mix in another already existing type: I imagine the runtime check part I'm missing with NewType could be achieved by providing a mechanism for optionally pairing your NewType with a TypeGuard that would result in a TypeError when trying NonEmptyStr(""), for instance.

cdce8p commented 1 year ago

A NotEmpty type could be quite helpful to reduce the number of false-positives for the new possibly-undefined check and the pyright possibly unbound check.

/CC: @ilinum

def func(l: list[int]) -> None:
    for item in l:
        var = item + 4

    print(var)  # Name "var" may be undefined  [possibly-undefined]

This would be very complicated to implement in full generality. Consider this code:

def f(lst: NonEmpty[list[float]]) -> None:
    lst.pop()
    average(lst) # is it still nonempty?

I'm not sure it really matters. We could apply a strict interpretation where a container is only marked as NotEmpty if we know it contains at least one element. That might not cover all cases, but those could still use cast.

I.e.

def func(lst: NotEmpty[list[int]], lst2: list[int]) -> None:
    lst.pop()  # list[int]
    lst + lst2  # list[int]

    lst.append(2)  # NotEmpty[list[int]]
    lst + lst2  # NotEmpty[list[int]]
cdce8p commented 1 year ago

/CC: @erictraut I think this could also help pyright.

intgr commented 1 year ago

I just experienced a bug that NonEmpty[str] support could have avoided. Simplified example:

def stringify(value: int | None) -> str:
    return str(value) or ""

int.__str__ and NoneType.__str__ can never return an empty string. If they were typed as returning NonEmpty[str], then mypy could deduce that the or "" part of the conditional is never executed (dead code) and raise an error.

Because what was actually intended was str(value or "")

ikonst commented 1 year ago

Another category of bugs this could help is arithmetics between naive and tz-aware datetimes.

ikonst commented 1 year ago

I've tried modifying datetime.pyi to add this behavior and quickly ran into the limitation of NewType requiring a concrete class, whereas datetime (as of Python 3.8) tries to preserve subclasses, so while I can do

+NaiveDatetime = NewType('NaiveDatetime', datetime)
+TzAwareDatetime = NewType('TzAwareDatetime', datetime)

 class datetime(date):
     ...
-    def replace(self: Self, ..., tz_info=_TzInfo | None = ..., ...) -> Self: ...
+    @overload
+    def replace(self, ..., tz_info=_TzInfo, ...) -> TzAwareDatetime: ...
+    @overload
+    def replace(self, ..., tz_info=None, ...) -> NaiveDatetime: ...

this would be a regression in subclass-preservation.

Back when PEP-593's Annotated was introduced, I thought it would enable us to overload-match on arbitrary traits (e.g. "tz-aware", "non-empty")

@overload
def replace(self: Self, ..., tz_info=_TzInfo, ...) -> Annotated[Self, TzAware]: ...

but at least the way the PEP intended and the way mypy implemented it, annotations appear to be stripped before any checking occurs.

pjonsson commented 1 year ago

@JelleZijlstra I realize I'm 2 years late, but are you aware of units-of-measure in F# (https://learn.microsoft.com/en-us/dotnet/fsharp/language-reference/units-of-measure)? "Types for Units-of-Measure: Theory and Practice" from Central European Functional Programming School 2009 might provide some ideas for how to implement that.

dogweather commented 1 year ago

I have a simple NonemptyString that I also use as a newtype base class. Eg, URL. https://github.com/public-law/open-gov-crawlers/blob/master/public_law/text.py

In Ruby, I have a nonempty Array which doesn't need to throw an exception: it initializes with a list head and tail. https://github.com/dogweather/non_empty_array

I believe that this general area is called Dependent Types. Lots of info in that topic.