python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/

range typing #1304

Open Godwhitelight opened 1 year ago

Godwhitelight commented 1 year ago

I suggest adding Range, Min, and Max types to check whether a number is inside a range, something like this:

def example(range_example: Range(int, 1, 5), min_example: Min(int, 5), max_example: Max(float, 5.4)):
    pass

That would be useful and really nice to have.

zmievsa commented 1 year ago

Essentially you wish to create 3 new types in that one line. Are there any other languages that support such operations? You can already implement such functionality with NewType and a single constructor function per type, but it's going to be a lot more code than in your example.
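
For illustration, a minimal sketch of that NewType-plus-constructor pattern (Rating and make_rating are made-up names for the example):

from typing import NewType

Rating = NewType("Rating", int)

def make_rating(value: int) -> Rating:
    # The range check lives in the constructor, not in the type itself.
    if not 1 <= value <= 5:
        raise ValueError(f"rating must be in [1, 5], got {value}")
    return Rating(value)

def example(range_example: Rating) -> None:
    pass

example(make_rating(3))  # OK; example(3) would be flagged by a type checker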

gvanrossum commented 1 year ago

I presume this is inspired by languages like Ada and Pascal, which allow integer ranges as types.

The problem is that usually this requires inserting dynamic checks into the code, e.g. if we have

x: Range(int, 1, 10) = 5
for i in range(n):  # n is not known statically
    x = x + 1

we can't really say much about whether x is still in range at the end unless we aggressively check the range upon each assignment.

I don't think we should do this, because Python's static type checkers (e.g. mypy) don't have the ability to insert range checks (as they only check the code and don't compile it).
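
For concreteness, here is a sketch of what such a dynamic check could look like at runtime (RangedInt and Counter are made-up names). Note that a descriptor only guards attribute assignment; Python has no comparable hook for a plain local variable like the x above, which is exactly the problem:

class RangedInt:
    """Descriptor that validates a bounded int on every assignment."""
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __set__(self, obj, value):
        if not self.lo <= value <= self.hi:
            raise ValueError(f"{value} not in [{self.lo}, {self.hi}]")
        setattr(obj, self.name, value)

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.name)

class Counter:
    x = RangedInt(1, 10)

c = Counter()
c.x = 5        # OK
c.x = c.x + 6  # ValueError: 11 not in [1, 10]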

Changaco commented 1 year ago

Instead of creating new classes as @Godwhitelight suggested, I propose adding __class_getitem__ methods to the existing numerical types int, float and Decimal.

Here are some basic examples. Builtins can't actually be patched from user code, so this sketch demonstrates the idea on an int subclass, and the exact bounds notation is only illustrative:
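
class Int(int):
    def __class_getitem__(cls, bounds):
        # bounds is the subscript: Int[0, 100] -> (0, 100), inclusive
        lo, hi = bounds
        return type(f"{cls.__name__}[{lo}, {hi}]", (cls,), {"__bounds__": (lo, hi)})

Percentage = Int[0, 100]  # would be spelled int[0, 100] on the real builtin

def set_volume(level: Int[0, 11]) -> None:
    pass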

The problem with this is that it doesn't handle exclusive bounds. The best way I can think of to support them is to use strings to describe the range, for example float['>0 and <1']. However, this would add complexity, and the meaning isn't obvious at a glance.

I don't think the inability of static type checkers to handle all cases should block the implementation of this feature. It seems to me that the worst that can happen is developers having to make changes to type annotations to silence the type checker.

zmievsa commented 1 year ago

@Changaco Even though your proposal has a more interesting notation, I do not see how it solves the issue that Guido mentioned. I think that before hardcoding such things into the language, we should first expand the type system to allow for such complex types, i.e. type intersections, higher-kinded types, etc. Once we have those, the range feature might even be expressible without hardcoding it into the language.

jp-larose commented 1 year ago

I agree, this would be nearly impossible to check statically. Static checks would be limited to literals, constants, and perhaps a few more elaborate cases. However, part of the point of type hints is in-code documentation, so this idea has merit on that point alone.

A type checker for range types as suggested here could produce 3 results: the value is provably in range, provably out of range, or indeterminate.

Static checkers (such as mypy) could then have options for how to treat indeterminate values: pass (ignore), warn, or fail.

A tool/framework like pydantic could use this information to do runtime checks on the values, saving the boilerplate validation code.

Having written all of this... I can see how implementing it could get really complicated really fast. I like the idea, and I don't think it should be dismissed just because it's difficult. Maybe not a high priority, but still worth considering.

I like @Changaco's suggestion for notation, BTW. Another possibility would be slice notation, which I think would feel more natural. Your examples could then be written as:
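
# Hypothetical slice-based notation (not valid Python today, shown for illustration):
def example(range_example: int[1:5], min_example: int[5:], max_example: float[:5.4]):
    pass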

zmievsa commented 1 year ago

Pydantic is already capable of range checking, so the code the author requested is achievable with:

from pydantic import Field, validate_arguments

@validate_arguments
def example(
    range_example: int = Field(gt=1, lt=5),
    min_example: int = Field(ge=5),
    max_example: float = Field(le=5.4),
):
    pass
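
Calling example(0, 5, 5.4) then raises pydantic's ValidationError at call time (0 is not greater than 1), instead of the bad value slipping through.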
zmievsa commented 1 year ago

@jp-larose to achieve that syntax, you can use pydantic and tips from this article: https://tinkering.xyz/abusing-type-annotations/

The author of this article has created a special range checker too.

SamuelMarks commented 10 months ago

Also interested in this!

Here's an issue I posted yesterday:

Feature

Constrained types:

# Normal integer
# Bring `MAX_INT` back, or use `2 ** (struct.Struct('i').size * 8 - 1) - 1`
C_Int = int[MIN_INT … MAX_INT]
# Normal floating point number
C_Float = float[MIN_FLOAT … MAX_FLOAT]
# Normal unsigned 64-bit integer
C_UInt64 = int[UINT64_MIN … UINT64_MAX]
# Positive signed; goes up to MAX_INT
Pos = C_Int[0 …]
# Positive even integer
PosEven = Pos[0, 2, 4, …]
# Positive odd integer
PosOdd = Pos[1, 3, 5, …]
# Fibonacci sequence (ok, maybe no need to go this far!!!)
Fib = uint[0, 1, 1, 2, 3, 5, 8, 13, …]

Pitch

In Fortran / ASN.1 and a bunch of other languages (e.g., SQL), one can specify which range of numbers is permissible:

INTEGER(0..199)

Similarly for text:

CHARACTER(LEN=20) :: word
VARCHAR(20)

Related work: https://docs.pydantic.dev/latest/usage/types/number_types/#constrained-types

It would be really handy when defining database column types in Python. It would also be handy when defining things in numpy / torch / tensorflow / jax, i.e., when np.int32-style types are spelled out explicitly. A checker should then be able to handle this kind of code:

# Pretend we don't need `INT16_MAX = np.iinfo(np.int16).max`
good_sized: int16 = INT16_MAX
# error
pretty_small: int8 = good_sized

Now obviously it won't work in all cases, but before runtime one should be able to trace quite a lot and figure out where incompatible types are being passed along.

Relationships between types could also be defined, like the old C promotion hierarchy: https://en.cppreference.com/w/c/language/conversion#Integer_promotions

Anyway, I think a syntax like this could stop the whole `import world as np; import notnumpy as np` thing. Numpy was going to be part of CPython anyway!

zmievsa commented 10 months ago

Let's go through the problems you are trying to solve.

defining database column types in Python

Which ORM are we talking about? Without an ORM, you would just define these things as strings in pure SQL. And any ORM has full power to implement such syntax using a few custom type hints.

Relationship between types could also be made, like the old C promotion hierarchy

Python does not have integer promotions. Python types are kept intentionally simple. We are not C, and that's a good thing, because Python is made for a different purpose: for prototyping, for gluing together lower-level languages, for building large things quickly.

I think a syntax like this could stop the whole `import world as np; import notnumpy as np`

I feel like if you are using numpy, then it makes a lot of sense to import numpy. I am not a data scientist or a data engineer, so I am out of context here. I tried to google these imports and found no mentions of them anywhere. Are you sure it's an actual problem?

Numpy was going to be part of CPython anyway

I don't think that was ever the case. Pushing libraries into CPython is not a great idea; it is good for them to be separate. For example, library authors might not want to match Python's slow release schedule. Or they might want to introduce a language like Rust or Go into their code base. Or they might want to make major breaking changes at some point. None of these are possible once a library becomes part of CPython. Besides, adding a library to CPython is hard, especially a large one like numpy. And you are talking about adding a bunch of globals (like int32) from it too; adding globals is even harder. Imagine someone who does fastapi or django 100% of their time. Will they ever need numpy? Probably not :)

SamuelMarks commented 10 months ago

ORM

Yes, but there is no standardisation here. With type hints, these layers are starting to coalesce on a common syntax. But without standard, more-specific types, the syntax becomes custom [a DSL] at that point.

We are not C, and it's good!

Plenty of implementations are trying to push Python in a faster direction, including newer versions of CPython, Mojo, LPython, etc.

My thinking is that if the language has the right primitives, then different runtimes can optimise without introducing their own DSLs [e.g., cython, Mojo].

So rather than cython.int, ctypes.c_int, np.intc, or tf.experimental.numpy.int32, you could have an actual official type annotation for a C integer.

This would open up optimisation opportunities, and it would also improve compatibility (for FFI and networking purposes).

I feel like if you are using numpy, then it makes a lot of sense to import numpy.

Not really. A non-exhaustive list of examples: jax.numpy, cupy, and tf.experimental.numpy all expose numpy-style APIs without being numpy itself.

Imagine someone who does fastapi or django 100% of their time. Will they ever need numpy? Probably not :)

Not so sure about that. Both allow you to define structures, which often go all the way down to the user. Form validation might include restrictions on text length; database columns can have restrictions on integer precision; etc.

zmievsa commented 10 months ago

but there is no standardisation here

I am afraid it's a super rare case when CPython gets syntax strictly for other libraries. Yes, there's https://peps.python.org/pep-0646/, but that one is useful everywhere in Python: variadic generics are a useful feature even without touching any libraries, and without them it was impossible to express some ideas in Python at all.

It's already possible to express integer range types as int, or as a custom class Range that inherits from int, for example. From a typing standpoint, the two are equivalent. Yes, Python will not validate that values actually fall into the range, but Python never does that -- not for any type. There is no precedent for such checking, I think. In Python, type hints are supposed to be just that -- hints, not affecting actual behavior in any way unless you are doing some magic yourself (e.g. Pydantic). It's not just my opinion; it's something the core devs agreed upon when designing type hints, and I believe it's still the generally accepted position.
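
A minimal sketch of that hint-only Range, just to make it concrete (the bounds here are pure documentation; nothing checks them):

class Range(int):
    """An int documented to lie in [1, 10]; the bounds are not enforced."""
    lo, hi = 1, 10

x: Range = Range(42)  # out of the documented range, yet no error anywhere:
                      # type hints are hints, not runtime behavior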

Plenty of implementations are trying to push Python into a faster direction

Speeding Python up and adding lower-level primitives are two strictly different tasks. When we speed it up, the user and the global scope are unaffected. When you add stuff, you clutter the language. We already have too much going on in Python. I don't think anybody wants to go in the direction of C++, where the language has so much stuff that it can overwhelm even experienced devs.

So rather than cython.int, ctypes.c_int, np.intc, tf.experimental.numpy.int32; you could have an actual official type annotation for a C integer

There is an official type annotation for a C integer: it's ctypes.c_int. So the question "Why doesn't numpy use it?" should be targeted at numpy devs, not CPython devs :)

Form validation might include restrictions on text length

It already does. No need for any builtin special syntax.
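
For example, in Django's existing form API (the form and field names here are made up for illustration):

from django import forms

class SignupForm(forms.Form):
    # Length restriction expressed with the existing API,
    # no special typing syntax needed.
    username = forms.CharField(max_length=20)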

database columns can have restriction on integer precision

They already do in every ORM as far as I know :) Such standardization should be discussed with ORM authors, not CPython devs. For example, SQLAlchemy won't use our integer types because it has its own custom classes for each db type, not just for integers. Same situation in the Django and Tortoise ORMs.
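
For instance, a minimal SQLAlchemy sketch (declarative_base lives in sqlalchemy.orm as of 1.4; the model here is made up) showing that precision and length already go through its own column types:

from sqlalchemy import Column, Integer, SmallInteger, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Product(Base):
    __tablename__ = "product"
    id = Column(Integer, primary_key=True)
    quantity = Column(SmallInteger)  # integer precision via SQLAlchemy's own type
    name = Column(String(20))        # length restriction, maps to VARCHAR(20)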

In which ORM would you like to use this special integer type?