Open anthrotype opened 6 years ago
Ok, as a continuation of previous discussion in #215... :)
@glyph I was trying to find https://github.com/aldanor/typo today but instead I ran across https://github.com/RussBaz/enforce which looks like it may be a fairly full-featured version of the same thing.
@chadrik There is also https://github.com/Stewori/pytypes. May the best project win.
Just to clarify, the sole reason I've started one of my own was that all existing solutions (those including enforce and pytypes) were (a) slow and (b) wrong -- although both are very good attempts and good inspiration. That being said, my version is not fully type-correct either when it comes to sum types (see examples below), but 'less wrong' if I may; on the bright side, it's fast. I haven't spent any time on finishing it due to lack of motivation and time lately, but it could be done, maybe with some help. If anyone knows any other comparable or relevant projects - shout away, I'd personally be very interested.
TL;DR: it's hard to write a runtime type checker that's both fast and correct, especially if it aims to handle both typevars and sum types; although not impossible (I'm not sure one already exists at this moment however). Details below.
# pip install git+https://github.com/RussBaz/enforce.git
import enforce
# pip install git+https://github.com/aldanor/typo.git (3.5 only; needs a few fixes)
import typo
# pip install git+https://github.com/Stewori/pytypes.git
import pytypes
# pip install git+https://github.com/agronholm/typeguard.git
import typeguard
Simple example:
def simple(x: int, y: str): pass
simple_pytypes = pytypes.typechecked(simple)
simple_enforce = enforce.runtime_validation(simple)
simple_typeguard = typeguard.typechecked(simple)
simple_typo = typo.type_check(simple)
args = 1, 'foo'
%timeit -r7 -n1000 simple_pytypes(*args)
%timeit -r7 -n1000 simple_enforce(*args)
%timeit -r7 -n1000 simple_typeguard(*args)
%timeit -r7 -n1000 simple_typo(*args)
314 µs ± 23 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
155 µs ± 2.14 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
51.6 µs ± 13.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
497 ns ± 3.13 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
All four work correctly; compared to typo, typeguard is about 100x slower, enforce 300x, pytypes 600x.
Slightly more involved:
from typing import List, Union, Dict, Tuple
def nested(x: Dict[Tuple[int, bytes], List[Union[str, float]]]) -> int: return 1
nested_pytypes = pytypes.typechecked(nested)
nested_enforce = enforce.runtime_validation(nested)
nested_typeguard = typeguard.typechecked(nested)
nested_typo = typo.type_check(nested)
x = {(1, b'3'): ['a', 1., 'b', 3.], (3, b'1'): [], (4, b'5'): ['c', 3.14]}
# %timeit -r7 -n1000 nested_pytypes(x) # FAILS
%timeit -r7 -n1000 nested_enforce(x)
%timeit -r7 -n1000 nested_typeguard(x)
%timeit -r7 -n1000 nested_typo(x)
2.08 ms ± 23.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
176 µs ± 1.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
4.57 µs ± 169 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Three out of four work -- pytypes fails; compared to typo, typeguard is 40x slower, enforce 450x.
Simple generic example (with a catch):
from typing import TypeVar
A, B = TypeVar('A'), TypeVar('B')
def generic1(x: List[Union[A, int]], y: A): pass
generic1_pytypes = pytypes.typechecked(generic1)
generic1_enforce = enforce.runtime_validation(generic1)
generic1_typeguard = typeguard.typechecked(generic1)
generic1_typo = typo.type_check(generic1)
args = [1], 'b' # valid signature (A=str)
# %timeit -r7 -n1000 generic1_pytypes(*args) # FAILS
# %timeit -r7 -n1000 generic1_enforce(*args) # FAILS
# %timeit -r7 -n1000 generic1_typeguard(*args) # FAILS
%timeit -r7 -n1000 generic1_typo(*args)
4.19 µs ± 206 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
Three out of four fail.
And finally...
def generic2(x: Union[A, B], y: A, z: B): pass
generic2_pytypes = pytypes.typechecked(generic2)
generic2_enforce = enforce.runtime_validation(generic2)
generic2_typeguard = typeguard.typechecked(generic2)
generic2_typo = typo.type_check(generic2)
args = 'a', 3, 'b' # valid signature (A=int, B=str)
# %timeit -r7 -n1000 generic2_pytypes(*args) # FAILS
# %timeit -r7 -n1000 generic2_enforce(*args) # FAILS
# %timeit -r7 -n1000 generic2_typeguard(*args) # FAILS
# %timeit -r7 -n1000 generic2_typo(*args) # FAILS
All four fail, amen.
I couldn't help but to notice the absence of my library (typeguard) which predates both pytypes and typo (and which pytypes has borrowed much of its code from).
I just read through the scoping rules in PEP 484 and it certainly did not cover cases like this. How is a type checker supposed to bind the type variables when the first occurrence is in a Union?
How is a type checker supposed to bind the type variables when the first occurrence is in a Union?
Not make final conclusions based on the first occurrence?..
How, then, is it exactly supposed to make conclusions? This part was not explained in PEP 484.
@agronholm I couldn't help but to notice the absence of my library (typeguard) which predates both pytypes and typo (and which pytypes has borrowed much of its code from).
Apologies -- I now remember your library, it's actually the fastest of all three :)
I've added typeguard tests to the examples above.
How, then, is it exactly supposed to make conclusions? This part was not explained in PEP 484.
My intuition with a signature like (Union[A, B], y: A, z: B) and the input (str, int, str) would be like this:
You could slightly optimize it by first resolving non-sum-types (although it will not magically help in all cases; it can just make most of them faster):
This is kind of what typo tries to do, but there's still quite a bit of work, and there are some limitations.
If you resolve sum-types based on first occurrence, this basically implies that Union[A, B] is not resolved the same way as Union[B, A], which doesn't make much sense.
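To make the order-independence argument concrete, here is a minimal backtracking sketch. It is not how typo (or any of the libraries above) is implemented; the names `match`, `match_all` and `resolve` are made up for illustration, it only understands TypeVar, Union, List[T] and plain classes, and it binds a TypeVar to the exact runtime type of the value, which is a deliberate simplification. It assumes Python 3.8+ for `typing.get_origin` and `typing.get_args`.

```python
from typing import List, TypeVar, Union, get_args, get_origin

A, B = TypeVar('A'), TypeVar('B')


def match(annotation, value, bindings):
    """Yield every extension of `bindings` under which `value` fits `annotation`."""
    if isinstance(annotation, TypeVar):
        if annotation not in bindings:
            # Unbound: tentatively bind to the exact runtime type (a simplification).
            yield {**bindings, annotation: type(value)}
        elif type(value) is bindings[annotation]:
            yield bindings
    elif get_origin(annotation) is Union:
        # Try every alternative instead of committing to the first one that fits.
        for alternative in get_args(annotation):
            yield from match(alternative, value, bindings)
    elif get_origin(annotation) is list:
        # Only parameterized List[T] is handled in this sketch.
        if isinstance(value, list):
            (item_type,) = get_args(annotation)
            yield from match_all([(item_type, item) for item in value], bindings)
    elif isinstance(annotation, type) and isinstance(value, annotation):
        yield bindings


def match_all(pairs, bindings):
    """Yield every binding that satisfies all (annotation, value) pairs."""
    if not pairs:
        yield bindings
        return
    (annotation, value), rest = pairs[0], pairs[1:]
    for candidate in match(annotation, value, bindings):
        # If `rest` cannot be satisfied, the loop moves on to the next candidate.
        yield from match_all(rest, candidate)


def resolve(annotations, values):
    """Return the first consistent TypeVar binding, or None if there is none."""
    return next(match_all(list(zip(annotations, values)), {}), None)


print(resolve([Union[A, B], A, B], ['a', 3, 'b']))
print(resolve([List[Union[A, int]], A], [[1], 'b']))
```

For the generic2 input, the sketch first tries A=str, fails on y=3, backtracks, and settles on A=int, B=str; swapping the annotation to Union[B, A] gives the same binding. The cost is that the search is exponential in the worst case, which is exactly the speed/correctness tension described above.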
So far I miss tests here that cover the case where type information is hosted in stub files, which are clearly and officially part of the PEP 484 specification. Also, no tests involving OOP constructs (methods, classmethods, staticmethods and properties) are shown, not to mention inner classes. @aldanor I recommend filing issues for the encountered failures in the respective projects. Only this way can the issues be solved.
These tests seem to focus rather heavily on performance, which is a secondary design goal for typechecking at best. Typechecking should be disabled outside of the testing and debugging phase.
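As a rough illustration of "disabled outside of testing", a checker can be applied conditionally so that the undecorated function is returned when checks are off. This is a stdlib-only sketch, not the API of pytypes, enforce, typeguard or typo; the decorator name `typechecked_if` and the environment variable `RUNTIME_TYPECHECK` are assumptions made for this example, and only plain-class annotations are checked.

```python
import functools
import inspect
import os

# Hypothetical switch; the variable name is an assumption for this sketch.
CHECKS_ENABLED = os.environ.get("RUNTIME_TYPECHECK", "0") == "1"


def typechecked_if(enabled):
    """Wrap a function with argument checks only when `enabled` is true."""
    def decorate(func):
        if not enabled:
            return func  # no wrapper at all, so no call overhead
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            for name, value in bound.arguments.items():
                parameter = signature.parameters[name]
                expected = parameter.annotation
                # Skip unannotated parameters, *args/**kwargs, and anything
                # that is not a plain class (generics are out of scope here).
                if (
                    expected is inspect.Parameter.empty
                    or parameter.kind in (parameter.VAR_POSITIONAL, parameter.VAR_KEYWORD)
                    or not isinstance(expected, type)
                ):
                    continue
                if not isinstance(value, expected):
                    raise TypeError(
                        f"{name} must be {expected.__name__}, "
                        f"got {type(value).__name__}"
                    )
            return func(*args, **kwargs)
        return wrapper
    return decorate


@typechecked_if(CHECKS_ENABLED)
def simple(x: int, y: str):
    return x, y
```

With the switch off, `simple` is the original function object, so production calls pay nothing; with it on, a call such as `simple('1', 'foo')` raises TypeError.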
I'm pretty sure attrs can support this now, with no extra features. I'm willing to lend a hand to any author of a typechecking library to integrate into attrs.
The typechecker should be kept exchangeable as no framework (for runtime typechecking) gets everything right yet. The fact that the typing module changes heavily from Python version to Python version makes it very challenging to keep up. E.g. Python 3.7 breaks everything again and I wasn't yet able to fix this for pytypes. Unfortunately this distracts from fixing the other issues.
If I can add another wrench to this. Remember that issue with resolving the types that have strings in them? #265 This would be necessary for any kind of automatic type checking.
pytypes can resolve these strings/forward references. Support for such strings occurring deeper within a type was added only a short while ago, and no release has been made since then. See https://github.com/Stewori/pytypes/issues/22. pytypes also provides a service function, pytypes.resolve_fw_decl, that resolves forward references given as a string or nested somewhere inside a type. It is recursion-proof.
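For comparison, the standard library can already resolve string annotations, including ones nested inside a generic, via `typing.get_type_hints`. The sketch below uses only that stdlib call and says nothing about how `pytypes.resolve_fw_decl` works internally; the `Node` class is made up for the example.

```python
from typing import List, Optional, get_type_hints


class Node:
    # "Node" cannot be referenced directly here because the class is still
    # being defined, so the annotations use strings (forward references).
    def __init__(self, value: int, children: List["Node"],
                 parent: Optional["Node"] = None):
        self.value = value
        self.children = children
        self.parent = parent


# The namespace is passed explicitly so the sketch does not depend on where
# it is run; at module level the function's globals would be enough.
hints = get_type_hints(Node.__init__, localns={"Node": Node})
print(hints)
```

After resolution, `hints["children"]` compares equal to `List[Node]` and `hints["parent"]` to `Optional[Node]`, so a runtime checker can work with real classes instead of strings.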
There seems to be a new option for runtime type checks y'all: https://attrs-strict.readthedocs.io/en/latest/
Or maybe let it run in setter: https://github.com/pwwang/attr_property ?
There seems to be a new option for runtime type checks y'all: https://attrs-strict.readthedocs.io/en/latest/
Would it be possible to merge this, or are there licensing (or other) concerns?
As it stands, I don't see a reason for us to merge it, especially because it would mean that we'd have to maintain it too. We're currently not looking for more maintenance burden. 🙃 We try to put our energy into making an ecosystem thrive; writing everything ourselves is unrealistic, alas.
Thank you for your response. I absolutely understand where you're coming from here.
This has been proposed and discussed in https://github.com/python-attrs/attrs/issues/215, as a possible use case for the newly added type argument to attr.ib() (#239); quoting @hynek: https://github.com/python-attrs/attrs/issues/215#issuecomment-347529479