python / typed_ast

Modified fork of CPython's ast module that parses `# type:` comments

Python 3.8+ features in typeshed stubs #118

Closed JukkaL closed 4 years ago

JukkaL commented 5 years ago

If typeshed were to start using Python 3.8-only features, such as positional-only arguments, tools such as mypy that use typed-ast when running under 3.7 or earlier would sometimes be unable to parse those stubs, since typed-ast won't get any post-3.7 features under current policies. This seems to rule out using 3.8 features in typeshed. Similarly, if Python 3.9 were to introduce new syntax for optional types, for example, it looks like we wouldn't be able to use that syntax in typeshed (at least for several years).
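For concreteness, here is a hedged example of the kind of 3.8-only syntax at stake (the function itself is made up): PEP 570 positional-only parameters, which the 3.7-based typed-ast parser rejects with a SyntaxError.

```python
# Hypothetical stub fragment using PEP 570 positional-only parameters.
# Python 3.8's parser (and its ast module) accepts this, but typed-ast,
# which is based on the 3.7 grammar, raises SyntaxError.
def get(key: str, default: object = ..., /) -> object: ...
```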

I see a few options that could help with the issue:

  1. Work towards establishing a typeshed policy that stubs should only use syntax that is supported by the oldest maintained Python 3 release (currently that would be 3.5). We actually already use 3.6 syntax in typeshed, but we could treat that as an exception.
  2. Revert to the original policy of backporting all new Python syntax to typed-ast.

Option 1 could become a problem in a few scenarios that don't seem totally implausible:

@gvanrossum @ilevkivskyi @msullivan What do you think about this?

srittau commented 5 years ago

I'd strongly prefer option 2 or any other option that allows typeshed and stubs in general to use the latest syntax. Mainly for two reasons:

In the (currently somewhat stalled but still planned) typestub PEP we recommend using the latest syntax features in stubs, even if the implementation supports older Python versions.

gvanrossum commented 5 years ago

That’s going to be a problem unless we get some more volunteers to add the new features to typed_ast (or do a whole new backport from Python 3.8).

ilevkivskyi commented 5 years ago

On one hand I like the second option more, but on the other hand I don't volunteer to do the backports :-P (so maybe counting my opinion is not really fair).

msullivan commented 5 years ago

I think the best approach might be to backport only the features necessary for stubs. So for 3.8, that would mean positional-only arguments but not the walrus operator, and not a full rebase. That is likely to be lower effort than the full rebase, I think.

gvanrossum commented 5 years ago

If it's just positional arguments, the simplest change might be to just ignore the / in argument lists. Then mypy can still check these when it's running under 3.8. (Though that also hasn't been implemented yet.)

JukkaL commented 5 years ago

Support for positional-only arguments was recently added to mypy, so supporting them in mypy is not going to be a problem as long as typed-ast supports them.

More generally, selective backporting seems feasible, at least until there are big changes in the CPython parser. Mypy would still have to refuse to run in full 3.8 mode on Python 3.7, since not all 3.8 syntax would be supported.

LouisStAmour commented 4 years ago

My suggestion:

Like TypeScript, we could have a way to version the syntax such that syntax checkers can specify which version of "typed Python" they support from the same "latest" stub package. (In a future where typeshed publishes packages.) We could automatically translate newer syntax to a compatible older syntax until everyone has caught up, for example. To that end, though, I think the standard-library stubs, with their support for multiple platforms and CPython target versions, should still be based on typeshed's stubs but should ship with each type checker, since the type checkers themselves might support different implementations of Python and bundling gives end users something out of the box; third-party packaged typeshed modules, on the other hand, should "target" different versions of "typed Python" and thus remain compatible with multiple versions of the "typed Python" language.

I’m not sure if my use of “typed Python” language is precise enough, but I’m arguing that Python is to JS what “typed Python” is to TypeScript and thus different versions of the TypeScript compiler support different language features. Taken to a logical conclusion, when publishing if the goal is to support older versions of “typed Python” alongside new ones, we need a similar versioning strategy for “stubs” such that folks can opt in to newer “typed Python” features as they become available in type checkers.

Further context for my thoughts on the above at: https://gitter.im/python/typing?at=5eafad1a22f9c45c2a6a6e97 and https://gitter.im/python/typing?at=5eafd0c9a9de3d01b1e85685

Examples from TypeScript:

https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/ts3.7/base.d.ts (note the file paths reference both which Node version they're targeting and which TS language version.) The equivalent to Node for Python would be which version of Python standard library they're supporting, while the equivalent to TS language version would be an increment of supported functionality for "typed Python" by MyPy or other third-party type checkers. If these diverge too significantly, it could be the name of the type checker and some other checker-specific versioning. Alternatively, we could find a different prefix to use to clarify that python/3.8/tpy3.7 refers to the legacy MyPy approaches, while python/3.8/tpy3.8 refers to every PEP included in 3.8, etc.

End users would not be exposed to the complexity of knowing which "typed Python" version to load, because in such a scenario the correct version would ship with the type checker. They would then only have to specify which Python standard library they would prefer to use.

LouisStAmour commented 4 years ago

Documentation on the above approach from TypeScript is officially at https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-1.html#version-selection-with-typesversions and you can see an example at https://github.com/DefinitelyTyped/DefinitelyTyped/issues/44117

This appears common for TypeScript: rather than support macros, guards or conditionals, TS relies on Node's package.json to determine which files to load. However, the primary use of package.json, determining whether a package is loading in a "node" or a "browser" environment, isn't supported by TypeScript's module resolution (https://github.com/microsoft/TypeScript/issues/7753), so "universal" packages aren't supported via this syntax; instead the code must detect at runtime whether to load browser- or Node-specific code. For efficiency, these runtime checks are later removed at compile time by replacing them with known constants for each target and tree-shaking away branches that always evaluate to false, though this last compile-time step is only a build optimization.

So part of the problem is that we don't have a consistent version number to increment for "typed Python" features we'd like to support, regardless of what Python syntax you're targeting. It's possible someone would prefer to annotate using newer language features or proprietary extensions, and we should have something capable of "graceful degradation" or version selection, so that you could use modern syntax to annotate, say, Python 2.x code, by adopting new features/guards while also providing older, syntax-compatible files.

If I'm understanding how Python packaging generally works, packages that want to support multiple versions of Python either need to restrict themselves to only commonly available features (the lowest supported language version including in dependencies) or they would need to come up with their own method of packaging/downgrading code such that they could release multiple packages each of which target a different version of Python as necessary.

In this regard, it's very similar to Node and Ruby where there aren't macros to support different language versions, instead you could hypothetically statically scan code for language and SDK features by version and then build a "supported versions" list from common features, or at runtime you could attempt to import different versions of modules (but this is rarely well-supported) or as part of a build script, specify alternative module versions for specific builds (also rarely done).

So the common case, targeting the lowest version and maybe providing syntax for alternative versions at build time, is thus straightforward when you can compile "typed JS" to regular JS, but it's a bit trickier for "typed Python": ideally we'd want to be able to convert from "typed Python" of some newer language variety to "runtime Python" of an older variety, so as to adopt newer typed-Python features while offering maximum downstream compatibility for library users. The alternative is to restrict yourself to typings within .pyi interfaces, and that sounds like a hassle to maintain, because you can't easily have a single source of truth.

gvanrossum commented 4 years ago

This is occasionally proposed (hi, @viridia :-). But Python does not use transpilation technology and many users who are currently using type annotations would have to rethink how they actually run their code if it had to be transpiled to the syntax their interpreter version actually supports.

IOW, I think there's a fatal flaw in the argument that "typed Python" is to "Python" what TypeScript is to JS.

LouisStAmour commented 4 years ago

Thanks for your reply! As I said, I am new to Python, so my apologies if this has been considered before, or if I'm not quite using the right terminology here. I'd be happy to work out more of the details or prototype something.

But let me try to clarify: JS transpilation is itself only beneficial in limited uses, and eventually, by design, is an overhead to be eliminated -- but it's also a moving target by design because it allows for adopting newer language features sooner:

"Python does not use transpilation technology" -- technically, neither does JavaScript. And while TypeScript was defined as a superset of JS, it doesn't have to use transpilation either: https://github.com/denoland/deno for example.

That said, transpiling, or in other words, compiling for lower targets by providing "polyfills" or alternatives is something that's required on the web because of how many possible runtimes the browser enables. I agree if the goal is remaining up-to-date, then we should all target the latest version, or maybe n-2 version of python, but for library authors who want to adopt newer features but remain compatible potentially all the way down to Python 2, there's an argument that if a feature can be polyfilled or ignored, you could still do so.

I'm not suggesting that this should be a default. Like typing, I think this should be adopted by advanced users, perhaps the way "six" or "python-future" were adopted, where "if you know you need it, use it" otherwise simply ignore it. To that end, I'm not even sure I'm suggesting that "typed python" requires its own filename extension, as theoretically, it's still just python but at a different version.

The comparison I might then make is JSX, which is an optional syntax of React that by convention uses .jsx extension instead of .js though either would work based on your project config. To that end, I'm perhaps suggesting a .py38 convention for files in a Python 3.8 syntax that might in turn compile to .py, or alternatively it's something that could run as part of a build-and-package process with an output folder. Thus you could say or specify that the src code is typed Python 3.8 while the destination package, optionally, could be Python 3.6 with type annotations or .pyi files.

If this sounds confusing, the same confusion technically exists in JavaScript, but we've gotten used to it by now. When writing JS, you're allowed to use any syntax available in your runtime environment, so you have to specify "the latest" runtime environment. When you want to target older browsers or produce a module others can use, you will likely produce one or more such outputs, and at times, the API for picking between them is a simple filename. You could do the same in Python, for example https://nvbn.github.io/2017/05/31/import-different/
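As a rough Python-flavoured illustration of that "pick a file by version" idea (all module names here are invented):

```python
# Hedged sketch: select an implementation module based on the running
# Python version; the module names are hypothetical.
import sys

if sys.version_info >= (3, 8):
    import mylib_py38 as mylib  # variant written with 3.8 syntax
else:
    import mylib_py36 as mylib  # fallback written with 3.6-compatible syntax
```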

Again, this would be completely optional. Those who only need to support one runtime environment can continue to do so, even with types. Typeshed, on the other hand, would ideally support publishing packages with either multiple targets or the lowest common target. To that end, in DefinitelyTyped's example, they only make exceptions for early TS (2.x), TS 3.2 and TS 3.7 so you'd only need to adopt this kind of approach for when new features to specify types or interfaces are introduced, and only if they make your life easier.

I actually think that the .ts file extension made adopting TypeScript harder, because step 1 was always, "tell people to rename their files" and then step 2 was "incrementally add types!". But it has the advantage of allowing .ts and compiled .js versions of identical files to live side-by-side, so you could import the .ts source for a typescript project and the .js source for a JS project.

There are issues with this, however, primarily in that most if not all npm projects have unique build processes, and therefore importing raw source files to use always required setting up a compatible build tool chain. This is one of the other reasons why JS files, though executable, are compiled -- a compiled JS file can be imported regardless of build process, as long as it outputs code for the module system in use (these days ECMAScript module syntax).

But that's a different kind of dependency management: an example in Python-land might be the idea that developers could themselves adopt a Python 3.8 module into their downstream Python 2 project, say, by creating a build process that imports compatible polyfills into the upstream library, runs tests, and produces a Python 2 compatible module compatible with their target environment. Then set up CI to try and run the process against every new upstream version, and abandon the process as you update to Python 3.

Very few to nobody actually does this in practice for TypeScript or JavaScript unless they're running a monorepo for their dependencies, but the approach is still sound, and is somewhat inspired by Bazel's hermetic builds. (Often folks don't realise they need to micro-manage their dependencies this way for "best results" because it's simply "good enough" for libraries to use a subset of syntax supported by older versions of the language, or to require library end users to upgrade to the latest version of the language in order to use the library.)

I think maybe I've not stated my case simply enough. I hope the above helps clarify a few points, but I'd be happy to either discuss this further or maybe try and outline things using um, fewer words and more specific Python examples. It'll just take me more time to edit some of my thoughts into coherence and research and write enough Python to come up with a prototype or two. My expertise is more JS/TS, then any non-Lisp language except Python. Early on I'd picked Ruby, mostly because of Rails, and Python was just close enough for a few years that I couldn't see it as a different language. But a number of PEPs have changed my mind, modern Python syntax is very appealing, and so I figured contributing to typed Python would be a great way to learn more about the language and its common packages/modules. I just think there's a lot to be learned from the "great TypeScript experiment" that some of the JavaScript community has opted in to, and it's worth trying to take some "lessons learned" from its approach.

For specific code examples from TypeScript, have a look at https://github.com/python/typeshed/issues/2491#issuecomment-623074756 and later conversation that clarifies a few misconceptions of how TypeScript works and how it supports JavaScript in IDEs.

gvanrossum commented 4 years ago

I definitely suggest that you try to present this with actual examples from Python -- your "wall of text" approach doesn't seem to reach my brain somehow. :-(

Also I'm not sure that this is the right repo to discuss it.

LouisStAmour commented 4 years ago

Okay, I realise I went down a rabbit hole on compatibility-via-transpiling, that wasn't my original intent in this issue.

For the sake of this suggestion, the "typed Python" version refers to the supported syntax of type stubs, such as mypy's (the double-underscore convention or Python 3.6 syntax), Python 3.8 and Python 3.9, etc. Ideally these would be versioned or named in such a way that it's obvious which syntax file belongs to which parser version, and the ideal scenario is that we wouldn't have to maintain older versions once they EOL in shipping/in-use type checkers. We could use telemetry to help determine this if the privacy implications aren't too terrible.

What I said originally still stands, that types in typeshed are broken out into two parts:

  1. Those that ship with MyPy and other type checkers for standard library or extensions/plugins, and
  2. Those that follow specific, ideally versioned, increments of typed Python syntax supported by type checkers.

For 1: These should be bundled automatically with the type checker; I covered them earlier by saying:

https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/ts3.7/base.d.ts (note the file paths reference both which Node version they're targeting and which TS language version.) The equivalent to Node for Python would be which version of Python standard library they're supporting, while the equivalent to TS language version would be an increment of supported functionality for "typed Python" by MyPy or other third-party type checkers. If the type checkers diverge too significantly in syntax, it could be the name of the type checker and some other checker-specific versioning. Alternatively, we could find a different prefix to use to clarify that python/3.8/tpy3.7 refers to the legacy MyPy approaches, while python/3.8/tpy3.8 refers to every PEP included in 3.8, etc.

End users would not be exposed to the complexity of knowing which "typed Python" version to load, because in such a scenario the correct version would ship with the type checker. They would then only have to specify which Python standard library they would prefer to use.

For 2: What you'd end up with for a given python module is a stubs package, versioned identically to the upstream python package, built to include the source files for every typed Python syntax supported (3.6, 3.8, 3.9, etc.) and published automatically by typeshed. The package is then loaded by the type checker and contains metadata that type checkers could read to determine which files to load based on the most recent syntax the type checker supports. The setuptools metadata (or whatever the package format contains) would look something like https://www.typescriptlang.org/docs/handbook/release-notes/typescript-3-1.html#version-selection-with-typesversions or DefinitelyTyped/DefinitelyTyped#44117

Again, end users would not be aware that we're shipping and supporting potentially multiple syntaxes because in such a scenario, the correct version would load automatically from a universal package of type stubs.
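To make that concrete, here is a purely hypothetical sketch of the kind of metadata-driven selection being described, loosely modeled on TypeScript's typesVersions; none of these names or directories exist in typeshed today.

```python
# Hypothetical: map "typed Python" syntax levels to stub directories inside a
# published stub package, and let the type checker pick the newest level it
# can parse. This only illustrates the idea; no such mechanism exists today.
from typing import Dict, Tuple

STUB_DIRS: Dict[Tuple[int, int], str] = {
    (3, 6): "tpy3.6/",
    (3, 8): "tpy3.8/",
    (3, 9): "tpy3.9/",
}

def select_stub_dir(checker_syntax_level: Tuple[int, int]) -> str:
    supported = [level for level in STUB_DIRS if level <= checker_syntax_level]
    return STUB_DIRS[max(supported)]

# A checker that understands 3.8 syntax would load stubs from "tpy3.8/":
# select_stub_dir((3, 8)) -> "tpy3.8/"
```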

Finally, we have the maintenance issue of supporting multiple versions of type checkers in the same typeshed repo. The more we can automate the transpiling of syntax from an unsupported version to a supported version, the easier it is to introduce the different syntax versions, either as generated files potentially committed to the repo or as packaging artifacts produced as part of a package build process.

A hypothetical file listing for one module, codecs, in 5 versions of python and 4 versions of spec syntax:

To reduce effort, it would be ideal if we could re-use content from other stub spec versions via transpiling or inclusion/additional annotation syntax to dynamically modify previous specifications. Ideally there would be an incremental way to load interfaces with redefinitions, but if not, or if that's too complicated, the assumption is that the type checker will ship, and load, only the stubs valid for its format. So if it were MyPy in this hypothetical, it could ship:

If a future version of, say Pyright, supports Python 3.8 syntax, it could instead use:

It's up to the type checker if they want to preserve the above folder paths or package the stdlib folder, and its variants, differently.

As to whether typescript is like typed Python, I'll leave that argument for another day, or another thread. :)

gvanrossum commented 4 years ago

I still think the problem this issue is trying to address is different than what's going on in the TypeScript world. Newer versions of TypeScript can provide new syntax and transpile to standard JS; stubs need to be parsed by TS only. So the matrix of stubs for a given package has package versions in one dimension and TS versions in the other dimension. TS takes care of polyfills to support different JS versions, so JS versions aren't really relevant.

For Python, the matrix of stubs has package versions and Python versions as its dimensions, and type checker versions don't play a role here. New typing syntax (for example, PEP 585, using list[int] rather than List[int]; or PEP 604, using list | tuple instead of Union[list, tuple]) is added to new Python versions, and the type checkers simply follow suit (mypy, for example, doesn't yet support list[int], but it surely will by the time 3.9.0 is released).
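To illustrate the two spellings side by side (a hedged sketch; with `from __future__ import annotations` the new forms parse on 3.7+, though evaluating them at runtime needs 3.9/3.10):

```python
from __future__ import annotations  # lets older 3.x parse the new spellings

from typing import List, Union

def old_style(xs: List[int]) -> Union[int, str]: ...  # pre-PEP 585/604 spelling

def new_style(xs: list[int]) -> int | str: ...  # PEP 585 + PEP 604 spelling
```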

The type checkers have to support other random new syntax that gets added to new Python versions too, because it affects type checking. For example, the walrus operator added in Python 3.8 (PEP 572), or the relaxed syntax for decorators added in 3.9 (PEP 614).

Versioning of stubs is not affected by walrus or decorator syntax though, and there is virtually no typing syntax that is only used in stubs.

I believe that the evolution of typing in Python happens differently than the evolution of TypeScript. Any new syntax that's added to express typing functionality has to be backwards compatible, in the sense that it can't break any existing Python code. In addition, there is often the desire to use the new functionality in older Python versions (not just older type checker versions).

Therefore, the most common form of evolution we see is the addition of a new name to some namespace. We can then have that namespace be the stdlib for new Python versions (typically in the typing module) but the same name will be added to a PyPI package to support older Python versions (usually in typing_extensions). It is easily arranged in the type checkers to support the new functionality in two different sources (this is usually done through a little cheating in stubs).

Examples include Literal (PEP 586), TypedDict (PEP 589), Final and @final (PEP 591), Annotated (PEP 593), and TypeAlias (PEP 613). Even though these are all PEPs, none of them add new syntax (just new usages of existing syntax), and none of them require versioning of typeshed stubs.
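For example, a hedged sketch of the usual dual-source pattern in library code that wants these names on older Pythons (stubs handle the same split through the "little cheating" mentioned above):

```python
# Pick up Literal/TypedDict/Final from typing on 3.8+, and from the
# typing_extensions backport package on older versions.
import sys

if sys.version_info >= (3, 8):
    from typing import Final, Literal, TypedDict
else:
    from typing_extensions import Final, Literal, TypedDict

Mode = Literal["r", "w"]

class Point(TypedDict):
    x: int
    y: int

MAX_RETRIES: Final = 3
```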

Once the type checkers are updated, they support these features for all Python versions, and the stubs can use them regardless of the Python version. And until the type checkers are updated, these features don't really work. (Another difference with the TS/JS ecosystem is that there are multiple type checkers -- mypy, pytype, Pyre, PyCharm, Pyright. That's one reason why we use the PEP process to get agreement about new typing features.)

Even new operators can be seen as just names in a namespace -- e.g. the proposed | notation for unions (PEP 604) is just overloading an existing operator on type objects (where it previously was not defined).

The only place where we do have a real problem ATM is the syntax for positional-only parameters added in Python 3.8 (PEP 570). Let me explain. (Sorry if this review is not useful for you.)

Before 3.8, we did not have a syntactic way to indicate "this parameter must be passed by position". Since we found this is a common thing that we wanted to express in stubs, we created a convention for it (which made its way into PEP 484) where we write e.g.

def point(__x, __y, __z, label=""): ...

Here, only label can be passed as a keyword argument (label="spam").

In 3.8, a better notation was introduced:

def point(x, y, z, /, label=""): ...

This form fails with a SyntaxError when parsed by Python 3.7. Now, typed_ast is currently based on Python 3.7 (by which we mean "we took the Python 3.7 parser and turned it into an extension module, then added type comment support"), and this notation is not valid in 3.7. Because we don't have the ability to magically upgrade typed_ast to 3.8, our current policy is that type checkers wanting to support syntactic features that are new in Python 3.8 or later must not use typed_ast, but instead must use the stdlib ast module of a Python version that is at least as new as the Python version where the feature was introduced. (The stdlib ast version in 3.8 was updated to support the features needed by static type checkers, in a way that's fully compatible with typed_ast. In particular 3.8 added support for type comments in the AST, and the long-term plan has been to gradually deprecate typed_ast.)

This works fine for inline typing, i.e. type annotations embedded in the source code of a library. If a library chooses to use Python 3.8 syntax (e.g. this / slash notation or the := walrus operator) then that library can be type-checked in two ways: if the type checker has its own fully-custom Python parser (like PyCharm) then the type checker just has to support the new syntax; otherwise, if the type checker's parser is a wrapper around typed_ast/ast (like mypy), then the type checker itself must be run with at least Python 3.8, so that it can use the stdlib ast module. Type checkers that only use typed_ast will have to be upgraded (to use ast when run under Python 3.8 or newer) before they can be used to check such code.

(An important detail is that for syntax supported by typed_ast, and type checkers based on typed_ast, there is no need to run the type checker using the same Python version as the target being checked -- typed_ast can be imported in Python 3.5 and it can still parse Python 3.7 syntax. But for 3.8 features this is not possible, because we didn't have the manpower to backport the Python 3.8 grammar to typed_ast. Arguably, with the demise of Python 2, the importance of such "cross checking" will become less over time. Also, starting with 3.8, the stdlib ast can cross-check older versions. Just not newer versions.)
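A minimal sketch of that cross-version property, using typed_ast's ast3 API (the source string is just an example):

```python
# Runs on Python 3.5+ yet parses 3.7-level syntax, including type comments,
# regardless of which interpreter the checker itself runs under.
from typed_ast import ast3

source = "x = []  # type: List[int]"
tree = ast3.parse(source)
print(ast3.dump(tree.body[0]))  # the Assign node carries the type comment
```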

For the typeshed stubs, we have so far used a somewhat uncomfortable compromise. We pick a Python version (most recently this has been 3.7, because that's what typed_ast supports) and declare that stubs can use Python syntax features up to Python 3.7 (insofar they are relevant to type annotations). Other type checkers have followed suit, regardless of whether they're based on typed_ast or not.

Because this limitation is only about syntax, it would be no problem to start using e.g. list[int] instead of List[int] in stubs, once the majority of type checkers have added support for this (such support doesn't affect the parser, but it affects other parts of the checkers). Ditto for using a|b instead of Union[a, b] (although that PEP probably won't land in 3.9, it's likely to land in 3.10).

There is no need to have multiple versions of stubs for this purpose. The process is simply: (a) agree on the new notation (using the PEP process), (b) support new notation in (most) type checkers, (c) start using new notation in stubs. The evolution of typeshed to become more like DefinitelyTyped, with separate packages for most 3rd party library stubs, does not change this -- different Python versions are supported using sys.version_info checks in the stubs, and different package versions can be supported using different files (so the filesystem only needs to support one dimension of versioning).
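A hedged sketch of what that looks like inside a stub (the function shown is invented):

```python
# Typical .pyi pattern: branch on the target Python version with
# sys.version_info rather than shipping separate stub files.
# The fetch() signature here is hypothetical.
import sys

if sys.version_info >= (3, 8):
    def fetch(url: str, *, timeout: float = ...) -> bytes: ...
else:
    def fetch(url: str) -> bytes: ...
```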

However, the attractiveness of the / notation for positional-only arguments shows the weakness in our approach to base the stub syntax on the limitations of typed_ast. We don't want to require Python 3.8 to run mypy to check a Python 3.7 program, and this means that typeshed stubs can't use Python 3.8 syntax, no matter how attractive it is, and we're stuck with (__x, __y, __z).

It seems that @JukkaL's proposed approach is to selectively backport features to typed_ast. In particular, we wouldn't have to backport the walrus (it's irrelevant for stubs) but we would have to backport / for positional arguments. Your (@LouisStAmour's) proposal is to instead automatically produce several versions of stubs; IIUC in this particular case it would mean turning e.g. (x, y, z, /) into (__x, __y, __z).
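A hedged sketch of what such an automatic rewrite could look like; this is not an existing tool, and it needs Python 3.8+ to parse the input (3.9+ for ast.unparse):

```python
import ast

class DowngradePosOnly(ast.NodeTransformer):
    """Rewrite PEP 570 positional-only parameters into the PEP 484
    double-underscore convention so the result parses under typed_ast."""

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        self.generic_visit(node)
        for arg in node.args.posonlyargs:
            if not arg.arg.startswith("__"):
                arg.arg = "__" + arg.arg
        # Fold the renamed parameters back into the ordinary argument list.
        node.args.args = node.args.posonlyargs + node.args.args
        node.args.posonlyargs = []
        return node

tree = DowngradePosOnly().visit(ast.parse("def point(x, y, z, /, label=''): ..."))
print(ast.unparse(tree))  # prints a def with __x, __y, __z and no "/"
```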

I really think that the choice is largely determined by culture -- in the Python culture Jukka's approach (upgrading the type checkers' parsers) makes the most sense, while in the TS culture your approach (based on limited transpilation) is the natural choice.

Perhaps one reason is that in the Python culture, upgrading e.g. to the latest version of mypy is a relatively straightforward process for most projects. I and the other mypy authors have done this innumerable times at Dropbox, and while there were occasional bumps in the road, it was orders of magnitude simpler than upgrading to a new Python version. If the type checker is broken, CI will limp a bit, while if the Python version is broken, production is down. These are entirely different problems that affect entirely different groups of engineers in the company in different ways (and of course a production outage draws the attention of higher-ups much sooner :-).

I imagine that for you, coming from the TS culture, implementing that particular transpilation looks simple, while for us it would be a totally new thing, plus we'd have to design the metadata schema and filesystem layout to support the "type checker version" dimension of stub versioning that we currently don't need to handle at all.

But maybe some of the others on this issue (e.g. @srittau or @JukkaL) have something to say?

srittau commented 4 years ago

I don't have much to add, but I agree with everything Guido has written. Ideally we'd have a typed-ast that is independent from Python and always supports the latest syntax, but seeing how tightly ast's build process is tied to Python's build process, this is not easily possible. TypeScript is hard to compare to Python + type checker for the reasons Guido mentioned: TypeScript is both a transpiler and a type checker, while in Python the interpreter is independent from the type checker and they can be updated separately.

In my experience, updating mypy is not a big deal, and that despite the fact that updating mypy currently means updating both the type checker and the stubs. Once the stubs are distributed separately, this will become even less of a problem.

LouisStAmour commented 4 years ago

@gvanrossum @srittau The example I gave from TypeScript is relevant to Python — TypeScript’s DefinitelyTyped repo has to support Node language bindings which change independently of TypeScript versions: https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/ts3.7/base.d.ts

They also ship bindings for JavaScript which change with ECMAScript versions annually, and choose to do so for stage 3 and up proposals (as drafts they start at "stage 0"; by "stage 4" they're final and roll into the year's ECMAScript release). Users can independently pick which environment bindings to include (browser DOM, Node version (via package manager), ECMAScript version to write in, etc.) and which level of ECMAScript to output.

The browser DOM and ECMAScript libraries ship with the compiler while the Node package updates separately. I figured we’d get the most utility replacing JS library versions with Python library bindings shipped with type checkers, but that we could model it more directly off the node package bindings.

Edit: We seem to have different ideas on how quickly the type system syntax for Python could evolve. My point, I suppose, is that monthly-ish improvements to the type system over 5+ years are what has gotten TS to the success it is today, and that we already have a system of innovation where type checkers can include proprietary extensions which ship and, if useful, could spread. My goal in mimicking the version selection inside DefinitelyTyped and inside package distribution is to adopt new type checker syntax as quickly as possible. Transpiling is what allows syntax adoption to take place without significantly increasing the maintenance burden for typeshed pull request authors. When new versions are widely available and telemetry shows old versions of type checkers aren't in use, delete the older language versions. Absolutely, within this repo, always update to the latest syntax; but for backwards compatibility with previous type checkers that haven't caught up yet, we need to support "branches" if you will, or simply transpiled folders, of other code. The same applies if we want to more specifically support a new feature provided by, say, a plugin that ships by default, or an extensible syntax supported at first in only one type checker.

That said, I’m only addressing the first two paragraphs, let me dig in and read up. Thanks for taking the time!

gvanrossum commented 4 years ago

What are “language bindings” in this context? The file you link is just some metadata.

LouisStAmour commented 4 years ago

If you trace the references, you’ll get to https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/base.d.ts and eventually something like https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/globals.d.ts or https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/types/node/v12/path.d.ts

The biggest changes between TypeScript syntax versions appear to relate to the language level (version of ECMAScript) supported by a particular Node.js version, so the TS-syntax/version-specific references are mostly metadata in this particular case. The bindings are as shared as possible, but as you can see browsing DefinitelyTyped (which I agree is a large and intimidating collection of files and folders), the end result is that any package, not just the Node bindings, can vary its TypeScript syntax by version of the TS compiler. I gave the example not to share file contents but to share the versioning in the folder path, which is somewhat by convention.

Edit: My apologies, another difference is assert guard syntax added in TS 3.7: https://github.com/DefinitelyTyped/DefinitelyTyped/commit/e9103792668257b6cc86cd21d6c455ecb5fae1ea

LouisStAmour commented 4 years ago

Therefore, the most common form of evolution we see is the addition of a new name to some namespace. We can then have that namespace be the stdlib for new Python versions (typically in the typing module) but the same name will be added to a PyPI package to support older Python versions (usually in typing_extensions). It is easily arranged in the type checkers to support the new functionality in two different sources (this is usually done through a little cheating in stubs).

A clarification: when I refer to “typed Python syntax” changes, I don’t necessarily mean syntax as in the grammar of a language. I also mean any modules that cannot be found, any extension methods that don’t exist, any plugin references that aren’t available, and so on. “Syntax incompatibility” or “syntax version” is a shorthand for me to use when describing how different type checkers, with different plugins, could support different ways of expressing types that are potentially incompatible with each other, or with future versions of the Python spec in the same way that a newer Python script is not guaranteed to run unmodified on an older version of a Python interpreter.

I recognize that when designing syntax/language changes, it’s possible to make them as minimally intrusive as possible for the interpreter/checker to work with both new and older syntax, and to that end, TypeScript itself has been very stable since versions 2.8-3.2 or so. Only minor enhancements have been added over time since. They also switched from implementing stage 2 ECMAScript enhancements by default (which were less stable) to implementing stage 3 ECMAScript language changes which are more stable specifications, because they want to avoid breaking TypeScript to keep up with ECMAScript language evolution.

If they could, I think they would prefer the Python situation, where TypeScript’s type syntax was itself an extension of pure JavaScript in the ECMAScript specifications, and an optional, ignorable part of the syntax for Browser/Node runtimes. That said, I think the TypeScript team likes the flexibility of adding new typing features in the same way that Python supports type checker plugins and extensions.

I believe that the evolution of typing in Python happens differently than the evolution of TypeScript. Any new syntax that's added to express typing functionality has to be backwards compatible, in the sense that it can't break any existing Python code. In addition, there is often the desire to use the new functionality in older Python versions (not just older type checker versions).

Let me express the TypeScript variant of this - replaced sections in bold: "Any new syntax that's added to express typing functionality has to be backwards compatible, in the sense that it can't break any existing **JavaScript or TypeScript or Node.js** code. In addition there is often the desire to use the new functionality in older **JavaScript/Node.js** versions (not just older **TypeScript compiler** versions)."

I’d continue by adding that if there is a conflict between TypeScript syntax and JS syntax (which Node implements and extends), the preference is to break TypeScript to maintain compatibility with JS and Node, with a runtime setting to preserve previous behaviour. If there is a conflict between Node and JS standards, the preference is the same, to prefer the JS standard first, and implement a Node flag to allow backwards compatible behaviour for some amount of time. So both Node and TypeScript implement JS standards first, which continuously evolve.

If there's an issue, it's that TypeScript is versioned at the compiler level and the specification for the language is generally "whatever the compiler currently does", in a very C/C++ sort of way. By comparison, as I said elsewhere, Python's approach is indeed a fair bit cleaner: while type checkers can implement their own extensions, the primary evolution to add type checking occurred within Python itself. The equivalent would be if a future version of JavaScript natively offered TypeScript types; then the language would need to be fully specified for compatibility with multiple implementors, and TypeScript folks would be building extensions to the specification as new versions of the language, just as Python type checkers currently do. The TypeScript approach does not allow syntax extensions or plugins, though these can be added using pre-processors; the Python approach, by comparison, does want to allow for multiple implementations and customization of type checkers. In this regard, the Python approach is similar to both TypeScript and Babel.js, the latter being an extensible way of using newer language syntax with older runtimes.

if the type checker's parser is a wrapper around typed_ast/ast (like mypy), then the type checker itself must be run with at least Python 3.8, so that it can use the stdlib ast module. Type checkers that only use typed_ast will have to be upgraded (to use ast when run under Python 3.8 or newer) before they can be used to check such code.

That is a very good point. Just as type checkers are written in Python and use standard library features, TypeScript itself executes and ships as a Node (JavaScript) module written in TypeScript. https://www.npmjs.com/package/typescript distributes the TypeScript checker/compiler, and requires no runtime dependencies to be installed beyond Node runtime version 4.2.0 or greater, as specified in their packaging metadata: https://github.com/microsoft/TypeScript/blob/master/package.json

They are thus successful in running under Node version 4.2.0 or later (For context, Node 4.2.0 was an LTS release from October 2015 and we’re now at v14.2) because they’ve transpiled newer JS language features back to very old syntax supported by Node 4.2.0, and they do not use any of Node’s standard library features (or if they do, they use some kind of Node “polyfill” library to provide compatible replacements for newer syntax back to older Node versions). The equivalent would be to support both typed_ast and ast modules, and to have typed_ast itself become a polyfill such that it tracks and implements all of ast module’s newest functionality so that if ast module isn’t available, a variant from typed_ast could be used instead. This is the secret to success behind Node.js and JS compatibility across languages — a lot, a lot of “2to3” going on, a lot of polyfills and indirection/wrappers/reimplementations of native functionality.

There is no need to have multiple versions of stubs for this purpose. The process is simply: (a) agree on the new notation (using the PEP process), (b) support new notation in (most) type checkers, (c) start using new notation in stubs. The evolution of typeshed to become more like DefinitelyTyped, with separate packages for most 3rd party library stubs, does not change this -- different Python versions are supported using sys.version_info checks in the stubs, and different package versions can be supported using different files (so the filesystem only needs to support one dimension of versioning).



There is a need to have multiple versions of stubs — especially in third-party packages — when you want to support backwards compatibility to previous type checkers, or to not force every end user to update their IDE and/or CI workflow to support newer type checkers. This is why TypeScript added support around version 3.1 as I noted earlier. They recognized that everyone wanted to adopt newer TypeScript syntax but in a backwards compatible, “2to3” sort of way. This way a type checker stuck using Python 3.7 syntax won’t have to update to support types written in Python 3.9 or 3.10.

Another downside to using the standard library to distribute the ast module is that you have to upgrade the Python runtime in order to access newer versions of the Python syntax. This would be the equivalent to requiring folks to upgrade Node and TypeScript compilers at the same time because some part of the JS parser they used was actually shipping with the Node runtime standard library. Package managers, by comparison, are more flexible, as are polyfills — so typed_ast could use ast if ast has the features in the same way that polyfills for JS look to see if the JS runtime has a feature and only re-implements it if it isn’t current. (This is more common with JS “standard library” provided by browsers than with Node standard library, but could occur with both.)
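A minimal sketch of that feature-detection idea (the helper name is invented; roughly what a typed_ast/ast wrapper could do):

```python
# Prefer the stdlib ast module when it already understands type comments
# (Python 3.8+); otherwise fall back to typed_ast. The helper name is made up.
import sys

if sys.version_info >= (3, 8):
    import ast

    def parse_with_type_comments(source: str):
        return ast.parse(source, type_comments=True)
else:
    from typed_ast import ast3

    def parse_with_type_comments(source: str):
        return ast3.parse(source)
```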

It seems that @JukkaL's proposed approach is to selectively backport features to typed_ast. In particular, we wouldn't have to backport the walrus (it's irrelevant for stubs) but we would have to backport / for positional arguments. Your (@LouisStAmour's) proposal is to instead automatically produce several versions of stubs; IIUC in this particular case it would mean turning e.g. (x, y, z, /) into (__x, __y, __z).



I’m actually saying “do both”! Actively upgrade the ecosystem of type checkers to support newer language features, and even un-published, draft language features (but don’t enable those by default), and also support older type checkers in the stub file definition so that we never have to tell someone to upgrade their IDE plugin or CI system to use newer library definitions or syntax. And so that we can use newer syntax ourselves, of course, in typeshed while supporting older clients.

My approach may sound more complicated, but it’s worked out very well for the TypeScript folks, and the last thing you ever want to have happen is someone says, “the types caused me a problem today” or “I couldn’t fix the security bug until we upgraded our CI system and Python version” — you want to encourage evolution of typing to such a point that instead all you hear, through persistent iteration of type checkers and syntax in stubs, is “wow, the type checker really helped me refactor this part of the code, it instantly flagged areas I’d missed, and it didn’t give me any errors I had to ignore or use manual casts for or less precise syntax!”

The only way you get to that ideal is, as TypeScript did, to make backwards compatibility a priority while equally making quick adoption of new syntax a priority. I didn't realize it when I started posting in these threads, but the JS community has adopted code generation and automatic code rewriting in a huge way, and that, plus a number of design decisions behind TypeScript's monthly-ish releases and the DefinitelyTyped project's community building, is what has allowed TypeScript to spread as widely as it has.

The last missing piece is VS Code’s TypeScript integration, which is absolutely first-class, but WebStorm (the JetBrains IDE) comes a tight second. (Visual Studio however is one of the worst development platforms I’ve ever used for TS, at least right now.) The IDE integration as I pointed out in the TypeShed thread is essential for adopting type checking in non-typed JavaScript. It means you don’t ever have to know what type packages to install, you just install regular packages and the IDE silently downloads types in the background to check your non-typed JS code against.

LouisStAmour commented 4 years ago

Sorry for posting so many times in a row, but I have to correct my earlier assertion that we should only ship Python standard library changes with the type checker. If there's a danger that type checkers will require newer versions of Python, and folks aren't prepared to upgrade their type checker as a result, then we should also ship the Python standard library's types as if they were any other packaged third-party stubs, the way the Node type bindings currently ship for TypeScript.

I said standard library for a reason: I’m trying to draw a distinction between the standard library and the Python syntax, the way there’s a distinction between browser/node standard APIs and the ECMAScript language itself. That said, it would be the equivalent of wanting to use Python 3.8 standard libraries while only supporting Python 3.6 language syntax. It’s possible, especially if you want to build a library compatible with both Python 3.6 and 3.8, but it’s perhaps unlikely.

From a typescript perspective, the only reason to upgrade to newer Typescript/type checker runtimes is because you want to use a newer language syntax when writing your typescript, or checking your JS. To that end, if the older type checker supports the 3.6 language, but you want to add types manually for 3.8 stdlib, you can, with typescript’s approach, do so manually, just introduce stubs for 3.8 in a 3.6-compatible syntax to your project’s stubs path. (I’m intentionally trying to use Python terminology here.)

ethanhs commented 4 years ago

I come with some promising results. I was able to backport the new-in-3.9 PEG parser pretty easily, with only minor hassle. I think this could mean we can update typed_ast again with the newer parser and use that as the basis for the parser for older Python versions.

gvanrossum commented 4 years ago

I feel this issue is unmanageable due to too many long comments about the general approach that distracted from the main functionality. Maybe we'll be able to use the PEG parser (see #138). Maybe we'll be able to selectively backport positional-only parameters. But I'll let the code speak for itself.