Explore building third party stubs as packages

srittau commented 5 years ago

This is an alternative to #2440 (disallowing third-party stubs). The idea is that typeshed remains/becomes a central repository for third-party stubs that are not bundled with the parent package, similar to DefinitelyTyped. In the future I expect type checkers will not want to bundle all third-party stubs for a variety of reasons, so third-party stubs would be distributed as separate PEP 561 stub-only packages, one per upstream package.

(I tried to integrate points raised there into this issue, especially those by @JukkaL in this comment.)

Advantages

Due to typeshed's tests, packages in typeshed will continue to work with the latest versions of mypy and pytype.
Basic level of consistency and standards due to review by typeshed maintainers.
Consistent naming scheme for third-party stubs packages, allowing users to just "pip install <guessed name>" and it will work when there are stubs.
Tooling (like tests) is easier to manage as it can remain part of the typeshed package.
Easier to contribute to stubs, since contributors don't need to learn the intricacies of multiple stubs projects.
No need to start a separate project just to distribute stubs for a new package.

Issues

Workload issues for typeshed maintainers.
Typeshed maintainers will often not be familiar with the package for which pull requests are opened.
Publishing stubs takes longer due to the necessary reviews.

Further Considerations

What should the generated packages be called? @ethanhs's PEP 561 actually requires stub-only packages to be named <package>-stubs. typeshed could squat these names and release them (and remove the stubs) on the request of upstream maintainers. Alternatively, typeshed could add a common prefix or suffix (ts, typeshed), either in addition to or instead of the -stubs suffix. This would be in violation of PEP 561, so we'd need to get broader consensus to amend the PEP. My personal favorite would be <package>-ts.

To guarantee a fairly quick turnaround on stubs, to minimize work for publishing stubs, and to prevent all third-party stub packages from being updated whenever a new typeshed version is released, stubs for a specific third-party package should be published automatically when it changes.

Possible Implementation

Add a generic setup-tp.py to typeshed that takes its package name from the directory it's in and uses the current date and time as version number.
Amend the CI process for master only so that after a successful test run, for every third-party package that was changed since the last successful run, the following is done automatically:
1. Copy setup-tp.py into the third-party module directory as setup.py.
2. Build the package in that directory.
3. Upload to pypi.
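A minimal sketch of what the generic setup-tp.py could compute, assuming the -stubs suffix from PEP 561 (the naming is still open in this proposal) and a UTC timestamp as the version. The function names and the exact version layout are illustrative assumptions, not part of the proposal.

```python
from datetime import datetime, timezone
from pathlib import Path


def stub_distribution_name(package_dir: Path, suffix: str = "-stubs") -> str:
    """Derive the distribution name from the directory that holds the stubs."""
    # The suffix is an assumption; the thread also considers "-ts".
    return package_dir.resolve().name + suffix


def timestamp_version(now: datetime) -> str:
    """Turn a timestamp into a version string without leading zeros.

    Integers are formatted directly so that PEP 440 normalization does not
    change the string (for example "05" would be normalized to "5").
    """
    return f"{now.year}.{now.month}.{now.day}.{now.hour * 100 + now.minute}"


def stub_metadata(package_dir: Path, now: datetime) -> dict:
    """Collect the name and version a generated setup.py would declare."""
    return {
        "name": stub_distribution_name(package_dir),
        "version": timestamp_version(now),
    }


if __name__ == "__main__":
    # Hypothetical directory name, used only for illustration.
    print(stub_metadata(Path("requests"), datetime.now(timezone.utc)))
```

Since the version depends only on the build time, two uploads of the same package within one minute would collide; a real script would need to handle that case.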

JukkaL commented 4 years ago

@LouisStAmour Learning from DefinitelyTyped is a good idea, and the current proposal already heavily borrows from the way the TS community does things. Thanks for the links!

I'd like to keep the scope a bit limited initially, however. I expect that we'll continue improving our approach, but we need to start somewhere first. At this point I'd like to figure out what the most important things are that we should agree on right from the beginning (things that will be difficult to change later on).

Here are replies to some of your points (I'll continue later).

Basically, the rule of thumb is that however you install a package, you can install an identical version of the package with some prefix to get that version’s types, ideally as compatible with your runtime as the upstream version.

Yes, that's basically what our current proposal tries to solve? The main gaps are that many of the existing stubs don't perfectly match any particular package version, and that for many packages there won't be enough contributor interest to provide stubs for every package version, so the stub version might be a little off. Most packages have sufficiently stable APIs that hopefully these won't be major problems. Besides, the current situation is significantly more limited.

We should ideally automate checking for types in upstream published packages and then remove publishing ours when upstream wants to take over, e.g. https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/README.md#removing-a-package (like Python package type stubs, you can publish typescript definitions in Node packages also).

This may become an issue once the number of packages is higher. This doesn't seem urgent.

Note that compatibility between TypeScript Compiler versions is an issue when publishing exclusively using upstream version numbers. https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/README.md#i-want-to-use-features-from-typescript-29-or-above highlights how eventually we’ll need a standard to not just support multiple versions of Python but also support multiple versions of Type Checkers.

Yeah, this may become an issue in the future.

The only thing I would adjust is the use of semver by the DefinitelyTyped folks. I would use the upstream version number including patch, and I would increment a number attached afterward, such as 1.2.0.0 or, in more of a Linux fashion, 1.2.0-0. https://github.com/DefinitelyTyped/DefinitelyTyped/blob/master/README.md#how-do-definitely-typed-package-versions-relate-to-versions-of-the-corresponding-library

I was proposing to use x.y.z where x.y is the upstream version and z is the stub version, but having the stub version as the fourth component is also reasonable. Here we are restricted a bit by what PyPI supports.

They have an automated system to maintain a GitHub CODEOWNERS file broken down by project: DefinitelyTyped/DefinitelyTyped#44417

Something like this would certainly be nice, but it doesn't seem required initially.

A bot handles tagging and facilitates the review process: DefinitelyTyped/DefinitelyTyped#44444

Improving our review process will need to wait until later. I suspect that it may be similar in effort to implementing the current proposal (or larger).

LouisStAmour commented 4 years ago

Relevant to the above on what syntax to support for type checkers, I’ve left a comment or two at https://github.com/python/typed_ast/issues/118#issuecomment-623614230

The reason I focus on CODEOWNERS and maintenance is that the existing approach requires too much owner-approval and doesn’t have enough community involvement.

Any real world project is going to end up maintaining some number of type stubs in these early days until we can publish shared ones for a sufficient number of packages.

What this means is that the audience of maintainers for typeshed should be the audience of early adopters, users rather than owners, and the project will be more successful if this is the case.

TypeScript was not the first to implement types for JavaScript (Closure, Dart, JSDoc, etc.) nor was it the first alternative syntax (CoffeeScript), but it gained immediate relevance with the start of the DefinitelyTyped project because users could start incrementally, yes, but incrementally with installing a list of typed packages and adding a config file for the compiler to do type checking.

It gained momentum in a second phase as new common libraries adopted TypeScript while supporting standard Node or browser environments for those who don’t use TypeScript, in the same package. Obviously exporting type information is one approach, but, I would argue, requiring newer Python runtime language features just to implement typed Python is a dealbreaker for most libraries and will lead to .pyi maintenance headaches as the only way to implement types using the latest and greatest syntax such that those files are ignored or kept separate from the upstream projects.

I guess what I’m getting at is despite a number of flaws, TypeScript seamlessly integrated within the existing JS ecosystem, it just took a number of years to shave off some of the rough edges. And there are still rough edges but they are mostly because of limitations in npm and compiler systems, and a lack of extensibility in the TypeScript compiler.

Python here has other opportunities and flaws but I’m beginning to think that thinking of “typed Python” as a distinct language versioned separately from “runtime Python” is more appropriate than it sounds, even if “typed Python” is always a superset of Python by virtue of it being a future version of earlier Python runtime syntax targets.

I’m beginning to think there’s room for a “typeshed / typed Python” PEP of sorts but I’m new enough to Python and MyPy that I don’t have specific implementation recommendations at this point outside of what I’ve put into these two GitHub issues. And again I’d like to contribute however I can, I recognize none of this comes for free, neither the decision making nor the “agile” implementation and prototyping.

jakebailey commented 4 years ago

Python here has other opportunities and flaws but I’m beginning to think that thinking of “typed Python” as a distinct language versioned separately from “runtime Python” is more appropriate than it sounds, even if “typed Python” is always a superset of Python by virtue of it being a future version of earlier Python runtime syntax targets.

This is the sort of thing that I think greatly distinguishes the dynamic between TypeScript and JavaScript from "typed Python" and Python. The former exists because JS didn't have the syntax or features to enable type checking; entirely new languages were created which transpiled back to JS.

But Python is in a different place, having both a type syntax which can exist in "real" Python code and the "features" people expect (e.g. classes or lambdas, which we take for granted but weren't standardized in JS until ECMAScript 2015, after TS was introduced and had them). I view "typed Python" as just regular Python, not a whole different language, especially post-Protocol (pending a rework of a lot of typeshed to make use of them). If someone's worried about the perf difference of having annotations, the annotations can be removed safely, and it's still fundamentally the same thing (and in future Python releases, that overhead will likely disappear). In TS, this isn't true, at least not to the extent of it being a simple "remove these parts from the AST" transformation to produce valid JS.

From the editor point of view (my area), the types provide an overall richer experience to users, even if they never have to look at stubs/annotated code themselves or don't care about type checking in general. So we'd of course like to be able to use them. But where the difference between TS and Python matters is how developers obtain the stubs/types and interact with them.

In TypeScript, you install type packages explicitly. There is no expectation that anything is going to work in the editor when you haven't installed the @types/... package for libraries (or if they themselves are not already TypeScript or ship d.ts files directly). This works out for TypeScript, as anyone using TypeScript is explicitly opting in to writing TypeScript; they have to set up the dev environment to run tsc and so on. It's "of course I need to install the d.ts files if I want to have anything work, this is TypeScript!"

But as I said, my view is that "typed Python" is just Python. Editors today can make use of annotations in existing code, or pyi files next to py files, or bundle bits of typeshed/stub packages, etc, and give all users a good experience. If the route being taken ends up being too close to that of TypeScript where developers can only get the great experience (type info for libraries) by explicitly installing them into their environment, I think that may end up being uncomfortable.

I'm not entirely certain that the above matters too greatly in the scope of "how to distribute third party stubs from typeshed", in any case. I am liking the shape of the proposal, especially the removal of the 2/2to3/3 distinction (though I wish PyPI namespaces were ready, rather than a convention...). But I still wanted to mention the parallel issue of editors who want to consume these types in a way that's useful for people who aren't explicitly running a type checker (and really just want nicer completions), and how that situation differs for Python as compared to TypeScript.

Now, one thing I think is missing from these proposals is metadata about which modules a given distribution provides. If an editor wants to do a quick lookup like "I need a stub for foo.bar.baz", it's not always obvious how to discover that info without actually downloading all of the stubs and traversing their trees.
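To make the lookup concrete, here is a rough sketch of the kind of metadata an editor could consult. The index contents and distribution names are hypothetical; nothing like this exists in the proposal yet.

```python
from typing import Dict, Optional

# Hypothetical index: importable module prefix -> stub distribution providing it.
MODULE_INDEX: Dict[str, str] = {
    "foo": "foo-stubs",
    "foo.bar": "foo-bar-stubs",
}


def find_stub_distribution(module: str, index: Dict[str, str]) -> Optional[str]:
    """Return the distribution for the longest indexed prefix of `module`."""
    parts = module.split(".")
    # Try "foo.bar.baz", then "foo.bar", then "foo".
    for end in range(len(parts), 0, -1):
        prefix = ".".join(parts[:end])
        if prefix in index:
            return index[prefix]
    return None


if __name__ == "__main__":
    print(find_stub_distribution("foo.bar.baz", MODULE_INDEX))
```

With an index like this, the editor only needs one small file rather than every stub tree.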

ethanhs commented 4 years ago

@LouisStAmour @jakebailey thank you for your thoughts on these matters! I think "typed Python vs Python with typing" is a bit out of scope of this proposal, as that is a fundamental change to how we approach typing in Python. I think discussion about that is probably best targeted to typing-sig@python.org.

Now, one thing I think is missing from these proposals is metadata about which modules a given distribution provides. If an editor wants to do a quick lookup like "I need a stub for foo.bar.baz", it's not always obvious how to discover that info without actually downloading all of the stubs and traversing their trees.

Fair point! For mypy, we thought about something like this, but we decided the general case was rather manual. There is no way at the moment to learn the totality of packages that support PEP 561 typing without downloading PyPi. I somewhat regret that I didn't add distribution metadata about typing support in a package, but in the end that seemed difficult to achieve as well. Therefore, we decided to put the onus on the user to install types for the packages they used, if the types are shipped separately. Then it is "just" looking at site-packages.

A metadata file could work. It would probably have to be partially manually edited to take into account packages that ship stubs on their own (numpy-stubs, which has been seeing a lot of work recently, being a notable example). I think type checkers would also like this as we could give better diagnostics. However, I do think that this doesn't need to make it into the MVP. I honestly think we should get something out the door sooner rather than later, and then expand on that. The current proposal is very good and we seem to have general consensus (If I am mistaken and people have concerns, please do point them out!). Once it is implemented we can extend it to add things like the metadata editors need.

LouisStAmour commented 4 years ago

I think "typed Python vs Python with typing" is a bit out of scope of this proposal, as that is a fundamental change to how we approach typing in Python.

True, the context in which it came up is more relevant to this other ongoing discussion of adopting Python 3.8 syntax in Typeshed: https://github.com/python/typed_ast/issues/118

In TS, this isn't true, at least not to the extent of it being a simple "remove these parts from the AST" transformation to produce valid JS.

That is absolutely the case, however! https://github.com/babel/babel/blob/master/packages/babel-plugin-transform-typescript/src/index.js is all the evidence I have, but if you search it for remove() you'll see it's mostly modifying the AST to identify and remove unnecessary imports or unnecessary paths/nodes in the AST tree. That's why I said TS is a strict superset of JS. All TS is valid stage3 and above ECMAScript once you remove the types/namespaces (and ignore the legacy stage2 decorators).

In TypeScript, you install type packages explicitly. There is no expectation that anything is going to work in the editor when you haven't installed the @types/... package for libraries (or if they themselves are not already TypeScript or ship d.ts files directly). This works out for TypeScript, as anyone using TypeScript is explicitly opting in to writing TypeScript; they have to set up the dev environment to run tsc and so on. It's "of course I need to install the d.ts files if I want to have anything work, this is TypeScript!"

That's not how TS modules work. The modules are shipped and installed via Node/npm, which is exclusively a JS runtime (not TS) and so they work with JS as much as they work with TS. Any given JS file is valid TS. Any given Python file is valid typed Python.

Type checking JS is a supported use case by both TSC and VS Code. https://github.com/microsoft/TypeScript/wiki/Type-Checking-JavaScript-Files

Hinting of which types to install is done by the IDE via a Microsoft lookup service that DefinitelyTyped contributes to. https://microsoft.github.io/TypeSearch/ and https://code.visualstudio.com/docs/nodejs/working-with-javascript#_typings-and-automatic-type-acquisition and https://github.com/microsoft/TypeScript/blob/e6390efb013009a94e8cf3383f2981765b7cfd37/src/jsTyping/jsTyping.ts#L95-L104

where developers can only get the great experience (type info for libraries) by explicitly installing them into their environment

In TS, you can ad-hoc write type definitions (.d.ts) for any file and import them as you like. But you can also publish a .d.ts version of a .js file and TSC will resolve the types automatically just by putting them in the same folder as part of module resolution. For module resolution, TypeScript's approach is two-fold, the first is that you can use triple slash directives (which is an import/reference syntax that is effectively the same as copying and pasting a referenced file into that location, predates node package manager integration, I think) and Node package manager module reference rules: https://www.typescriptlang.org/docs/handbook/module-resolution.html The @types/ namespace is included, additional type definition locations can be added in tsc config however, or you can simply import a .d.ts file/package directly in your sources (and when you want to reference a type declaration from a stub, that's often the case anyway).

I'm not sure how I can be more clear about this, I strongly think we should evaluate and understand TypeScript and DefinitelyTyped as an ecosystem before we continue reinventing the wheel. There will be areas where Microsoft gets it wrong -- especially because legacy is always hard to change -- and npm's package management is not ideal by any stretch of the imagination -- but there are also many similarities to what we'd like to accomplish that it's worth evaluating their efforts to see what we can learn.

See also https://sorbet.org/ which has an LSP integrated and will be shipped as part of Ruby 3 (in some fashion...) and https://github.com/sorbet/sorbet-typed their version of a type repository with built-in lookup system. The lookup system is integrated using a command line tool which internally uses git: https://github.com/sorbet/sorbet/blob/047d672e2a49f6a331541d65b7a3466fd324d2eb/gems/sorbet/lib/fetch-rbis.rb and therefore does not publish to RubyGems the way DefinitelyTyped publishes to NPM. The advantages of publishing would be that you could re-use stub module lookup code to lookup packaged modules, and that you don't have to directly publish whatever's in your repo, so you can have build steps, supported versions of python, dependency graphs even outside your repo, etc.

Aside on Sorbet: It doesn't support literal types and the community doesn't feel quite as welcome to new syntax suggestions as in TypeScript. https://github.com/sorbet/sorbet/issues/2504 Some implementation details: https://github.com/sorbet/sorbet/blob/master/gems/sorbet/README.md

Sorbet uses native Ruby code to annotate Ruby in-line, which seems a small bit repetitive. Maybe they'll fix that by Ruby 3. It's arguable that they would have an easier transition from un-typed Ruby to typed Sorbet if they gave examples in both TS and Sorbet. Adding types to a dynamic system requires a much more expressive type system than traditional C#/Java due to literals and dynamic methods to create types from imported source code, etc. For example: https://www.domysee.com/blogposts/ts-mapped-types What TypeScript is missing, come to think of it, is comprehensive documentation of these features outside of compiler release notes. The TS docs website definitely needs work.

jakebailey commented 4 years ago

@LouisStAmour

I am aware of all of these details, but fundamentally that isn't what I meant; I can literally take any working annotated Python code, delete its annotations by hand, and it will run just fine. No babel or tsc to desugar things (??, ?., etc), no magic /// comments, etc. It's just Python. I don't really want to clutter this issue with unrelated stuff, so forgive me if I don't try to point-for-point your response... 🙂

@ethanhs

I think "typed Python vs Python with typing" is a bit out of scope of this proposal, as that is a fundamental change to how we approach typing in Python. I think discussion about that is probably best targeted to typing-sig@python.org.

Certainly; I didn't mean to go too far into that, just enough to point out differences between Python's scenario and TS (at least from my experience with both). Happy to continue on the sig if this comes up again, though I can't say I'm confident that my email client will be compatible with the mailing list...

Therefore, we decided to put the onus on the user to install types for the packages they used, if the types are shipped separately. Then it is "just" looking at site-packages.

This is the bit that makes it difficult for editors which want to "hide" (treat as an implementation detail) the fact that some form of stubbing is in use from users who don't really care. It works great for type checkers (which is why I thought I'd respond when TS was brought up), so I understand why this is the way it is, as it allows for stub packages to coexist within an environment with no extra effort beyond what pip provides (which is great; I personally think PyPI is the right place for these to go). I'm just thinking about how I'd try to, say, handle an average user with no knowledge of stubs using a system-install of some library, where there's not really a good way to figure out how to get a stub when you don't know what the PyPI distribution is called or its version. (Generically mapping a module name to a distribution then to its version I think is an unsolved problem, unless I'm mistaken.)

In its current form, typeshed ends up "solving" the problem accidentally(?) by enforcing a single search path (third_party) in which you can do a normal import resolution. It's not ambiguous who's providing PIL, as typeshed has no notion of a distribution at all, so I don't have to figure out that to get PIL, people probably will actually be installing pillow. Segmenting the repo by distribution is (I think) a necessary change in the long run, but will end up creating work for stub consumers who aren't expecting people to explicitly install things. I know I'm still thinking through how this will all work out.

A metadata file could work. It would probably have to be partially manually edited to take into account packages that ship stubs on their own (numpy-stubs, which has been seeing a lot of work recently, being a notable example).

I'm a bit unclear on this point; are you expecting to have something in typeshed for numpy if numpy-stubs is also going to be maintained as a distinct repo and PyPI package?

One other consideration I have is on the PyPI naming itself; if the naming is just a convention, what is to stop someone from publishing under a name that follows the convention and end up breaking the infrastructure? This is certainly a place where a "protected" namespace in PyPI would be helpful (for comparison, you can't just publish to @types/whatever in npm), but I don't think that's been discussed in a while (pypa/warehouse#4967).

LouisStAmour commented 4 years ago

One other consideration I have is on the PyPI naming itself; if the naming is just a convention, what is to stop someone from publishing under a name that follows the convention and end up breaking the infrastructure? This is certainly a place where a "protected" namespace in PyPI would be helpful (for comparison, you can't just publish to @types/whatever in npm), but I don't think that's been discussed in a while (pypa/warehouse#4967).

Very good point. I think "organisation" structure in npm was definitely a step forward for the community. It also helps organize packages such that the name of the package doesn't need to be prefixed by the organization or project name, while packages are easier to navigate also. The downside is that if anyone can set up an org name, you still have to watch for look-alike org names and duplicate packages under a different org, etc. I'm sure npm has some lessons learned and rules of thumb we could look to for guidance, but npm isn't perfect. https://docs.npmjs.com/using-npm/scope.html are some of the docs for this.

I can literally take any working annotated Python code, delete its annotations by hand, and it will run just fine. No babel or tsc to desugar things (??, ?., etc), no magic /// comments, etc

Quick aside: ?? and ?. are bog-standard ECMAScript language features that have shipped in newer versions of JS in Chrome/Edge/FF by now. https://caniuse.com/#feat=mdn-javascript_operators_nullish_coalescing

// is a comment syntax, so /// is also a comment syntax.

ECMAScript language features are only allowed in TS if you enable the newer libraries/definitions to support them and these nullish operators in particular were only added in the most recent release of TS 3.7 after they hit stage 3 in ECMA standardization.

TS is only "transpiled" with language syntax polyfills if a previous version of JS is chosen as output as a convenience to end users using older platforms so they don't need to use a third-party tool for this. If you target the latest version of ecmascript, your tsc output should be identical to your babel output if it also targets the latest version and avoids inserting other polyfills for cross-browser compatibility. (Babel, like tsc, inserts a number of polyfills under default settings, but unlike tsc it also focuses on cross-browser polyfills and has a few other code rewriting goodies.) To be clear, tsc will type-check as it converts to JS, while Babel will simply strip syntax. Babel historically only processes JS, so it's proof that TS is lightweight type syntax on top of JS.

Edit: My apologies for side tracking the issue topic this much. I do think DefinitelyTyped is worth looking at, though, for this issue's topic. And I'll follow up with @jakebailey directly via Gitter IM perhaps.

Related to building third-party stubs as packages is the ability to support multiple versions of stub syntax, see https://github.com/python/typed_ast/issues/118#issuecomment-623850557 for maybe a clearer explanation of my approach, though that's possibly not the best project to leave this comment on, as a lot of my comment relates to TypeShed more than any given AST (outside of versioning the stub syntax, I suppose).

JukkaL commented 4 years ago

Since the discussion seems to have died down, and nobody seemed to have anything significant against the proposal, @ilevkivskyi and I plan to start working on the implementation soon.

There will be a big switch at some point when we change to the new typeshed directory structure. One of us will write a comment here in advance to announce when we are almost ready for the switch, and we'll show what the result will look like.

jstasiak commented 4 years ago

Is https://github.com/python/typeshed/issues/2491#issuecomment-611607557 the current summary of the proposal?

srittau commented 4 years ago

@JukkaL We should try to close as many PRs as possible before the switch, otherwise they will all fail to apply. Maybe we can also combine this with the black reformatting, which has the same problem (#3058).

JelleZijlstra commented 4 years ago

@JukkaL exciting that this is getting done now!

I can go over all open PRs tonight and hopefully either merge them or get them into a mergeable state myself. Once we have no or few open PRs we can apply Black and get a clean slate for the big refactor.

srittau commented 4 years ago

I went through a few PRs and closed them. I also updated #3329 so it merges cleanly. It will need a final merge after the big reformat, though.

JukkaL commented 4 years ago

@jstasiak Yeah, though a few things have changed during the discussion:

The stub package name will be types-<foo> for a package that was installed as pip install foo.
The version schema for a stub package will be x.y.z, where x.y is the upstream version, and z is the stub version, starting from 0. (z is reset to 0 when the upstream version is increased.)
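As a rough illustration of the two rules above (not the actual upload script), the naming and version bump could look like this. The function names are made up for the example, and the sketch assumes upstream versions have at least two numeric components.

```python
from typing import Optional


def stub_package_name(distribution: str) -> str:
    """Name of the stub package for something installed via `pip install <distribution>`."""
    return f"types-{distribution}"


def next_stub_version(upstream: str, previous: Optional[str]) -> str:
    """Compute the next x.y.z stub version.

    x.y is taken from the upstream version; z counts stub releases and
    restarts at 0 whenever x.y changes.
    """
    major, minor = upstream.split(".")[:2]
    base = f"{major}.{minor}"
    if previous is not None:
        prev_base, _, prev_stub = previous.rpartition(".")
        if prev_base == base:
            return f"{base}.{int(prev_stub) + 1}"
    return f"{base}.0"


if __name__ == "__main__":
    print(stub_package_name("foo"), next_stub_version("1.2.5", "1.2.3"))
```

Note that the upstream patch level is dropped, so stubs for 1.2.5 and 1.2.9 share the same 1.2.z series.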

@srittau A good point. Since we'll announce our intentions before making the switch, there will also be some time to close PRs just before the switch. I can help with the reviews.

jstasiak commented 4 years ago

Ok, got it. Apologies for a late comment, I've been following the discussion but got overwhelmed by the amount of text at some point and never got back to it (till now). I have two questions:

1.

One other consideration I have is on the PyPI naming itself; if the naming is just a convention, what is to stop someone from publishing under a name that follows the convention and end up breaking the infrastructure? This is certainly a place where a "protected" namespace in PyPI would be helpful (for comparison, you can't just publish to @types/whatever in npm), but I don't think that's been discussed in a while (pypa/warehouse#4967).

Has this been addressed/answered somewhere? I just looked for an answer but failed to find any.

2.

Do your 2/5 year expectations include more packages either becoming typed themselves or including type stubs in their distributions? (so installing package foo installs stubs or typed Python code with the py.typed marker)

JukkaL commented 4 years ago

if the naming is just a convention, what is to stop someone from publishing under a name that follows the convention and end up breaking the infrastructure

I don't think that there's anything we can do right now to prevent this, but we can make our approach resilient to this. Here's an idea about what we could do if this becomes a problem:

Use an alternative prefix if the main prefix is used (e.g., types-ts-<foo>).
Provide a service (outside PyPI) that maps a distribution or a package name to the relevant stub package. So if a user wants the stubs for foolib, the service can tell whether the right package is foolib (ships with types), types-foolib, types-ts-foolib, or something else.
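A sketch of the decision such a service might make, assuming it keeps two sets of names: distributions known to ship their own types, and stub packages actually uploaded by typeshed. Both sets and the function name are hypothetical.

```python
from typing import Optional, Set


def resolve_stub_package(
    distribution: str,
    ships_own_types: Set[str],
    typeshed_uploads: Set[str],
) -> Optional[str]:
    """Return the package a user should install to get types for `distribution`."""
    if distribution in ships_own_types:
        # The package itself carries type information.
        return distribution
    # Prefer the main prefix, fall back to the alternative one.
    for candidate in (f"types-{distribution}", f"types-ts-{distribution}"):
        if candidate in typeshed_uploads:
            return candidate
    return None


if __name__ == "__main__":
    uploads = {"types-ts-foolib"}  # main prefix assumed taken by someone else
    print(resolve_stub_package("foolib", set(), uploads))
```

Checking against the set of typeshed uploads, rather than against whatever exists on PyPI, is what keeps a squatted types-foolib from being recommended.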

The latter would be useful even if we can claim a namespace for typeshed. It could be integrated into PyPI, but it may be easier to deploy it separately, since nobody in the typing community seems to be active in the packaging community.

We can also try to get namespaces implemented at PyPI, but it looks like we don't have a volunteer to implement that.

Do your 2/5 year expectations include more packages either becoming typed themselves or including type stubs in their distributions?

I'd expect both to become more common, but it seems likely that in 2 years many packages will still not include any type information, and typeshed will continue to be important. Packages that include inline types are arguably the best option for users, as the types are more likely to be up-to-date.

ilevkivskyi commented 4 years ago

Now that we are moving towards implementing this proposal, let's discuss what would be the best way to implement the auto-uploading (so that it is robust, secure, but still relatively simple). Here is one possible idea:

We will have a bot account on PyPI (e.g. typeshed-bot-1); only a very limited number of people will have the password for it.
We will have a separate private repository where upload scripts (like https://github.com/python/typeshed/pull/4291 and https://github.com/python/typeshed/pull/4295) and a lock file (see below) will live. Again, very few people will have access to it. The PyPI bot account credentials will be stored there.
We will have a simple service running in the cloud (DigitalOcean or Amazon EC2) that will run two cron jobs: auto-upload every 30 minutes, and health-check every hour.
Auto-upload will pull typeshed, pull upload script repo, then put something like uploading:<typeshed master hash> in the lock file and push it.
Check which distributions changed between the current and previous values of the typeshed hash, and upload new versions for those that changed. After all uploads are done, put something like done:<typeshed master hash> in the lock file and push it.
Health check script will check whether uploads are progressing, and if something is wrong will send emails to typeshed maintainers.
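The lock file protocol in the steps above could be sketched roughly as follows. This is only an illustration: LockState, parse_lock, format_lock and is_stalled are hypothetical names, and the one-hour limit is an assumption borrowed from the health-check interval.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class LockState:
    status: str  # "uploading" or "done"
    commit: str  # typeshed master hash


def parse_lock(text: str) -> LockState:
    """Parse lock file contents of the form '<status>:<hash>'."""
    status, _, commit = text.strip().partition(":")
    if status not in ("uploading", "done") or not commit:
        raise ValueError(f"unrecognized lock file contents: {text!r}")
    return LockState(status, commit)


def format_lock(status: str, commit: str) -> str:
    """Render the line that the upload job writes before pushing."""
    return f"{status}:{commit}\n"


def is_stalled(state: LockState, last_push: datetime, now: datetime,
               limit: timedelta = timedelta(hours=1)) -> bool:
    """Treat an upload as stalled if the lock has said 'uploading' for too long."""
    return state.status == "uploading" and now - last_push > limit
```

The health check would then only need the lock file contents and the time of the last push to decide whether to email the maintainers.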

This way we can achieve two goals:

Have some infrastructure that we control and that has some basic resilience w.r.t. various outages.
We can give write permissions to typeshed to more people (assuming it will become a very high traffic repo) to reduce review burden without compromising security.

@JelleZijlstra @srittau @JukkaL any comments or alternative ideas?

JukkaL commented 4 years ago

The overall proposal looks good to me. I have some feedback on details below.

The PyPI bot account credentials will be stored there [private repository].

I think that storing passwords in a repository is against best practices. We may want to minimize the number of unencrypted copies of any passwords. Also, somebody could accidentally push the repo to another public repository, GitHub could be compromised, etc.

It may be better to only store an upload token on the server (in a config file not committed to a repo), and store bot account password in (a few) people's individual password managers. This way it will be less likely that people will accidentally leak the password, and there's only one copy of the upload token. If somebody gets access to the server, they won't be able to change the PyPI account password (though they can upload packages).

Would we also give the maintainers ssh access to the upload server?

We will have a separate private repository where upload scripts (like #4291 and #4295) and a lock file (see below) will live.

What about having two repositories? The server shouldn't need to have write access to scripts, but it needs to have write access to the lock file (or should we call it a log file?).

Auto-upload will pull typeshed, pull upload script repo

Another security consideration: we may want to avoid automatic pulling of the script repo, so that there's some extra safety in case the repository gets compromised. When the scripts are updated, we'd need to manually pull the repo, but I think that it should be acceptable. This way if we commit broken scripts, we can easily revert to an older revision, by just checking out an earlier revision.

then put something like uploading: in the lock file and push it.

Additionally, we'd probably want to log a timestamp, and maybe an identifier that specifies the host that does the upload (e.g. hostname), in preparation for having a spare server. Maybe also add list of packages and versions that will be uploaded, to simplify debugging.
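To make this concrete, here is one possible shape for such a log record, written as a single JSON line. The field names and the make_log_entry helper are assumptions for illustration, not an agreed format:

```python
import json
import socket
from datetime import datetime, timezone
from typing import Dict, Optional


def make_log_entry(status: str, commit: str, packages: Dict[str, str],
                   host: Optional[str] = None,
                   now: Optional[datetime] = None) -> str:
    """Build one JSON log line describing an upload run.

    `packages` maps each distribution to the version about to be uploaded.
    """
    entry = {
        "status": status,
        "commit": commit,
        "timestamp": (now or datetime.now(timezone.utc)).isoformat(),
        "host": host or socket.gethostname(),
        "packages": packages,
    }
    return json.dumps(entry, sort_keys=True)
```

Recording the host name keeps the format usable if a spare server is added later, and the package list shows what an interrupted run was trying to upload.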

ilevkivskyi commented 4 years ago

I have some feedback on details below.

Thanks! These are all good ideas. @JelleZijlstra @srittau what do you think?

srittau commented 4 years ago

Sounds good to me. Just one thing:

We will have a simple service running in the cloud (DigitalOcean or Amazon EC2) that will run two cron jobs: auto-upload every 30 minutes, and health-check every hour.

I think we can do this with GitHub Actions. Actions support cron jobs (see this workflow for an example) and can do some magic with GITHUB_TOKEN. This way we can utilize the GitHub secrets mechanism, don't need to involve another account, and have fewer credentials overall.

ilevkivskyi commented 4 years ago

Cool! I didn't know GitHub Actions support cron. I think it indeed looks simpler with it.

JelleZijlstra commented 4 years ago

This sounds good to me. It would be great if we can do it all through GitHub actions.

JukkaL commented 4 years ago

I also like the idea of using GitHub actions.

ilevkivskyi commented 3 years ago

@srittau @JelleZijlstra @rchen152 @erictraut @sinancepel @dkgi

Me and @JukkaL are going to start actually implementing the plan outlined above in https://github.com/python/typeshed/issues/2491#issuecomment-611607557 (with corrections and clarifications outlined in further comments). An important thing to note is that we are going to change the typeshed directory structure (see the above comments for details). If you have any objections/comments/etc., please let us know. If we don't hear any objections in 10 days, we are going to implement this.

srittau commented 3 years ago

That's great news! Could you briefly summarize the updated plan, i.e. the final directory structure etc.?

ilevkivskyi commented 3 years ago

The new directory structure will be like this:

stdlib/
    VERSIONS  # First Python version where each module was added (default is Python 2.7)
    python2/  # Only those where separate stubs are needed
        builtins.pyi
        ...
    builtins.pyi
    ...
stubs/
    six/  # <- distribution name, as on PyPI
        python2/  # optional, only if separate stubs are needed.
            six/  # <- package/module name, may be several per distribution
                __init__.pyi
                moves.pyi
                ...
        six/
            __init__.pyi
            moves.pyi
            ...
        METADATA.toml  # Includes various info like distribution version, dependencies on other stubs
    ...
...

So essentially each directory in stubs/ will map one-to-one to a PyPI distribution (and will contain all info to generate and upload a wheel), plus separately the whole of standard library in its own directory stdlib/.
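As a rough sketch of how tooling could read this layout, the function below lists the top-level packages and modules for each distribution directory. The name modules_by_distribution is hypothetical, and the code assumes that python2/ and METADATA.toml are the only non-package entries in a distribution directory:

```python
from pathlib import Path
from typing import Dict, List

# Entries inside a distribution directory that are not packages or modules.
NON_MODULE_ENTRIES = {"python2", "METADATA.toml"}


def modules_by_distribution(stubs_dir: Path) -> Dict[str, List[str]]:
    """Map each distribution under stubs/ to its top-level packages/modules."""
    result: Dict[str, List[str]] = {}
    for dist in sorted(p for p in stubs_dir.iterdir() if p.is_dir()):
        modules: List[str] = []
        for entry in sorted(dist.iterdir()):
            if entry.name in NON_MODULE_ENTRIES:
                continue
            if entry.is_dir():
                modules.append(entry.name)   # package directory, e.g. six/
            elif entry.suffix == ".pyi":
                modules.append(entry.stem)   # single-file module stub
        result[dist.name] = modules
    return result
```

With the tree shown above, this would report that the six distribution provides the six package.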

For more details it is probably best to just read the code: https://github.com/python/typeshed/blob/master/scripts/migrate_script.py

gvanrossum commented 3 years ago

(I think you meant builtins.pyi, not builtinst.pyi in the stdlib/python2/ subtree, right?)

ilevkivskyi commented 3 years ago

(Yes, thanks, fixed.)

hauntsaninja commented 3 years ago

In earlier comments we were thinking of using 2 instead of python2 which has the minor advantage that it's not a legal identifier (there's at least one PyPI package this would be an issue for: https://pypi.org/project/python2). If we like a longer name, maybe python-2?

One thing that's unclear to me based on the script is whether there's a way for a third-party stub to have a minimum supported Python version, e.g. 3.8 if it wanted to use PEP 570 positional-only parameters.

jakebailey commented 3 years ago

I'm still a bit iffy on the changes from the POV that for editor use (pylance/pyright which have editor integration), we still want to be able to bundle all of these and the reorg will make it questionable how to figure out which stubs are supposed to be used for a given module.

In https://github.com/python/typeshed/issues/2491#issuecomment-623789835 I mentioned that a benefit of the current layout is that it disambiguates things such that say looking for PIL you could just get PIL and move on, but if we want to continue bundling with a new layout, we have to know which distribution a particular module is a part of (e.g. "actually it's pillow"), which is not always obvious (and there's no general way to ask "where did this module come from?" or "what version library is this?").

RE: why we want to continue bundling; most people don't have any knowledge about stubs but we still want to make use of them to better improve editor features without having every user know they have to install a bunch of stub packages to make things work as they'd expect. It may be expected for a language like TS that "of course I need my @types packages to get features", but in Python the expectations of users are quite different where the whole shebang is expected to work for all code. We want to ensure things are as smooth out of the box as possible for existing projects, even for those who don't know about (and in some cases even dislike) typing.

The other piece I don't see much traction on is the whole pip/PyPI namespacing thing; I assume that it's still planned for distributions here to be published with an ad-hoc prefix convention? Given anyone can publish on PyPI with any name, has there been any thoughts about having something to prevent unwanted publishes to that prefix so tooling can more reliably trust that the stubs they may suggest aren't problematic? (I've been thinking for a while about tooling being able to fetch stubs from PyPI directly without installing them to make use of types and reduce friction, but that does also need metadata / a consistent way to know where to look).

I just want to be careful, as type stubs are increasingly a critical part of editor tooling affecting a much wider userbase and not just those who want to statically check their code (and have full knowledge about needing stubs for things to work). This is in no way a "don't do this", just that there are some extra angles. 🙂

srittau commented 3 years ago

One thing that's unclear to me based on the script is whether there's a way for a third-party stub to have a minimum supported Python version, e.g. 3.8 if it wanted to use PEP 570 positional-only parameters.

I don't think it's possible to base the supported feature set on the Python version the type checker is running under. This is the reason the type stub PEP that is currently under preparation will describe a feature set that is currently supported by type checkers. We should limit stubs in typeshed to this (evolving) feature set.

hauntsaninja commented 3 years ago

If the flow becomes pip install mypy; pip install stubs-somepackage, couldn't it be accomplished by setting python_requires on the stubs-somepackage package? Anyway, you're right that it'll make life easier to mandate a lowest common denominator feature set.

srittau commented 3 years ago

I'm still a bit iffy on the changes from the POV that for editor use (pylance/pyright which have editor integration), we still want to be able to bundle all of these and the reorg will make it questionable how to figure out which stubs are supposed to be used for a given module.

In https://github.com/python/typeshed/issues/2491#issuecomment-623789835 I mentioned that a benefit of the current layout is that it disambiguates things such that say looking for PIL you could just get PIL and move on, but if we want to continue bundling with a new layout, we have to know which distribution a particular module is a part of (e.g. "actually it's pillow"), which is not always obvious (and there's no general way to ask "where did this module come from?" or "what version library is this?").

Unfortunately, not including the distribution is more problematic in my opinion: It prevents us from ever including two ~~packages~~ distributions with conflicting module names and it prevents us from sanely handling ~~packages~~ distributions using multiple modules. On the other hand, finding the correct module path without knowing the package's names is possible using a simple glob. Which leaves the necessity to disambiguate. This could for example be done by looking at the installed packages or a requirements.txt file. But in my opinion, this is a much better problem to have than being unable to have two conflicting packages at all.
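A sketch of the glob approach, under the assumption that the stubs live in a stubs/<distribution>/<module> layout. The helper names are hypothetical, and the requirements argument is simply a list of distribution names, e.g. as read from a requirements.txt file:

```python
from pathlib import Path
from typing import Iterable, List, Optional


def distributions_providing(stubs_dir: Path, module: str) -> List[str]:
    """Find every distribution directory that contains the module's top-level name."""
    top = module.split(".")[0]
    hits = set()
    for pattern in (f"*/{top}", f"*/{top}.pyi"):
        for match in stubs_dir.glob(pattern):
            hits.add(match.parent.name)
    return sorted(hits)


def pick_distribution(candidates: List[str],
                      requirements: Iterable[str]) -> Optional[str]:
    """Disambiguate using the project's requirements; None if still unclear."""
    if len(candidates) == 1:
        return candidates[0]
    wanted = {name.lower() for name in requirements}
    matching = [c for c in candidates if c.lower() in wanted]
    return matching[0] if len(matching) == 1 else None
```

If two distributions both provide PIL and the project requires only one of them, pick_distribution returns that one; if neither or both are required, the caller has to fall back to some other rule.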

The other piece I don't see much traction on is the whole pip/PyPI namespacing thing; I assume that it's still planned for distributions here to be published with an ad-hoc prefix convention? Given anyone can publish on PyPI with any name, has there been any thoughts about having something to prevent unwanted publishes to that prefix so tooling can more reliably trust that the stubs they may suggest aren't problematic? (I've been thinking for a while about tooling being able to fetch stubs from PyPI directly without installing them to make use of types and reduce friction, but that does also need metadata / a consistent way to know where to look).

This is still of concern to me. There have been some suggestions made to enhance pypi with namespace support, but as far as I know so far nothing has been implemented.

srittau commented 3 years ago

If the flow becomes pip install mypy; pip install stubs-somepackage, couldn't it be accomplished by setting python_requires on the stubs-somepackage package? Anyway, you're right that it'll make life easier to mandate a lowest common denominator feature set.

What I forgot to mention: The problem with including such a restriction is that it would only apply to mypy, not any other type checker. Also, it's not absolute as it would only be related to syntax, while even mypy can use non-syntax features from future Python versions.

jakebailey commented 3 years ago

Unfortunately, not including the distribution is more problematic in my opinion: It prevents us from ever including two packages with conflicting module names and it prevents us from sanely handling packages using multiple modules. On the other hand, finding the correct module path without knowing the package's names is possible using a simple glob. Which leaves the necessity to disambiguate. This could for example be done by looking at the installed packages or a requirements.txt file. But in my opinion, this is a much better problem to have than being unable to have two conflicting packages at all.

Absolutely, I think that this is a step forward. After having a chat about this, I think the main thing is about choosing which of the bundled stub versions is the "right" one; at the moment with typeshed it's always the latest (by nature of not having any other method), therefore a type checker / editor that wants to have some default behavior could choose the newest one (and of course pick the installed one out of site-packages if available instead, which we do now). It'd still be nice to figure out what the user has installed though, but that's not exactly a concern of typeshed, just a general problem within Python envs themselves.

The "glob method" works so long as there doesn't end up a situation such that two distributions ship the same modules; this is true now but that'd be a consideration for the future. There's also the issue of namespace packages where multiple distributions provide part of the same module tree, but I'm not aware of any that are in typeshed either. IIRC the Azure SDK uses this pattern but ships their own types inline where possible.

rchen152 commented 3 years ago

@ilevkivskyi Thanks for the heads up! I have no objections to the change in typeshed structure - most of the changes (like adding a VERSIONS file to stdlib/) seem like clear improvements to me.

Akuli commented 3 years ago

I'm not sure if I like stubs and stdlib as opposed to third_party and stdlib. It might confuse people who are new to typeshed and want to fix stdlib stubs. How should they know that stdlib stubs are in stdlib and not in stubs, since both seem equally likely?

JukkaL commented 3 years ago

@hauntsaninja

If we like a longer name, maybe python-2?

I'd be okay with this. Alternatively, we could create a convention for subdirectories that aren't Python packages. We could use this for additional things in the future, such as for tests, which might conflict with package names. A simple idea would be to use some character as a prefix, such as @python2 (also @tests, etc.).

@jakebailey

After having a chat about this, I think the main thing is about choosing which of the bundled stub versions is the "right" one; at the moment with typeshed it's always the latest (by nature of not having any other method), therefore a type checker / editor that wants to have some default behavior could choose the newest one (and of course pick the installed one out of site-packages if available instead, which we do now).

One option would be to always specify one alternative as the "primary" in metadata if there is ambiguity. We could have a test that ensures that whenever there are two distributions with the same package name (and they aren't using non-overlapping namespace packages), one of them would have to be marked as the default/primary. I expect that this would only be needed pretty rarely, and if it's needed, one of the options is likely much more popular than the others, making it clear which should be the primary one.
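Such a test could look roughly like the sketch below. The primary flag is a hypothetical metadata field, StubInfo and ambiguous_packages are made-up names, and the namespace-package exception is left out for brevity:

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class StubInfo(NamedTuple):
    packages: List[str]    # top-level packages shipped by the distribution
    primary: bool = False  # hypothetical "default/primary" metadata flag


def ambiguous_packages(dists: Dict[str, StubInfo]) -> Dict[str, List[str]]:
    """Return packages provided by several distributions without exactly one primary."""
    owners: Dict[str, List[str]] = defaultdict(list)
    for name, info in dists.items():
        for package in info.packages:
            owners[package].append(name)
    problems: Dict[str, List[str]] = {}
    for package, names in owners.items():
        if len(names) > 1:
            primaries = [n for n in names if dists[n].primary]
            if len(primaries) != 1:
                problems[package] = sorted(names)
    return problems
```

A CI check would then just assert that ambiguous_packages(...) returns an empty dict.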

@Akuli

I'm not sure if I like stubs and stdlib as opposed to third_party and stdlib. It might confuse people who are new to typeshed and want to fix stdlib stubs. How should they know that stdlib stubs are in stdlib and not in stubs, since both seem equally likely?

The proposal is using just stubs for third-party packages, since in the long term I'd expect that the vast majority of stubs are for third-party packages, making typeshed primarily a repository of stubs for third-party libraries. For this reason I'd avoid the somewhat awkward name third_party. My preference is to make the common case short and simple.

stubs and stdlib would likely be sorted next to each other in a directory listing, I think, so it would be hard to miss the other one. We could rename stdlib to stdlib-stubs or similar to make it even clearer, but I'm not sure if I like this idea.

jakebailey commented 3 years ago

One option would be to always specify one alternative as the "primary" in metadata if there is ambiguity. We could have a test that ensures that whenever there are two distributions with the same package name (and they aren't using non-overlapping namespace packages), one of them would have to be marked as the default/primary. I expect that this would only be needed pretty rarely, and if it's needed, one of the options is likely much more popular than the others, making it clear which should be the primary one.

@JukkaL

This was specifically about versioning within each distribution and picking one of those, as I believe the intent is to be able to publish more than one for the same one, but another field that says "I'm the real PIL" (not that PIL is in typeshed, of course) could be handy.

ilevkivskyi commented 3 years ago

OK, it is almost 10 days since the previous announcement. IIUC there are no objections, so we can start working on this now. I will keep this issue posted on the progress. One of the important steps would be to merge as many PRs as possible before the directory reshuffle, since all PRs will turn into big merge conflicts after that. We can agree on a day (for example next week), when we will actually run the script and land the change.

@srittau @JelleZijlstra do you have any preferences on the date?

srittau commented 3 years ago

I will work on PRs over the weekend. So anything after that sounds fine to me.

ethanhs commented 3 years ago

Oh I realized I've been sitting on types-stdlib since when I was initially planning on working on this. I've invited @JukkaL and @ilevkivskyi as owners. Feel free to invite others/do what you wish, I unfortunately don't have time to work on much typing work, at least for a while :/

ilevkivskyi commented 3 years ago

@ethanhs Thanks!

In the meantime I created a separate repo for auto-upload scripts and actions. For security reasons it will have more limited write access: https://github.com/typeshed-internal/stub_uploader. I already added draft versions of a couple of scripts there.

ilevkivskyi commented 3 years ago

It looks like we are almost ready to perform the directory restructure (and have some basic typeshed -> PyPI integration ready). One thing is that fixing some tests may be tricky; a possible way to simplify this may be to temporarily disable some tests initially, and then gradually fix and re-enable them. This means we will likely need a no-merge window.

A possible schedule can be like this:

[x] Weekend, Jan 23-24: Merge any outstanding PRs, do not perform any other user PR merges until Monday, Feb 1
[x] Monday, Jan 25: Run restructure script and make a big PR
[x] Tuesday, Jan 26: Fix some tests, and land the PR with other tests disabled
[x] Wednesday, Jan 27: Test and enable typeshed -> PyPI auto-upload
[x] Rest of the week: Gradually fix and re-enable most tests
[x] Monday, Feb 1: start merging user PRs again

@JelleZijlstra @srittau @hauntsaninja what do you think? Most of the work can be done by me and @JukkaL, but we can use your help fixing some of the tests.

srittau commented 3 years ago

Sounds fine by me. I have to see how much time I can dedicate next week, but I'll see what I can do.

hauntsaninja commented 3 years ago

I don't have much time this week, but as long as mypy supports --custom-typeshed-dir for the new typeshed format most tests should be easy to get working again. The only tricky one beyond that is the pytype test.

rchen152 commented 3 years ago

I can help with fixing the pytype test.

ilevkivskyi commented 3 years ago

@hauntsaninja

as long as mypy supports --custom-typeshed-dir for the new typeshed format most tests should be easy to get working again.

IIUC --custom-typeshed-dir can be only supported for stdlib, not for third party stubs. Would this be enough to fix the tests?

hauntsaninja commented 3 years ago

Yeah, that's probably good enough. mypy_primer no longer working on third party stubs will probably be the biggest CI difference

JelleZijlstra commented 3 years ago

Perhaps mypy-primer could use $MYPYPATH to run on third-party packages.
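One way this might work, assuming the new stubs/<distribution>/ layout: put every distribution directory on MYPYPATH, which mypy reads as a list of directories separated by the platform's path separator. The helper names below are hypothetical, and whether mypy accepts stub directories supplied this way would need to be checked:

```python
import os
from pathlib import Path
from typing import Dict


def build_mypypath(stubs_dir: Path) -> str:
    """Join all distribution directories under stubs/ into a MYPYPATH value."""
    dirs = sorted(str(p) for p in stubs_dir.iterdir() if p.is_dir())
    return os.pathsep.join(dirs)


def mypy_env(stubs_dir: Path) -> Dict[str, str]:
    """Copy the current environment and add MYPYPATH for a mypy subprocess."""
    env = dict(os.environ)
    env["MYPYPATH"] = build_mypypath(stubs_dir)
    return env
```

The resulting dict could be passed as env= to subprocess.run when invoking mypy on a project.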

srittau commented 3 years ago

I have now merged or closed all PRs except two:

4776 needs an update after the reshuffling, but should have no conflicts.
4848 is my own PR that I will update after the reshuffling.
