ambv commented 9 years ago

The ideal situation is where a file is annotated. The other obvious choice if there is a module.pyi stub file alongside module.py. However, since package maintainers are free not to add type hinting to their packages, there has to be support for third-party stubs installable by pip from PyPI. This opens the following questions:

should there be a recommended naming scheme for those stubs? "No, a package maintainer can always trump third-party implementations by including types in his own module.".
should there be a default install place for those stubs? "Yes, this makes the type checker's stub discovery much easier. We are proposing using data_files=('share/typehinting/', pathlib.Path(SRC_PATH).glob('**/*.pyi')) in setup.py. We are proposing to add a setup.cfg hook to do the right thing automatically."
should third-party stubs be versioned? "Yes, they should be versioned as the lowest version of the source package that they are describing. Example: FooPackage has versions 1.1, 1.2, 1.3, 2.0, 2.1, 2.2, 2.3, 2.4. There are API changes in versions 1.1, 2.0 and 2.4. We need only those three versions of stubs, in which case the user should install the closest lower version of stubs available. If the user lives dangerously and does not pin versions, running both the source package and the stubs as latest will generally work (as long as the stubs are regularly updated)."
as mentioned previously, the package itself might provide .pyi files. There will be a directory (shared/typehinting/) where third-party packages can put their implementation of stubs for external packages. Also, tools like type checkers or IDEs might provide their own custom paths for stubs they themselves populate, etc. The type checker itself is expected to only load one .pyi file per corresponding *.py module.

ambv commented 9 years ago

Ended up recommending shared/typehints/python3.5, etc. since:

a different Python version is effectively a diferent environment
stub files for the same libraries on a different Python version might be different

@JukkaL, I've seen https://github.com/JukkaL/mypy/tree/master/stubs has a similar concept with distinct stubs per Python version. Do you see any problem with the suggestion?

JukkaL commented 9 years ago

My plan is actually to get rid of the separate stub directories for Python 3.4 etc. in mypy. The reason is that it makes stubs a more difficult to maintain with marginal benefits. Having separate stubs for Python 2 and 3 would be useful, however, since they are often significantly different. My plan is to only have stubs/python2 and stubs/python3 (or similar) in mypy.

Also, should the directory be shared/python/typehints/... instead (i.e., with /python/), or maybe shared/pytypehints/...?

Potentially we could recommend something like __minpyversion__ = '3.4' in the stubs to specify the minimum supported Python version.

Also, we discussed the possibility of having a single stub file that works in all Python versions (2.x and 3.x). It would be nice if we didn't have to maintain two copies of such a stub file, as these could easily get out of sync.

sffjunkie commented 9 years ago

You could do something like the following which allows you to add a more specific stub file for a specific Python version

import os
import sys

class FSUnion():
    def __init__(self, fsroot):
        self._dirs = []
        vi = [str(elem) for elem in sys.version_info[:2]]
        self._dirs.append(os.path.join(fsroot, vi[0], vi[1]))
        self._dirs.append(os.path.join(fsroot, vi[0]))
        self._dirs.append(fsroot)

    def __getitem__(self, module_name:str) -> str:
        for d in self._dirs:
            stub_filename = os.path.join(d, '{}.pyi'.format(module_name))
            if os.path.exists(stub_filename):
                return stub_filename

        raise KeyError('{}: Stub file for {} not found'.format(self.__class__.__name__, module_name))

fsu = FSUnion('typehints')
print(fsu['datetime'])
print(fsu['sys'])
print(fsu['os'])
print(fsu['notthere'])

Which, with the following file structure

typehints
    os.pyi
    sys.pyi
    datetime.pyi
    /3
        os.pyi
        /5
            sys.pyi

prints

typehints\datetime.pyi
typehints\3\5\sys.pyi
typehints\3\os.pyi
Traceback (most recent call last):
  File "fsunion.py", line 26, in <module>
    print(fsu['notthere'])
  File "fsunion.py", line 21, in __getitem__
    raise KeyError('{}: Stub file for {} not found'.format(self.__class__.__name__, module_name))
KeyError: 'FSUnion: Stub file for notthere not found'

ambv commented 9 years ago

@JukkaL, so if we want to use PyPI and pip, we have to have pythonX.Y. The reason for that is as follows:

a user has python3.3, python3.4 and python3.5 installed on his machine
for some reason he doesn't use virtual environments, he is free to ignore them
he uses a hypothetical package called "packagify" that since version 2.0 switched to "yield from", and since version 2.1 switched to type annotations in source files
if he does pip-3.3 install packagify==1.0 packagify-types==1.0 and then pip-3.4 install packagify==2.0 packagify-types==2.0 and then pip-3.5 install packagify==2.1 (stubs not needed in the last case), he expects types to be valid for each version.

Moreover, as we talked about this with @vlasovskikh, if a construct is described in the stubs, we trust that it is correct. For instance, if functools.pyi has def singledispatch(), then the type checker assumes it exists via some runtime magic, even if it's not there in the source. This would be incorrect for Python 3.3 and a nice bug to catch, actually. So you'd need to have the the hypothetical "stdlib-types" specify "minversion" in functools.pyi. But at this point the stubs stop being usable for Python 3.3 and lower. So you end up introducing versioning in the package name. Back to square one and with more hairy workarounds.

gvanrossum commented 9 years ago

I can't tell from this discussion if this requires PEP changes or not. I am labeling this as enhancement which I will interpret as "in the future, maybe", i.e. "no need to change the PEP or typing.py now".

gvanrossum commented 9 years ago

(Sorry, didn't mean to close.)

o11c commented 9 years ago

FYI, I have thought a lot about this, and I think the best path forward is to extend the if typing.PY3 sort of logic.

Currently mypy's stubs DTWT for python 3.2/3.3, because the stubs for modules that were present in 3.2 include functions that were only added in 3.4. And there are even cases where functions were removed in later python versions.

gvanrossum commented 9 years ago

Actually, Mark Shannon made me remove typing.PY3 and other platform checks. Instead, type checkers should learn how typical programs check for platforms. So extending typing.PY3 is not an option.

(I think this issue is actually about something else, so I won't close it.)

o11c commented 9 years ago

@gvanrossum

In that case, a single stub file can write if sys.version_info[:2] >= (3, 4) to expose different APIs and we can mandate that a checker supports that (as opposed to just a major version).

You're right that it's not the direct focus of this PR, but the choosing in-file vs out-of-file multiversioning will affect the answer for what the paths should be.

That said, I am not looking forward to implementing the "calculate what sys.path would be for a different python version than what we're currently using" logic. But since supporting in-file stubs is the ideal case, we can't avoid that.

JelleZijlstra commented 7 years ago

I'm interested in moving this issue forward, because it appears to be a somewhat common problem for mypy users that there is no standard place to install third-party stubs outside of typeshed. You can manually install stubs into some directory and set $MYPYPATH, but that is fragile and not portable. By fixing this issue, we could also fix python/typeshed#153, because third-party modules could now come with version-specific stubs.

I think Łukasz's approach of installing stubs using setup(data_files=...) is basically sound, but there's one complication: Type checkers may not have access to the Python binary that is used to run the code they're checking, so they don't know where setup() installed the stubs. I think mypy can get by with something like this for getting 2.7 stubs:

Try to run python2.7 -c 'import sys; print(sys.exec_prefix)' and use the output joined with shared/type_hinting/python2.7 as an additional search path for stubs.
If that doesn't work (e.g., because the code being checked runs in a virtualenv), it is the user's responsibility to use $MYPYPATH or the mypy_path= config option to teach mypy to find the stubs.
Perhaps mypy could also provide an option like --check-using-python /path/to/python/binary. If this is given, it can query the binary for its exec_prefix and find the stubs that way.

Other type checkers may provide additional implementation-specific ways to find the shared stub directory.

If we go with this approach, how should it be codified? I could write a new PEP, but perhaps this is small enough that it can just go as a new section into PEP 484.

I have a proof-of-concept implementation in https://github.com/JelleZijlstra/mypy/tree/stubdir and a library using the functionality at https://github.com/JelleZijlstra/sqlalchemy-stubs.

cc @JukkaL @vlasovskikh @matthiaskramm

matthiaskramm commented 7 years ago

I like this approach, but I'd propose using the same naming scheme for type-hinting that we have in typeshed. (I.e., use type-hinting/2.7, not type-hinting/python2.7)

gvanrossum commented 7 years ago

The PEP 484 text on this issue is pretty vague, and I propose to write a separate PEP, just so the design is clear and we can easily get feedback from people maintaining various libraries or using various platforms. Can you start a PEP for this purpose?

JelleZijlstra commented 7 years ago

Yes, I'll do that.

JelleZijlstra commented 7 years ago

Actually, I just re-read PEP 484, and at https://www.python.org/dev/peps/pep-0484/#storing-and-distributing-stub-files it has pretty much the same specification that's being proposed here. However, that text seems to implicitly assume type checkers run the same Python version they're checking, which is not currently true. So there's still value in a new PEP to specify in more detail how stub files should be distributed.

I'm going to work on the new PEP at https://github.com/JelleZijlstra/peps/tree/distributingstubs. I'll post here again to ask for feedback when I'm happy with the PEP text as written, but in the meantime I'm open to any suggestions on what the PEP should cover and how the system should work.

gvanrossum commented 7 years ago

Great! I feel that the current text in the PEP falls short in giving an algorithm for a type checker for finding stubs, even given a Python executable (to which it can feed arbitrary code, within reason). The PEP currently defers to PYTHONPATH (and where is the shared directory rooted?).

ethanhs commented 7 years ago

This is something I think would be really important to have! I was considering writing up a proposal myself. If you need help with the PEP, I'd be happy to help in reviewing or any other way I can!

ambv commented 7 years ago

Mypy documentation is currently warning against using site-packages with MYPYPATH. Rightfully so, in this case it's also type checking all third-party code your application is importing, which the user most likely doesn't want. Example: add site-packages to your MYPYPATH and just import setuptools somewhere in your code to see tens of confusing errors.

However, the more I think about it in context of shipping annotated libraries or .pyi files for user consumption, the more I think we should just use site-packages. Why?

For the user, it's the obvious place to look for annotated code and stubs, if they are provided by the author.
For the author, the default action of just submitting a package to PyPI is enough to make it available to type checkers. It does the correct thing by default.

There are three issues with this that I can identify:

Third-party libraries which aren't annotated at all often generate internal type errors.
Third-party libraries which are annotated sometimes still generate internal type errors.
Shipping .pyi files for production libraries is wasting space.

The first two can be solved by teaching the type checker that errors in site-packages should be silenced by default unless really asked for. Essentially a less hacky form of this: MYPYPATH=.../lib/python3.6/site-packages mypy my_project | grep -v "/site-packages/"

As an option we can consider also ignoring .py files altogether if they don't contain a single type annotation/type comment.

The size aspect of a library installing .pyi files is a nit. Embedded type annotations already have weight, so moving them to a separate .pyi file is a miniscule cost. If the author of a given library really cares about every last byte, we can recommend putting typing as a separate extra feature in setup.py (so the user will have to pip install some_library[typing] to get it).

cc @JelleZijlstra

gvanrossum commented 7 years ago

Mypy has a feature --follow-imports=silent which lets it analyze code that is needed to satisfy imports but not specified on the command line. This is indeed just a flag to suppress errors from those source files. However what's also needed is a way to decide whether to use the site-packages code for a given library or the typeshed stubs, if the latter exist. I think one of the problems with the way mypy currently interprets MYPYPATH is that everything it finds there has higher priority than typeshed. But (except in rare cases) when stubs exist for a 3rd party module, mypy should prefer those stubs over the code in site-packages. Hopefully this can be solved through some kind of improvement to mypy's search path -- maybe MYPYPATH can include a token indicating where typeshed should be searched, so you could write e.g. MYPYPATH=blah:blahblah:<typeshed>:/path/to/site-packages.

I think it would also be good to be able to point a type checker to a particular module in site-packages without implying that site-packages itself should go on the search path (this is probably one of the less-understood features of mypy's search path -- for any file or folder specified on the command line it implicitly adds the containing folder to the search path, before MYPYPATH).

ethanhs commented 7 years ago

I agree that people should be able to point to a particular module in site packages, instead of all of it. Im concerned if we start trying to deal with files that are untyped and not verified against Mypy it may crash, which would obviously be a problem, as the user would have to uninstall or modify the installed library, which isn't a great experience either way.

gvanrossum commented 7 years ago

Yes, for example I had a spurious install of typing.py in my site-packages and that totally crashed mypy.

I propose a concrete test case: there are stubs in typeshed for werkzeug and jinja2 but not for flask. It should be possible to typecheck site-packages/flask while using the typeshed stubs for werkzeug and jinja2. Currently the only way I can think of making that work is making a copy of the flask package (or a symlink or take the source tree) and point mypy at that -- but I think it should be possible to just do it using mypy $VIRTUAL_ENV/lib/python3.6/site-packages/flask or, even better, mypy -m flask.

ethanhs commented 7 years ago

I think packages should have stubs in a subdirectory of their install. This adds little complexity to the current install process, and a special file in that directory could indicate to use the sources for annotations (instead of stubs). As Guido said we typecheck typeshed stubs if they are available. If it isn't we can check if they opted into type checking their code in the module. Furthermore, these files would be removed with the package on uninstall, and the version of stubs would be tightly locked to the package version.

gvanrossum commented 7 years ago

I think packages should have stubs in a subdirectory of their install.

Yes, if the package authors care about stubs. Too often they don't (yet) or don't have the resources to bundle stubs, hence most packages have stubs living in typeshed. Ideally that wouldn't be necessary, but there's currently 19k lines in the typeshed/third_party folder, and it'll take a long time to move that out. (IIRC TypeScript has the same problem and deals with it the same way.)

JukkaL commented 7 years ago

As an option we can consider also ignoring .py files altogether if they don't contain a single type annotation/type comment.

This doesn't work reliably, since there are a non-trivial number of modules that type check fine and don't require any annotations. Example:

CONSTANT1 = 0
CONSTANT2 = 1
....

Another example:

# just export some stuff
from internal_module import x, y, z

asvetlov commented 7 years ago

My personal reasoning is: as author of several libraries (aiohtttp and others) I'd like to provide typing info to my users. My libraries could live without static type checkers -- we have many other validators and comprehensive unit test suite. But I want to provide type information to my libraries users. I pretty sure they will never tune MYPYPATH or do something special -- they want just pip install aiohttp and get type hinting for aiohttp itself along with yarl and multidict libraries.

I want ship typing info inside my packages -- synchronizing typeshed and package is tedious and potentially error prone. Also for small libraries with intensive usage of C Accelerators like multidict and yarl providing stub files makes sense but for big packages like aiohttp easier to maintain embedded type annotations, thus I need a solution for providing types to my users for both cases in easy way.

If mypy will require some extra steps for me as library author (copy stub files into separate directory, run a tool for extracting stubs from annotated source code) -- I could live with it but prefer to avoid this steps. But for library user the library types should be accessible just after pip install library without passing extra params into mypy. Otherwise the whole idea about types looks not very useful.

If trivial solution proposed by @ambv doesn't work -- we could consider to supporting a package's label for letting mypy know the package provides type info supported by package's author. Say, it could be a special file named typing.enabled pushed alongside with toplevel __init__.py or special comment in the __init__.py itself.

P.S. I understand your focus on supporting separate stub libs for existing libraries but there is another demand: support packages with pre-built type info provided by its authors. It's not my only opinion, I heard concerns like described above several times during last months.

JukkaL commented 7 years ago

@asvetlov I agree that this is an important concern. Your suggestion is reasonable and similar to what I've been thinking. Here's a summary how mypy could support his:

Include Python sys.path in the mypy search path by default as a special search entry (more about this below).
Provide a mypy command-line option for specifying the Python interpreter which we use to determine the search path. By default, use the same one that is used to run mypy.
Only process modules or packages found using the Python search path if they declare support for static types somehow. Otherwise, fall back to typeshed.
Don't report any errors in modules found using the Python search path. Possible exceptions include serious errors such as parse errors.
Provide a mypy command line option for ignoring sys.path (which would result in the current behavior). Maybe also make it possible to ignore only specific modules in sys.path.

I also agree that it should also be possible for package maintainers to bundle separate .pyi files. This could be nice for maintainers that don't want to add annotations to their implementation, in addition to packages that use C extensions.

There are still some open questions:

What if a module declares support of static typing but doesn't have any annotations? Should this be disallowed or not recommended? It's possible that typeshed has more complete types in a case like this.
What's the best way for a module to declare that they support static typing?
What's the best way to bundle .pyi files?

ethanhs commented 7 years ago

I agree that plan is very good. To give thoughts on your open questions:

I think it would make sense to have 3 levels packages can declare:

type unsafe package, the default state, packages not specifying an option have this level. Type checkers may opt into checking these, but it is recommended to require manual instruction to do so. This allows people to whitelist packages they have tested themselves which haven't declared what level they are.
stubbed package, this is self explanatory, the stubs would reside ~~in some subdirectory of~~ alongside the installed source directory ~~eg my_package/typing-stubs/~~ or something like that.
typed package, the typing is in the source, treat it much like a module in the system path.

As for how this information is included in the distribution, I think we would need to either hack something together with files or information such as that, or integrate in the current metadata of packages (I think if we could add classifiers to the packages about this, it would be simple to get that metadata). Otherwise we should probably get ideas for the best way to add metadata on distutils-sig.

Edited to 3 levels as suggested by David.

asvetlov commented 7 years ago

If I provide .pyi files for my package the most native place for its is the folder for source .py files, isn't it? Mypy uses it for user files already, why not extend the rule for installed libraries. As library author I could just put .pyi alongside with my sources and include them into distribution. Moreover I'd like to mix embedded annotations and stub files sometimes. Say use embedded approach for python modules but provide stubs for C Extensions.

ilevkivskyi commented 7 years ago

What if a module declares support of static typing but doesn't have any annotations? Should this be disallowed or not recommended? It's possible that typeshed has more complete types in a case like this.

I don't think this should be disallowed, but certainly we could recommend declaring support only if the packaged annotations are better than the typeshed stubs.

What's the best way to bundle .pyi files?

I think I am with @asvetlov here, we could just look alongside the source files. Later we could additionally support some attached metadata in the package that will point to the folder with stubs, or search in a folder with a special name, like <package root>/stubs.

What's the best way for a module to declare that they support static typing?

We can consider packages that bundle .pyi files as supporting typing. For inline annotations it is less clear (as discussed yesterday), but as a temporary solution one can just run stubgen and include the generated .pyi (there is a PR https://github.com/python/mypy/pull/3169 to preserve annotations in stubgen).

In general, there are two situations with providing annotations:

A package maintainer would like to provide annotations.
Someone else wants to do this.

For the second situation we already have a reasonable option - contribute to typeshed. I think we should encourage the first one, we could start from something simple, for example look for .pyi files alongside sources, and use them if found. (There might be a concern that this will encourage using stubs instead of inline annotations, but I think it is up to the package maintainer to choose the workflow - either maintain separate stub files, or use inline annotations and run stubgen before a release).

gvanrossum commented 7 years ago

I like Jukka's suggestions -- a PyPI package should be able to declare its stance regarding type checks.

I kind of hope that there need only be two levels, "never heard of types" and "package contains type information". In the latter case the checker should search the package for .pyi files and fall back to .py files. For modules that are C extensions, a .pyi file is needed; if none is found the module is treated as missing. Also if a package declares it contains type information, this should completely override typeshed for that package (with perhaps a way for the user to override this decision, per package).

I think the metadata should go into PKG-INFO, following PEP 345, assuming it allows extensions and it's easy to add to this file using an appropriate clause in setup.cfg or setup.py. I know nothing about this format though. (There's also a newer PEP, 426, but it's still a draft, and apparently deferred.)

I think there should be explicit metadata to indicate that a package has bundled type info -- it would be expensive to search site-packages for .pyi files, and even more expensive to look in the .py files for annotations (and not even correct, probably). (Sorry, @asvetlov.)

Regarding which sys.path to use, I think in the case of mypy, it shouldn't actually default to the Python interpreter used to run mypy; it should default to the Python version specified or implied on the command line, and presume $PATH can find that Python version. (With a warning if it can't find one, and an option to specify either the Python interpreter or the site-packages directory manually.)

There's an additional wart here in that type checkers evolve, sometimes faster than PyPI packages, and the type annotations bundled with a package might trigger errors or even crashes with a newer version of a type checker. There may also be type checker flags that trigger errors (e.g. mypy --strict is a pretty high bar to pass, and there are options that set an even higher bar, like --warn-unused-ignores). I'm not sure what to do about crashes except making that the checker's problem; but for errors, Jukka's suggestion of suppressing all but the gravest errors by default seems sensible (in mypy, that would perhaps be non-blocker errors). More user control would be optional. Ultimately a dedicated user could take complete control by setting up a custom virtualenv and pointing the checker there (perhaps combined with a custom typeshed).

vlasovskikh commented 7 years ago

FYI PyCharm searches for type hints in .pyi and .py files on sys.path (including site-packages), since according to the current version of PEP 484: "Third-party stub packages can use any location for stub storage. Type checkers should search for them using PYTHONPATH."

dmoisset commented 7 years ago

Hi. I've been following this (and related issues like #184, python/mypy#3350, python/mypy#1190, python/mypy#2625, python/mypy#1895, python/typeshed#153), trying to create a summary+clean proposal. My hypothesis is that if we make distribution of stubs easier for library or third party authors, people will start supporting more libraries and more users will be able to benefit from type checking.

I'd like to move this forward ( @JelleZijlstra , can I help writing the PEP?). Something that I think has been blocking us is that we're trying to fit a set of existing ideas ($MYPYPATH, $PYTHONPATH) to solve a new problem and there's some mismatch.

Let me try to summarize the constraints I've identified in the discussions:

Some library authors want to be able to distribute type information.
- Some of them want to use inline annotations, others want to use .pyis
Most library authors don't want/know how to add/care about type information.
- Tools that depend on those (like mypy) may choke if they try to process files in that library so those should be skipped (this discards the idea of pointing $MYPYPATH at site-packages/)
Some third parties want to provide type informations for libraries they don't author
- One specific case of this is people in the mypy project and other people promoting use of python static typing, that do this via the typeshed project so it can be bundled with mypy/pycharm/etc.
- But in the long term there may be people that want to distribute it independently.
The cpython project doesn't want to distribute type information about the stdlib, so that type information also must be distributed separately (currently via the python/typeshed project)
Type information for libraries must be versioned matching the library install
- The stdlib is a "special case" of this, needing to match the python interpreter
To make all this usable and effective, it's desirable to have type information packaging standards/conventions so type checking tools can discover and use the appropriate type information
The code being checked may not use the same interpreter that the typechecker is using, so things like sys.path should not be considered automatically reliable by a checking tool.
The checking tool users may want to override some of this (for example, provide alternate type information that makes more sense for their projects)
- (quoting @gvanrossum) «it would also be good to be able to point a type checker to a particular module in site-packages without implying that site-packages itself should go on the search path»

One important thing that has to be highlighted here is that we have requirements from multiple actors which all are distributing type information in slightly different ways:

The person running the checking tool (and probably authoring a project)
The library author (which also can provide it in two different ways)
Third party type info contributors
The authors of the checking tool (providing typeshed)

Whatever proposal we make also must be able to answer the following questions:

Where should 3rd parties write type information?
How is that distributed?
Where are those files stored? (so the checker can find them)
Where do library author add stubs if they desire?
How is that distributed?
Where are those files stored? (so the checker can find them)
How does a checker know if using py or pyi files for a library, or ignore the py files, or use the files intypeshed? 7.1 if the library author specifies that, how does they do it and how the type checker knows?
How can library author prevent distribution of type info when users are conscious about space?
What overrides are available for the user running the checker?
In which situation typeshed has priority above other things
How does the type information author specifies which version of python is required? is this necessary?
How to handle annotated libraries that start generating static type errors after changes on the checker?

Based on this, let me make this proposal (which is mostly a rehash of things that other people have said but has some new bits)

The proposal

Let's say that there's a library called flyingcircus (I'll call this the "python package", the directory with an __init__.py and files in it), in a PyPI package called flying-circus. The UX for different actors should be IMO:

The ideal UX for library authors (stub based)

If the author of flyingcircus wants to include "official" stubs they should:

Add .pyi files into the flyingcircus source tree
Upload a new version normally to PyPI

This assumes that the library is using the standard python package ecosystem

The ideal UX for library authors (source based)

If the author of flyingcircus wants to include inline annotations (and make those the default), they should:

Add annotations to the .py files
Flag the package as "typing ready" but setting a single value in the package metadata (some argument to setup() in setup.py, or other simple, single change to the package)
Upload a new version normally to PyPI

This assumes that the library is using the standard python package ecosystem

The ideal UX for third party stub authors

Write stubs as .pyi files, mirroring the file structure of the original library
package as a PyPI package with a name different to the library (for example flying-circus-typing)
- Use version numbers matching the original
- Use the flyingcircus name somewhere in the package (to let the toolchain know that these pyi files correspond to flyingcircus)
upload to PyPI

The naming step could be omitted and made by convention, but having a convention here would not allow having more than one third party package which is undesirable.

There are some things that are still impractical here but I'll open them as different tickets because they can be solved separately (related to how to make maintenance of this pyi libraries easy)

The ideal UX for users

If their libraries include stubs, or if they are included in typeshed, installing the library should "just work"
If not, or if they prefer a third party type information, installing the package with type information (and adequate version) should "just work"
If they want to customize type information they should be able to do it by command line / $MYPYPATH / config file
User may need to provide information about the python installation (otherwise the checker can not know which python environment to use) if a reasonable default does not work

The ideal UX for checker authors

If checker authors want to provide stubs for a library, they can commit it to typeshed in the way that's being done now

This does not cover versioning, I think that if this is required they can fall back to the "Third party stub author" scenario

How to get there from here

To make the above possible, the following changes should be implemented:

Changes in checker lookup order

Currently tools define a lookup order, for example https://mypy.readthedocs.io/en/latest/command_line.html#how-imports-are-found This is not standardized.

My proposal is that the lookup order should be:

User preferences. This goes first to allow user override. This can be done through the usual combination of command line + environment variable + config file (in mypy, $MYPYPATH covers this role).
Third party type info. This should be found following the sys.path (of the checked system, not the one of the checker), and adding __pyi__ after the base package name. For example, import flyingcircus should lookup for an __init__.pyi file at os.path.join(p, 'flyingcircus', '__pyi__') for each p in sys.path. And import flyingcircus.gumbys should lookup for gumbys/__init__.pyi or gumbys.pyi at at os.path.join(p, 'flyingcircus', '__pyi__') for each p in sys.path. This item is high on the list because these files will only be present if the end user installed a third party type information package, which shows intent of using it with preference to other information (even if it's by the library authors). If the __pyi__ directory is found but the imported module isn't, there's no further lookup (to avoid mixing up stubs from different sources)
Library type info. This should be found following the sys.path (of the checked system), and looking for pyi files only (not py files). This goes before typeshed because the library author may want to improve on older/less maintained stubs bundled with typeshed (see for example https://github.com/python/mypy/issues/1190#issuecomment-190508891).
Typeshed. The main role for this is stdlib, and stubs for libraries that are important but no one else wants to write. This is low priority because I'm assuming that typeshed doesn't want to be in the long term a repository with "everything" (which does not scale as a solution); and also because the level of bundling (many different libs all together) removes control from the end user.
Library source. This is like 3 but .py files. This is only looked into when the library explicitly tagged support for source checking. This can be done with some marking in the top of the top level __init__.py (for example a # type: check-source comment)

Steps 2,3,5 need the location of python environment used by execution. This can be provided by base path (and then run the interpreter there to get the sys.path) or directly the interpreter path. The interpreter path could be used also as a way to provide the python version to use (by running it with -V)

One side effect of this separation is that there's a more clear separation between 'code being checked' (step 1) and 'libraries to support that code' (steps 2-5), which may help with providing sane error reporting options in tools and handle better the "my checker is stricter now but I don't care about these new error messages in library code" scenario.

Changes in packaging tools

Even if a working result can be achieved with no changes to current packaging tools, the following (minor) changes would streamline the process:

Modify setuptools to automatically include .pyi files in packages
- if this is found undesirable in the future it could be made optional, but I wouldn't bother unless someone complains.
make setuptools.find_package() find directories only with __init__.pyi files (to simplify stub only packages)

Resulting UX for library authors (stub based)

Add .pyi files into the flyingcircus source tree. Each .pyi in the same directory as the corresponding .py file.
Upload a new version normally to PyPI

Resulting UX for library authors (source based)

Add annotations inside the .py files
Add # type: check-source to flyingcircus/__init__.py
Upload a new version normally to PyPI

Resulting UX for third party stub authors

Create a flyingcircus/__pyi__ directory.
Write stubs as .pyi files there, mirroring the file structure of the original library
package as a PyPI package with an arbitrary name different to the library (for example flying-circus-typing)
- Use version numbers matching the original
- find_package() should find the correct module; without setuptools support the list should be specified (setup(..., packages=['flyingcircus.__pyi__', 'flyingcircus.__pyi__.gumbys'], ...))
- pyi files should be included; without setuptools support you should add them (setup(..., package_data={'': '*.pyi'}, ...))
upload to PyPI

Changes in documentation / PEP-484

This proposal could go into a new PEP
PEP-484 should drop the section about placing stubs and point to this new PEP
the new PEP or some side documents should describe the procedures to be follow for each actor

Changes in tools

Tools like mypy/pycharm would need to reimplement the lookup code to support is. The lookup algorithm is more complicated than what can be achieved with a simple list like $MYPYPATH; and it's better to leave $MYPYPATH just for user overrides.

Those should also allow specifying the interpreter of the target code to get relevant path search information. See for example https://github.com/python/typing/issues/84#issuecomment-304065757

Room for improvement

I see a few things here can be improved, I'm open to ideas:

PyPI packages that have a single module instead of a python package may need some special handling (what I've described assumed that the library is a python package)
Making third party stub authors to put everything inside a flyingcircus/__pyi__ directory looks a bit weird, although it's realatively easy to do
There may be a bit of confusion of when to add a __pyi__ and when to just add .pyi files along your source; mixing them up may result in packages with overlapping filenames (which current python packaging tools allow happily)
There's no way to specify supported python versions in a stub library. However, I think that for most libraries this is not needed, and when it is needed it should be possible to use the usual if sys.version >= ...

ethanhs commented 7 years ago

Thank you @dmoisset for this very thorough dissection of the discussion and an excellent proposal! I have a few questions this raises, and a few thoughts, otherwise I agree with basically everything you say/propose.

What happens if there are multiple stub-only packages? If they have different versions we may be able to prefer one over the other. If they have the same version, should checkers fail? Should this be left up to checkers/users?
As for the declaration of whether source supports checking, I think project metadata is the best way to go. It keeps the check before trying to parse files which may not safe, and handles the case of a package containing only a single module.
I have an alternative suggestion for the location of stubs in stub-only packages: my-flyingcircus-override/flyingcircus/, so that we can internally treat it like any other directory and check its contents. This means we probably don't have to special case the __pyi__ directory name.
"There's no way to specify supported python versions in a stub library". PEP 484 specifies that simple checks such as sys.version etc. can be used. I think this is sufficient, but if we want package scope (which makes a lot of sense to me) we may want this in the metadata too?
Do we want typed packages to depend on setuptools to install/build? I see no reason why not, but I thought I'd ask.

Also, this is somewhat tangentially related, but I am beginning to be of the belief that typeshed should eventually (long down the road) be split into stub packages, as it would make many things easier. It could even prove a good experiment of the UX for stub package maintainers to try with the most popular parts of typeshed.

Thanks again for taking the time for writing this out! I found it very helpful.

gvanrossum commented 7 years ago

Daniel, I love the thoroughness of your summary of the issues and your clear proposal! I think if we implement it we will be set for a very long time. Let me see if I can summarize the actual proposal, in order to suggest some minor tweaks. The default search order in your proposal is:

User code (the app or library you're checking), typically from the command line.
Third party type info (pyi or py files) explicitly installed.
Stubs (pyi files) packaged with third party packages.
Typeshed (for stdlib as well as third party packages).
Non-stub third party packages with inline annotations if marked as such.

Another piece of the proposal is that whenever sys.path or site-packages is used, it should be for the target interpreter (e.g. for mypy, indicated via --python-version=N.M).

My commentary:

Since invoking this interpreter is relatively expensive we should just ask it to print the contents of sys.path (and whatever else we need).
I'm honestly not sure yet how mypy should treat MYPYPATH -- but it's reasonable to insert it between (1) and (2).
We need a convention for (2); I'm not so keen on your flyingcircus/__pyi__ proposal, especially since (due to namespace packages, PEP 420) its presence on sys.path would confuse Python's own search. I like Ethan's proposal here (search for my-flyingcircus-override/flyingcicrus, assuming the pattern is exactly my-FOO-override/FOO).
I'm not sure why typeshed (4) should override the package itself (5), since that would mean that packages can't improve on poorly done typeshed stubs without adopting stubs themselves.
There's a use case where a package contains a mix of stubs and inline annotations (IIRC Tornado asked for this) which would be awkward.
We may end up having to search a lot of directories, but we can solve that with caching (a directory listing can make many stat calls unnecessary).
We need a way to mark a package as "fit for (5)". I agree with Ethan that it's better to do this in metadata than in __init__.py (assuming setuptools will copy arbitrary metadata, which we'll have to test).
I think there was something else but I've forgotten, if I remember I'll post here.

ethanhs commented 7 years ago

assuming the pattern is exactly my-FOO-override/FOO

Or perhaps namespace-FOO-overide/FOO, where namespace can be any string to provide uniqueness. I agree with Daniel we don't want to limit to one alternative package (it would cause competition issues for grabbing names on pypi, and other issues). But this can be iterated on later. :)

I think typeshed should be last in resolution order, with the option of bumping it up in the order at the discretion of the user/checker. This means that type information in a working package (which should be correct) will come before typeshed, but if typeshed has better info, the checker or user can override this behavior.

As for a mix of stubs and inline annotations, I believe this would be a good application for https://github.com/ambv/retype. Deciding how to merge mixed codebases could add significant complexity. Regardless with how this is decided it would need to be very well documented. When you say Tornado wanted this, do you mean they have a mix currently, or don't want types in their entire code base, or is there a reason they cannot go one way or the other?

gvanrossum commented 7 years ago

Or perhaps namespace-FOO-overide/FOO, where namespace can be any string to provide uniqueness. I agree with Daniel we don't want to limit to one alternative package (it would cause competition issues for grabbing names on pypi, and other issues). But this can be iterated on later. :)

I worry this part is being over-engineered -- are we really expecting multiple groups to issue competing types-only packages for the same 3rd party package? I think we're better off with an informal convention like the FOO-dev packages in Ubuntu.

I think typeshed should be last in resolution order, with the option of bumping it up in the order at the discretion of the user/checker. This means that type information in a working package (which should be correct) will come before typeshed, but if typeshed has better info, the checker or user can override this behavior.

That last bit also sounds like a bit of over-engineering to me.

As for a mix of stubs and inline annotations, I believe this would be a good application for https://github.com/ambv/retype.

As a package author I'd be disinclined to allow such automation to modify my code without very careful review.

Deciding how to merge mixed codebases could add significant complexity. Regardless with how this is decided it would need to be very well documented. When you say Tornado wanted this, do you mean they have a mix currently, or don't want types in their entire code base, or is there a reason they cannot go one way or the other?

IIRC they generally wanted to go with inline annotations (good for them!) but in some cases would have to work around type system limitations that would get pretty ugly inline, so they proposed to use stubs for those few cases. (I don't recall details but I know I've had that inclination in a few corners of our internal codebases.)

JelleZijlstra commented 7 years ago

@dmoisset Thanks for your detailed summary and proposal! I don't think I'll have time soon to write up anything more detailed, so don't worry about pre-empting me if you want to write up a PEP.

One area you may need to also consider is versioning for library stubs. If I release a package providing stubs for flyingcircus, I would want a way to cover API differences among different versions of the library, and I'm not sure there's a clear way to do that under your proposal (see also python/typeshed#153).

asvetlov commented 7 years ago

I don't think versioning for stub libraries is an issue: user should install them by pip install, he could pin stub version together with base library, e.g. in requirement files etc.

Though we could recommend supporting the same version number by stub library authors, e.g. for Django==1.11 django-stub should be 1.11 too.

JelleZijlstra commented 7 years ago

But what if the author django-stub discovers that they made a mistake in their stubs for Django 1.11? Perhaps they could release django-stub 1.11.1, but then that would mean they'd have to maintain separate, mostly identical copies of the stubs for each Django version.

I think we'll have to support something like this straw man: stubs can do if __version__ >= (1, 11), where __version__ is a magical constant that the type checker evaluates to the version of the package that is being used. There's many details there that I haven't thought much about: what about namespace packages? how does the type checker know what library version you are using?

gvanrossum commented 7 years ago

Yeah, I think this proposal supports library versioning well enough.

[Clarification: This was written before Jelle's comment above, in response to @asvetlov's "I don't think versioning for stub libraries is an issue".]

gvanrossum commented 7 years ago

But what if the author django-stub discovers that they made a mistake in their stubs for Django 1.11? Perhaps they could release django-stub 1.11.1, but then that would mean they'd have to maintain separate, mostly identical copies of the stubs for each Django version.

I think we'll have to support something like this straw man: stubs can do if __version__ >= (1, 11), where __version__ is a magical constant that the type checker evaluates to the version of the package that is being used. There's many details there that I haven't thought much about: what about namespace packages? how does the type checker know what library version you are using?

This feels like worrying too much. We've never even tried any of this proposal, and we don't know if this scenario will occur frequently enough to worry about. But if we specify something to handle this now we'll never be able to remove that feature, no matter how unpopular (because it'll always break someone's workflow).

Also, I really don't like having to support library version comparisons in the checker -- unlike Python versions we don't have control over the ordering of libraries (they don't all use semver). The maintenance problem can be solved using branches. We're better off allowing some freedom in the naming of stub packages -- maybe the package name will end up including the django version (e.g. django-1.1-stubs) so the package version can be whatever the stub package author wants.

ethanhs commented 7 years ago

I've started writing up a PEP. Hopefully I can get a draft out tomorrow or the next day.

Things to think about which I leave open for discussion (because they should be discussed more):

How should packages indicate they support typing? (My personal favorite is a new trove classifier, but other options exist)
How should mixed stub/inline packages be dealt with? If we have stubs in a package, should we parse the Python files to check if there is type information in the files? That could be slow for large codebases. Should we have 3 states of metadata? Stubs, inline, mixed? That has its own drawbacks.
What to do with the existing PEP 484 text on the matter? Some say we should delete it and replace it with the new PEP. Im not sure that is the best thing to do, but it sounds cleaner.
Other issues with current designs.

gvanrossum commented 7 years ago

I'm not sure I like Trove classifiers that much. They're almost free-form but here we need something machine parseable. Maybe there's something in PEP 459? Or maybe we can just add something else to the egg-info (but I don't know how any of that stuff works, really).
For packages that declare they support typing, we should use the standard search approach, which is .pyi first then .py, ignoring the .py if the .pyi is found.

ethanhs commented 7 years ago

Okay, I have a rough draft here: https://github.com/ethanhs/peps/blob/typedist/pep-0561.rst

I also added a PR https://github.com/python/peps/pull/415 if people prefer the Github review UI, but also feel free to leave comments in this issue.

I plan on making a POC implementation of the distutils extension when I have time either later today or tomorrow.

A couple points Im not sure about:

should the keyword be a boolean about whether the package is stubbed or not? Then stubbed == True inline == False and type unsafe == None.

Mixed inline/stubbed packages - I originally wrote a version with an additional option for this, but it seemed to make things more complicated than needed.

Thanks!

ethanhs commented 7 years ago

The PEP has been posted to Python-dev and the latest live version can be found here: https://www.python.org/dev/peps/pep-0561/

ethanhs commented 6 years ago

I believe PEP 561 resolves this issue, now that it is accepted.

gvanrossum commented 6 years ago

Yes!

ilevkivskyi commented 6 years ago

I believe PEP 561 resolves this issue, now that it is accepted.

It was a long way. Thanks @ethanhs for all the work on this!

python / typing

Third-party stubs: recommending a default path for installing stub files, overriding stubs #84

The proposal

The ideal UX for library authors (stub based)

The ideal UX for library authors (source based)

The ideal UX for third party stub authors

The ideal UX for users

The ideal UX for checker authors

How to get there from here

Changes in checker lookup order

Changes in packaging tools

Resulting UX for library authors (stub based)

Resulting UX for library authors (source based)

Resulting UX for third party stub authors

Changes in documentation / PEP-484

Changes in tools

Room for improvement