Type inference of attrs assigned in __init__ in third-party py.typed packages

Is your feature request related to a problem? Please describe. This issue based on the discussion in the #1832 issue.

Instance attributes in many libraries, that I checked, aren't explicitly annotated in class bodies. Real example in aiohttp. Pyright currently does not analyze their types and considers them as `unknown'.

My feeling is that most libraries have this problem. All libraries that I checked have this problem: marshmallow, starlette, tornado, pydantic, asyncpg (in MagicStack/asyncpg#577), click, aioredis, aiopg, prompt-toolkit...

Example:

from aiohttp import web
from aiohttp.abc import AbstractAccessLogger

class Logger(AbstractAccessLogger):
    def log(self, request: web.BaseRequest, response: web.StreamResponse, time: float) -> None:
        reveal_type(self.logger)  # info: Type of "self.logger" is "Unknown"

Mypy in this example infers type of logger as logging.Logger.

This behavior can complicate the use of pyright in an existing ecosystem of libraries and can hide some of the errors with reportUnknownMemberType turned off.

Describe the solution you'd like I think Pyright should infer field types for py.typed libraries like mypy does. This will reduce the difference between mypy and pyright and make pyright easier to use in an existing ecosystem.

This problem isn't specific to fields assigned in __init__ methods. We've found that many libraries that are marked "py.typed" (claiming to be "typed") have incomplete type information. That includes missing annotations for function parameters, function return types, class variables, instance variables, and globals. This situation isn't surprising given that PEP 561 defined py.typed but didn't provide any guidance about what it means for a library to provide types.

We discussed this situation a while back in the typing-sig (the forum for maintainers of the various Python type checkers and type stubs). Out of this discussion came this draft guidance for library authors. Pyright implements this proposal, which attempts to balance the interests of library authors and library consumers with the goal of improving the entire Python ecosystem over time.

As discussed in this guidance, it's highly undesirable to rely on type inference for a library's public interface contract. There are no standards around type inference, and type checkers across the Python community infer types using slightly different techniques and rules. Furthermore, type inference can be quite expensive, which means that the experience suffers for users when tools like pylance/pyright use type information for interactive feedback during editing. When it comes to describing the public interface contract for a library, it's important that types are explicitly declared, not inferred.

Until recently, there was no way for a library maintainer to discover whether their "py.typed" library was properly and completely typed. About six months ago, I added a "--verifytypes" option to the command-line version of pyright that analyzes a "py.typed" library and reports missing type information. The good news is that the "type completeness score" (the percentage of public symbols within a library that are properly annotated) has steadily risen in many popular libraries over the past six months. Library authors are seeing the benefits of providing type annotations, and library consumers are increasingly demanding that this information be provided by libraries. So we're moving in the right direction!

I've been periodically running "--verifytypes" on popular libraries as new versions are released. The aiohttp library, for example, had a type completeness score of 78.6% back in version 3.6.2, and it now has a type completeness score of 84.2% in version 3.7.4. Its public interface contract includes 2301 symbols (functions, module-level variables, classes, methods, class variables, instance variables, type aliases, etc.), and 1940 of those are properly type annotated. It should be relatively easy for the remaining 361 symbols (many of which are instance variables) to be annotated as well. This is a great example of a library that could get to 100% completeness with a relatively small time investment.

Here are the type completeness scores for the latest versions of the libraries you've listed above:

aiohttp: 84.2%
marshmallow: 26.9%
starlette: 60.6%
tornado: 36.2%
pydantic: 69.6%
asyncpg: not py.typed
click: not py.typed - stubs included in typeshed
aioredis: not py.typed
aiopg: not py.typed
prompt-toolkit: 72.3%

I understand that the current situation isn't ideal. We're trying to provide the guidance, tooling, and incentives to push the Python ecosystem to a better place in the long run. We need to balance short-term and long-term goals, and if we get that balance wrong, we will continue to be trapped in a less-than-ideal state for a long time.

Here's a spectrum of options for us to consider (not all of them mutually exclusive).

We could put more pressure on library authors to make type improvements faster. This could involve some combination of the following:
- Formally ratify the guidance (in the form of a PEP).
- Get all type checkers (including mypy) to conform to the guidance, so unannotated types within a "py.typed" library are not inferred but instead are treated as "Any".
- Use public forums to promote the benefits of type completeness. Provide public praise for library maintainers who make this investment and incentives for those who are otherwise reluctant to do so.
- Organize and coordinate efforts within the Python community to contribute PRs to the post popular open-sourced libraries so we can get them all to 100% type completeness.
- Continue to invest in more tooling so it is easier to fill in missing type annotations within library code.
- Make it easier for library authors to incorporate type validation in their CI pipelines so they can maintain 100% type completeness.
We could amend the guidance to provide special-case rules for some straightforward type inference of instance variables. For example, maybe we say that an instance variable's type can be inferred in the case that it is assigned a value within an __init__ method using a simple assignment expression (self.<attr> = <rhs>) where <rhs> is a simple symbol name corresponding to an input parameter whose type is annotated. While such an exception sounds pragmatic, it leaves a lot open to interpretation, and there will be confusion about when an annotation is needed and when it's not. For example, what if the assignment occurs in an if block or a try block? What if there are multiple assignments in the __init__ method? Type annotation rules are already complex, and adding more special cases is not ideal. It would be much better for library authors to be explicit in providing types for instance variables that they consider to be part of their public interface contract. To quote PEP 20: "Explicit is better than implicit."
Pyright could provide some "escape valve" for libraries that are marked "py.typed" but do not provide complete type information. This could take the form of a global setting that means "assume that all 'py.typed' libraries have incomplete type information and allow type inference for them", or it could provide finer-grained control so specific libraries could be listed. This will help pyright users in the short term, but it will reduce the incentive for library maintainers to make the improvements we need them to make. So it could slow down overall progress.

Hopefully that gives you a sense for the complexities and tradeoffs involved here.

Let me add one more reason why pyright currently assumes that types are unknown in a py.typed library rather than using type inference.

Type inference involves various heuristics that can easily fail in the general case. Take, for example, x = [3]. Should we infer list[int], list[float], list[Optional[int]], list[Any]? When these heuristics fail, false positive errors are often the result. We generally consider a false positive worse than a false negative. (By "false negative", I mean a situation where a potential type violation goes silently unreported.) False positives require hacky work-arounds and do more to erode trust in static type checking than false negatives, so when we make tradeoffs in pyright between false positives and false negatives, we tend to err on the side that minimizes false positives. I assume from your statement above that you're primarily concerned about false negatives ("can hide some of the errors").

Thank you for your comprehensive explanation!

it is funny that some of the libraries I have listed contain a py.typed file, but do not mention it in MANIFEST.IN 🙃.

I assume from your statement above that you're primarily concerned about false negatives ("can hide some of the errors").

Yes, that's exactly right, so I try to use strict mode wherever I can.

Formally ratify the guidance (in the form of a PEP).

I believe this is the best way to reach out to the largest number of library authors. Perhaps you can start some discussion about it (maybe in python-sig)?

Continue to invest in more tooling so it is easier to fill in missing type annotations within library code.

It would be really cool if mypy started automatically checking type completeness for py.typed libraries too. But if it were a separate cli command, the adoption rate would probably be too low.

Perhaps pylance can show places where you need to explicitly specify types if there is a py.typed file? Perhaps the same feature request can be sent to Jetbrains for Pycharm support. But I think that a PEP is needed for such decisions.

Use public forums to promote the benefits of type completeness. Provide public praise for library maintainers who make this investment and incentives for those who are otherwise reluctant to do so.

It might also be cool to make a badge on GitHub with types completeness percent, similar to code coverage percent.

Also, it is possible to provide some functionality to automatically generate documentation on the entire public API for Sphinx users (partially using autodoc for example), еhis may be a good incentive for library authors.

Pyright could provide some "escape valve" for libraries that are marked "py.typed" but do not provide complete type information. This could take the form of a global setting that means "assume that all 'py.typed' libraries have incomplete type information and allow type inference for them", or it could provide finer-grained control so specific libraries could be listed.

Sounds like a philosophical question, as a standard user of type checkers, I could definitely use such a feature, especially in the case of "finer-grained control so specific libraries could be listed". But of course you better understand all tradeoffs of such a decision.

After discussing this internally, we've decided that we're going to continue on our current path and continue to encourage library authors to include complete type information along with their libraries. We have a number of initiatives and investments in the works to further this goal.

At least for the time being, we're not going to provide means to ignore "py.typed" on a per-library basis. We may revisit that decision in the future depending on feedback we receive.

microsoft / pyright

Type inference of attrs assigned in init in third-party py.typed packages #1846

microsoft / pyright

Type inference of attrs assigned in __init__ in third-party py.typed packages #1846

Type inference of attrs assigned in init in third-party py.typed packages #1846