python / typing

Python static typing home. Hosts the documentation and a user help forum.
https://typing.readthedocs.io/
Other
1.58k stars 235 forks source link

Question about stub symbols that begin with underscore #1435

Open erictraut opened 3 years ago

erictraut commented 3 years ago

My understanding is that symbols defined at the module scope of a stub file and begin with an underscore are meant to be private to the stub — that they don't represent symbols exported from the module. For example, the symbol _SupportsPow2 in builtins.pyi clearly shouldn't be part of the module's namespace. Likewise, _T defined in many stdlib stubs shouldn't be considered an externally-visible symbol. Presumably, a type checker should report an error if someone attempts to import these symbols or reference them from another module. An attempt to reference these symbols at runtime will typically result in an exception.

However, it appears that there are exceptions to this rule. For example, the function _exit defined in os/__init__.pyi is apparently meant to be externally visible.

Is there a way for a type checker to differentiate between these two cases?

In the case of a "py.typed" library (as defined in PEP 561), including the symbol in an __all__ list can be used as a way to indicate that an otherwise-private symbol is meant to be externally visible. Would it make sense to adopt this same mechanism for stub files — that is, allow the author to specify an __all__ list that includes a symbol name that begins with an underscore?

JelleZijlstra commented 3 years ago

Type checkers should allow users to use all names defined in the stub, including those prefixed with _. Usage of private (underscore-prefixed) variables from another module would then be caught by lint rules.

Using __all__ isn't possible because __all__ in the stub should reflect the runtime value. os._exit isn't in os.__all__ at runtime, so we shouldn't put it there in the stub.

Allowing usage of private names can lead to some false negatives from type checkers, but our philosophy is generally to prefer false negatives over false positives. We already have numerous private function in the stubs (run git grep 'def _[a-z]'), and I would expect type checkers to allow users to use those type definitions.

srittau commented 3 years ago

In theory, there is the @type_check_only decorator, but that has never gained traction.

erictraut commented 3 years ago

I agree that false positives are preferable to false negatives, but neither of them are desirable.

I can understand how this class of false negative might be acceptable for static type checking, but now that stubs inform completion suggestions in language servers, it becomes more of a problem. I don't think it's a good idea to be suggesting these symbols as completion suggestions.

@srittau, I wasn't aware of @type_check_only. Since that's a decorator, it would presumably work only with classes and functions, not with variables. It's also pretty cumbersome, so I can understand why it didn't catch on. Perhaps we need to come up with some new mechanism if we want to address this shortcoming.

It seems to me that module-level symbols that begin with underscore are rarely intended to be exposed from a type stub. There are only ~50 instances of this in all of the stdlib stubs and <10 instances in all third-party typeshed stubs. Perhaps it would be better to treat all module-level symbols that begin with underscore as "not exported" then provide a way to override this to handle the few exception cases.

Akuli commented 3 years ago

I have a few ideas for this, but I think they are all bad.

Before typeshed was a thing, autocompletion libraries (I think at least jedi) just inspected what was available at runtime. It is a bit overkill for this, but if a langserver imports the module and checks whether the attribute actually exists, that would allow a langserver to distinguish between os._exit and os._T. Of course, this isn't great if you don't want to just import arbitrary third-party modules that could crash the langserver or have some other arbitrary side-effects, or it is difficult to invoke Python from the langserver.

Another alternative would be to not suggest os._exit names, unless the user has already typed os._. This isn't great either, because it is possible to type os.<Tab>, and then type _ and have the editor filter through the list without asking the langserver for a new list of completions.

A last resort could be a hard-coded list of _foo identifiers in stdlib, which works well enough as this mostly occurs within stdlib.

We could consider something like this:

This might not work in some corner cases, but maybe it would work in most cases. At least it would distinguish os._T and os._exit.

erictraut commented 3 years ago

@Akuli, thanks for your thoughts. I'm not enamored with any of these solutions either.

Executing the code isn't an option. That would open up a big security vulnerability if a language server were to execute code without the user first verifying that it's safe to do so. If Jedi is doing this, it should stop. This is a big security vulnerability, and someone with malicious intent will eventually exploit it.

The second solution will result in false negatives — cases where the use of a suggested symbol results in runtime errors. Users expect that language servers won't lead them astray with bad suggestions, so this won't meet expectations.

Your third suggestion is potentially viable. Module-level symbols with declared types (and that including function and class declarations) would be treated as public exports unless they are decorated with @type_check_only. I have a couple of concerns with this proposal. First, it's rather complex to understand and cumbersome to implement. Second, it would create another inconsistency between between stubs and py.typed libraries because I don't think it's reasonable for typed ".py" files to decorate every private class and function with @type_check_only. I'd prefer to find a solution that works consistently for stub files and py.typed ".py" files.

jakebailey commented 3 years ago

If Jedi is doing this, it should stop. This is a big security vulnerability, and someone with malicious intent will eventually exploit it.

IIRC Jedi does for compiled code (optionally, but most consumers enable it), as does PyCharm in some scenarios past just compiled code (I haven't tested in a while).

Akuli commented 3 years ago

Perhaps it would be better to treat all module-level symbols that begin with underscore as "not exported" then provide a way to override this to handle the few exception cases.

I like this idea. Ideally, the "export this" marker would be:

I'm thinking something like a new comment, let's say # type: export, that you would add to names like _foo:

_string_type = str  # type: export

def _exit() -> NoReturn: ...  # type: export

def _this_function_takes_many_arguments(  # type: export
    foo: str,
    bar: str,
    baz: float,
    blah: int,
) -> None: ...

This isn't backwards compatible though. Currently # type: export means that the type of whatever is being defined is given by a symbol named export (similar to # type: str). But it shouldn't be hard to teach type checkers to ignore these comments. They are supposed to ignore it, as it is a hint just for autocompletions.

Akuli commented 2 years ago

One more idea that might be good or bad, don't know yet:

Given that we already have from somemodule import foo as foo, we might as well decide that _foo = _foo exports. It looks weird and isn't as explicit as we would hope, but it is backwards-compatible.

def _exit() -> NoReturn: ...

_exit = _exit

We could recommend adding a comment to every _foo = _foo exporting. This would be similar to # undocumented, ignored by type checkers but easy to grep.

def _exit() -> NoReturn: ...

_exit = _exit  # export
srittau commented 2 years ago

I'd like to see something like this formalized. But as it concerns stubs and type checkers in general, we should probably move this discussion to typing-sig.

AlexWaygood commented 1 year ago

I'm transferring this issue to the python/typing repo, as it's a general discussion that affects authors and users of all type checkers, and there's nothing immediately actionable here for typeshed