microsoft / pyright

Static Type Checker for Python
Other
13.04k stars 1.39k forks source link

Don't simplify union with `Literal` #8392

Closed nineteendo closed 1 month ago

nineteendo commented 1 month ago

Example:

from typing import IO, Any, Literal, overload

def open_file(filename: str, mode: Literal["r", "w"] | str) -> IO[Any]:
    return open(filename, mode)

@overload
def open_file2(filename: str, mode: Literal["r", "w"]) -> IO[str]: ...
@overload
def open_file2(filename: str, mode: str) -> IO[Any]: ...

def open_file2(filename: str, mode: str) -> IO[Any]:
    return open(filename, mode)

I get no useful information for open_file(), unlike the more verbose open_file2() (the return type isn't always dependent on the input parameters):

image image

erictraut commented 1 month ago

I'm conflicted about whether to make a change here.

From a type perspective, the type Literal["r", "w"] | str is the same as the type str. They are completely equivalent from the standpoint of type theory. Pyright elides the unnecessary part of the type and simplifies it to str because doing so is very cheap at the time the union is being constructed, and it reduces additional work in the type checker in downstream tasks. It also simplifies the locally-narrowed types that appear in hover text, which reduces cognitive load for users of language servers.

The downside, however, is that pyright is throwing away some information that (in this particular case) is useful for completion suggestions.

There's also an argument to be made that pyright is being inconsistent in that it elides certain unnecessary subtypes from a union but not others. For example, the type object | str is equivalent to object, but pyright doesn't make that simplification. That's because such a simplification would be very expensive at union creation time, so this would have a measurable impact on type analysis speeds. So this inconsistency is purely pragmatic.

Here's a PR where I attempt to remove this eliding: https://github.com/microsoft/pyright/pull/8395. As you can see, it has a pretty significant impact on type checker behavior.

I'm going to experiment with a more surgical fix as well before I make a final decision on this issue.

erictraut commented 1 month ago

Here's a more surgical approach: https://github.com/microsoft/pyright/pull/8396. This retains pyright's previous behavior for most operations involving unions, but it refrains from eliding redundant literals for a few specific cases — most notably, when building a union type from a type expression that explicitly includes literals and non-literals like Literal["r", "w"] | str.

I think this is a better approach. It nicely balances the type checker concerns with those of the language server features. This will be included in the next release.

nineteendo commented 1 month ago

Thank you, now I should be able to write this too:

import json
from typing import Any, Container, Literal

def dumps(obj: Any, *, allow: Container[Literal["nan"] | str] = ()) -> str:
    return json.dumps(obj, allow_nan="nan" in allow)

Could you check whether this fixes https://github.com/microsoft/pylance-release/issues/6123? I think there's some internal eliding going on there.

erictraut commented 1 month ago

It doesn't address the other issue. This surgical fix attempts to balance the needs of type checking functionality with that of common completion suggestion use cases. Addressing the other issue would require more extensive exceptions and would not strike the right balance, IMO.

erictraut commented 1 month ago

This is addressed in pyright 1.1.372.

nineteendo commented 1 month ago

Hmm, it doesn't work with containers:

import json
from typing import Any, Container, Literal

def dumps(obj: Any, *, allow: Container[Literal["nan"] | str] = ()) -> str:
    return json.dumps(obj, allow_nan="nan" in allow)

dumps(float("nan"), allow=[""])
Screenshot 2024-07-24 at 17 24 32

Is this logic somehow duplicated?