microsoft / pyright

Static Type Checker for Python
Other
13.23k stars 1.42k forks source link

`LiteralString` type is not enforced in invariant context #5999

Closed erictraut closed 1 year ago

erictraut commented 1 year ago

The special form LiteralString is a subtype of str, and all literal string values are subtypes of LiteralString. When LiteralString is used in an invariant context (e.g. list[LiteralString]), it should be considered incompatible with either str or a specific literal str.

Currently, the following code generates no errors, but it should produce two errors.

def func(a: list[LiteralString], b: list[Literal["a"]]):
    # This should generate an error because of invariance rules.
    v1: list[str] = a

    # This should generate an error because of invariance rules.
    v2: list[LiteralString] = b
mikeshardmind commented 1 year ago

~~I'm not completely convinced that either of these cases are bugs, but inverting the LHS and RHS, they would be.

The problem in this case is that these are essentially refinement types. This isn't a normal subtyping relationship. All LiteralStrings are perfect drop in replacements for strings at runtime (They are quite literally the same type), but the reverse is not true (consider safe query building with the additional guarantees of LiteralString)

The relationship here is not the type, but how the type was constructed or restricted (ie. refinement types)~~

erictraut commented 1 year ago

The term "refinement type" is new to me. I don't think it has been used in any Python type documentation or PEPs previously (including PEP 586 which introduced Literal and PEP 675 which introduced LiteralString). Are you suggesting that we should introduce a new concept into the Python type system that doesn't exist today — and introduce exceptions whereby these types don't follow the same subtyping rules as other types?

Logically, it makes sense to me that Literal[False] and Literal[True] are proper subtypes of bool. Likewise, it makes sense that Literal["a"] is a proper subtype of LiteralString, which represents the union of all possible literal strings. And LiteralString is a proper subtype of str. That seems consistent with type theory rules. Treating literal types differently from other types in the type system would introduce more special cases and complexity — which I think we can agree isn't desirable.

I'm going to stick with my current understanding of how literal types work in the Python type system. If some future formal type specification clarifies that these should work differently, then we can make the appropriate changes at that time. Such a change would impact more than just LiteralString, and it would require changes to all Python type checkers. In the meantime, I'm going to maintain consistency — both internally within pyright and across type checkers — by sticking to my current understanding.

mikeshardmind commented 1 year ago

Are you suggesting that we should introduce a new concept into the Python type system that doesn't exist today — and introduce exceptions whereby these types don't follow the same subtyping rules as other types?

I'm saying that's what they are, even if they aren't formally described that way currently.

It's perfectly sound to use a LiteralString anywhere a string is expected even invariantly, so long as the type specification is str and the inference does not become LiteralString, but the reverse is not true. (edit, there's more, but it doesn't apply to python) This is because LiteralString is still exactly a str at runtime, it's just also known to have been constructed as a literal. We can't rely on it being a literal after assignment to a string as it could be reassigned again, but this is a different property than the type of the object. The reverse is not true because the reverse has specified a reliance on it being a Literal.

I'm going to stick with my current understanding of how literal types work in the Python type system. If some future formal type specification clarifies that these should work differently, then we can make the appropriate changes at that time. Such a change would impact more than just LiteralString, and it would require changes to all Python type checkers. In the meantime, I'm going to maintain consistency — both internally within pyright and across type checkers — by sticking to my current understanding.

I understand your reasoning for this, and I think it's yet another reason we need formalization based on theory, not just the collection of knowledge we currently have accepted in piecemeal.

mikeshardmind commented 1 year ago

https://en.wikipedia.org/wiki/Refinement_type is somewhat useful here for explaining, the definition here is accurate.

Namely, a type specification: Literal["x"] is the set of all string literals that are "x"

It may be more intuitive to consider a numeric case. If we had Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15], any value of this is clearly exactly an int at runtime, and safe as a complete substitute for one, but these are also known to be safe to pack into a specific bitsize, whereas this isn't true of all ints

erictraut commented 1 year ago

Hmm, I'm not sure I follow. Wouldn't that mean the following would type check without error? Isn't this a problem?

from typing import Literal

def append_b(x: list[str]):
    x.append("b")

def func1(x: list[Literal["a"]]):
    append_b(x)
    assert all([v == "a" for v in x])

func1(["a"])
mikeshardmind commented 1 year ago

That's different from your initial example and falls under what I said about swapping the LHS and RHS (it's not), refinement types have a bit more nuance in where they can be soundly substituted and that's even something I mentioned in one of these recent discussions about how intersections have gotten complex due to lack of clarity in places in the type system.

edit: nope, initial example isn't fine either, I'm sorry.

Wtih basic classes as types, we have nominal subtyping With protocols, we have structural subtyping With Literals, we have refinement types

and the goal is to ensure we reconcile the behavior of all of them.


I think it's "fine" but not ideal to treat this as subtyping temporarily until this can be cleared up at a specification level. This is more restrictive than it needs to be (edit: actually, might not be, need to consider the full impacts), your most recent example should fail, but both of your initial examples should be fine.

I got a slight bit wrong here, and I should go ahead and clarify. It's safe and sound if the original value can't be modified anymore. This is still slightly different from it being unsafe invariance.

For example:

def get_safe_to_pack() -> list[Literal[0,1,2,3]]:
    ...  # what happens if this list to be returned is also placed somewhere it can be modified?

x: list[int] = get_safe_to_pack()

The problem here is the type system itself doesn't have a way to handle this, and unlike other languages, there's no guarantee other people can't change it. Therefore is fine if the refinement type is part immutable structure, but not otherwise. Apologies for the mixup on this one.

However, what about a frozen dataclass? It's perfectly fine to put Literal[1] in place of a field in one of those expecting an int.

DiscordLiz commented 1 year ago

@mikeshardmind Agreed on specification and more formal language helping, but in this case, it's 100% fine to just treat a refinement type as a subtype. If you step back for a moment, you'll notice the handling the variance as you would for mutability already handles where it is and isn't safe. This matches with the set-theoretic definitions I hope are adopted long term.

@erictraut I'd like to apologize about prior interactions while I'm here. I let my frustration with the status quo inappropriately effect how I was participating.

erictraut commented 1 year ago

This is addressed in pyright 1.1.329, which I just published. It will also be included in a future release of pylance.

zwade commented 1 year ago

Hi @erictraut, sorry to comment on an already closed issue, but I was wondering if there was an followup to @mikeshardmind's comment above

However, what about a frozen dataclass? It's perfectly fine to put Literal[1] in place of a field in one of those expecting an int.

I have this particular use case and am wondering how to proceed. Is there a way to mark a property as read-only so that the subtyping context is treated as covariant instead of invariant?

Thanks!

erictraut commented 1 year ago

@zwade, the latest version of pyright (1.1.329) treats attributes within frozen dataclasses as immutable. See this issue and this issue. If your use case isn't addressed by this, please provide more details along with a self-contained minimal code sample.