python / mypy

Optional static typing for Python
https://www.mypy-lang.org/
Other
18.28k stars 2.8k forks source link

Problem with `str` being treated as `Sequence[str]` #5090

Closed pawelswiecki closed 1 year ago

pawelswiecki commented 6 years ago

Are you reporting a bug, or opening a feature request?

I think it's a bug or very unexpected behavior at least. Related issues I found: #1965, #4334.

Code

d = {'k1': 'qwe', 'k2': []}
reveal_type(d)
value: str = d['k1']

Actual Output

$ mypy file.py
file.py:2: error: Revealed type is 'builtins.dict[builtins.str*, typing.Sequence*[builtins.str]]'
file.py:3: error: Incompatible types in assignment (expression has type "Sequence[str]", variable has type "str")

So, as far as I understand, mypy tried to find the strictest type for values of d, which is Sequence[str] in this case. In Python, string is a Sequence of (sub)strings (s[0] == s[0][0] == s[0][0][0]), also list is a Sequence, so this is understandable. Unfortunately, the rule that string is a sequence of (sub)strings does not work in case of value: str = d['k1'].

Expected output To get rid of the error I need to manually give more general type: d: Dict[str, Any], so this is not such a big problem. Having said that, expected behavior would be: No "Incompatible types in assignment" error for the given code.

Versions Python 3.6.5 mypy==0.600

gvanrossum commented 6 years ago

This is interesting. We can suggest workarounds etc. for the example code, but I wonder -- does any code that declares it wants a Sequence[str] really expect a single str object?

We could remove the Sequence[str] and instead add explicit dunder methods to str objects. It would still be seen as Sized, Iterable[str], Container[str] (in 3.6+), and Reversible[str], because those are all Protocols. But Sequence is not, and(for the first time ever!) we would have a way to state that a function takes a Sequence[str] but not a str.

Not sure how much code would break...

pawelswiecki commented 6 years ago

@gvanrossum Thank you for the reply.

One more thing (which is probably obvious for someone more knowlegable). Why mypy treats those two cases differently:

d1 = {'k1': 'qwe', 'k2': []}
reveal_type(d1)

d2 = {'k1': [1], 'k2': []}                                                     
reveal_type(d2) 

mypy:

$ mypy file2.py 
file2.py:2: error: Revealed type is 'builtins.dict[builtins.str*, typing.Sequence*[builtins.str]]'
file2.py:5: error: Revealed type is 'builtins.dict[builtins.str*, builtins.object*]'

I'd say type of dict's value in the second shoud be, analogically, Sequence[int], no?

EDIT: I think I can guess: in the first case empty list is treated kinda as an empty string, which is quite wrong.

ilevkivskyi commented 6 years ago

The second example is a known issue https://github.com/python/mypy/issues/5045

Zac-HD commented 6 years ago

We can suggest workarounds etc. for the example code, but I wonder -- does any code that declares it wants a Sequence[str] really expect a single str object?

Yep!

For example, in Hypothesis we have sampled_from, which supports ordered collections including Sequence, OrderedDict, and Enum. Excluding str from that would just force a pointless conversion from sampled_from(astring) to sampled_from(list(astring)) on all our users, or we would have to add string types to the type hint.

So at the very least, I have a strong concrete (as well as aesthetic) preference to keep treating str as Sequence[str] whenever needed.

gvanrossum commented 6 years ago

If we went this way, adding the type hint to sampled_from would be the right solution there.

It sounds there's no way to please everybody.

bwo commented 5 years ago

We can suggest workarounds etc. for the example code, but I wonder -- does any code that declares it wants a Sequence[str] really expect a single str object?

I also have a use case for this, where I have parsers some of which expect Sequence[T] (or Iterable[T]) and others of which expect T, and I'd like to be able to use them together when T == str in combinators that (e.g.) sequence one parser after another.

stereobutter commented 5 years ago

I think both the runtime behavior of treating a string as instance of collections.abc.Iterable and static checks like hello world matching Iterable[str] are design defects introducing weird corner cases in the code of most users. I suspect the following is a canonical example of unexpected behavior:

def process_words(seq: Iterable[str]) -> Iterable[str]:
    return [s.upper() for s in seq]

where the intended use case is

process_words(['hello', 'world'])  # ['HELLO', 'WORLD']

and an unintended side effects of the implementation is

process_words('hello' world')  # ['H', 'E', 'L', 'L', 'O', ' ', 'W', 'O', 'R', 'L', 'D']

Given that the runtime behavior of str is unlikely to change (I assume?), at least making mypy respect the difference between Iterable[str] and str will warn the users of this pitfall.

bwo commented 5 years ago

But str is an iterable of str. Making mypy lie about what is actually the case because it might be an error seems extremely backwards to me.

ethanhs commented 5 years ago

But str is an iterable of str

Yes, and this is actually really nice, having to deal with character types adds a lot of unneeded complexity. The correct way to deal with this is that whenever you want an Iterable[str], have an assert not isinstance(seq, str) or similar, to make sure it isn't passed a str. Sometimes handing a function a str is acceptable though...

stereobutter commented 5 years ago

A clean but albeit somewhat ugly solution would be to introduce new types in typing (and hopefully also new abstract base classes in collections.abc) for iterables and sequences that treat str as atomic and not a member of said types.

Example:

x: typing.NewIterable[str] = ['hello', 'world'] # typechecks
y: typing.NewIterable[str] = 'hello world' # doesn't typecheck

isinstance(['hello', 'world'], collections.abc.NewIterable) # True
isinstance('hello world', collections.abc.NewIterable) # False

I think this is do-able but alas

There are only two hard things in Computer Science: cache invalidation and naming things. -- Phil Karlton

What should be the names of the new Sequence and Iterable? My previous comment was made with the thought in mind that in properly typed code the use of NewSequence and NewIterable would probably be higher than of the regular Sequence and Iterable.

In the spirit of the python Zen (explicit is better than implicit, there should be one obvious way to do it, yada yada) annotations like Union[str, NewIterable[str]] for cases where both a single string and an iterable sequence of strings are valid, beat the current situation imho.

wei-hai commented 4 years ago

experiencing the same issue, any update?

JukkaL commented 4 years ago

One way to deal with this would be to introduce a new optional error that is generated when str is used while Iterable[str] or Sequence[str] is expected. It might be hard to catch all cases of this, but at least the most common ones such as passing an str argument, assigning a str value and iterating over a str should not be difficult to handle.

bwo commented 4 years ago

Again, str is an Iterable[str].

My understanding of the goal of mypy as a project is that it's a type system for Python as it exists, not as it might be if it were slightly nicer.

asford commented 3 years ago

One way to deal with this would be to introduce a new optional error that is generated when str is used while Iterable[str] or Sequence[str] is expected. It might be hard to catch all cases of this, but at least the most common ones such as passing an str argument, assigning a str value and iterating over a str should not be difficult to handle.

@JukkaL how challenging do you think it would be implement a putative --warn-iterable-anystr or --no-iterable-anystr in mypy? I'd imagine that the semantics would be as you described...

For example:


def just_sequence(vals: Iterable[str]) -> Any:
    ...

def sequence_or_str(val_or_vals: Union[str, Iterable[str]]) -> Any:
   ...

# mypy error or warning if `--warn-iterable-anystr` or `--no-iterable-anystr` is provided
just_sequence("bad") 

just_sequence(["this", "is", "fine"])
sequence_or_str("this_is_fine")
sequence_or_str(["this", "is", "fine"])

This feels like a longstanding issue, eg https://github.com/python/typing/issues/256, and there's strong evidence that this is both a common error and pain point in the language. So much so, that it's one of the very few deviations between python and starlark's semantics.

However it seems clear from the discussion here and elsewhere that a typing-level or language level change would be controversial. I think support for this error detection in mypy would be a valuable immediate addition and may provide a useful starting point to determine if this issue warrants enough effort to consider typing level enhancements to address it in the future.

I don't believe this can be implemented through mypy's existing plugin interface, but would be very happy to understand if you think the plugin interface can support this kind of feature.

ghost commented 1 year ago

I for one would make heavy use of an ability to specify that I expect some strings to not be treated as Sequence[str]. Note that str and Sequence[Sequence[Sequence[...[str]]]] are synonymous, which results in this unreasonable behavior:

NestedIntList = Union[int, Sequence["NestedIntList"]]

def complex_interpreter(src: str) -> NestedIntList:
    nesting = src.count("[")
    result: NestedIntList = src.replace("[", "")
    for _ in range(nesting):
        result = [result]
    return result
$ pyright demo6.py
0 errors, 0 warnings, 0 informations

$ mypy demo6.py
Success: no issues found in 1 source file

$ pytype demo6.py
Success: no errors found

I'd be heavily in favor of a deprecation cycle that allows str to no longer typecheck as Sequence, except for an opt-in annotation of RecursiveStr but since that seems infeasible due to existing code's inertia, I'd settle for an opt-out type annotation of AtomicStr which satisfies a str requirement but not a Sequence[Sequence[str]]. (Naming could use work, obviously.)

hauntsaninja commented 1 year ago

Closing as a duplicate of the more popular https://github.com/python/mypy/issues/11001. I've also added a SequenceNotStr type to https://github.com/hauntsaninja/useful_types