python / typeshed

Collection of library stubs for Python, with static types
Other
4.38k stars 1.76k forks source link

Set dunder operators too permissive? #3184

Open kaste opened 5 years ago

kaste commented 5 years ago

Similar to #1840

With #3181 I made KeysView consistent with Set but I really think the operators are confusingly permissive.

Let's look at them

# difference
{'1'} - {2}
reveal_type({'1'} - {2})  # => Set[str]

This should fail because this is always a null operation and thus a programmer error.

# intersection
{'1'} & {2}  # => Set[str]
reveal_type({'1'} & {2})  # => Set[str]

This really should fail because it is unlikely meaningful; the return type is always an empty set thus it is also not a Set[str]

# sym difference
{'1'} ^ {2}
reveal_type({'1'} ^ {2})  # => Set[Union[str, int]]

Although the assumed union type is correct, Set[A] & Set[B] => Set[A|B] just doesn't make sense. The operator can only meaningful be applied to same type sets. What we do here is just a form of plain concat or union.

# union
{'1'} | {2} # MAYBE OK
reveal_type({'1'} | {2})  # => Set[Union[str, int]]

If we look at Sets as just containers without some properties, we could accept it to construct mixed type Sets using 'union'. I think it is okay for Python, the name union makes it meaningful, but I maybe never did that in practice. Since it adds to the type it is also likely that the following part of the program catches mistakes.

(But lists don't allow that [1] + ['2'] although they're really arbitrary containers. Except my PR #3183.)

srittau commented 5 years ago

Could you prepare an experimental PR (maybe after #3181 is merged), so interested parties can try this against real world code bases? This way we could better evaluate the impact this change would have (good or bad).

kaste commented 5 years ago

We have set, frozenset, AbstractSet, MutableSet, and ItemsView and KeysView. Should a PR make all of them consistent?

srittau commented 5 years ago

That would make sense to me.

kaste commented 5 years ago

I think I need help here. Current situation in master is. Given:

ints: Set[int]; floats: Set[float]; strs: Set[str]

union is defined "restrictive" T -> T -> T and __or__ is "additive permissive" T -> S -> Union[T, S]

strs.union(ints)   # FAILS
ints.union(floats) # FAILS
floats.union(ints) # PASSES
strs | ints        # PASSES
ints | floats      # PASSES
floats | ints      # PASSES

In contrast intersection and __and__ are permissive T -> Any -> T.

# ALL PASS
strs.intersection(ints)
ints.intersection(floats)
floats.intersection(ints)
strs & ints
ints & floats
floats & ints

These are the two type variants we see for Set. Do we want consistency between the methods and the dunder constructors?

If I make intersection "restrictive" T -> T -> T mypy will reject strs & ints but we break the law ints & floats == floats & ints because mypy surprisingly accepts the latter. It should reject both. On the other side the union method already behaves like that.

We could also go pragmatic: union and symmetric_difference as "additive" T -> S -> Union[T, S], intersection as T -> S -> S t.i. returning the Right. Whenever we change the type we have at least a good chance the following code will catch an error. No change for difference comes to my mind though.