pylint-dev / pylint

It's not just a linter that annoys you!
https://pylint.readthedocs.io/en/latest/
GNU General Public License v2.0

feature suggestion for `is` compare between different types #5763

Open orSolocate opened 2 years ago

orSolocate commented 2 years ago

Current problem

Comparing values of two different types with the `is` operator always evaluates to `False`.

For example for this code:

A = 5
STR1 = '5'
if A is STR1:
    pass

`A` is of type `int` and `STR1` is of type `str`; an `is` comparison between those types is always `False`.

The user should be alerted to this and consider modifying their code.

As far as I checked, there is no existing checker for this.

Desired solution

I suggest a new checker (or extension of an existing one) that raises a message for this case.

message name suggestion: `is-different-types` or `different-types-is-expression`

message type: Warning

message content suggestion:

An 'is' comparison between different types always fails, consider using casting

Additional context

The check should apply to any comparison/expression, not only `if` conditions. I think we should raise a message even for cases like:

condition = '5' is 4
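As an aside, assuming CPython 3.8 or newer, the interpreter itself already flags the literal-only form of this pattern at compile time with a `SyntaxWarning`, which can be observed like so:

```python
import warnings

# CPython >= 3.8 emits a SyntaxWarning for `is` comparisons against
# literals, which covers the literal-only variant of the example above.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    compile("condition = '5' is 4", "<example>", "exec")

for w in caught:
    print(w.category.__name__, "-", w.message)
```

This only triggers when an operand is a literal, so a checker as proposed here would still add value for inferred names.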

I thought about extending the checker to the `==` operator, but I am not really fond of the idea, and I will explain why.

  1. The `__eq__` method of Python classes can be overridden to return `True` for values of different types.

  2. There is a (surprising) case where `==` actually returns `True` for different types: `int` and `float`, e.g. `1.0 == 1`. [Note that the statement `1.0 is 1` returns `False`, as `1.0` is of type `float` and `1` is of type `int`.] [Also note that `int`/`float` equality holds whenever the values are numerically equal; the "only for small numbers" behaviour is a property of `is` on cached small ints, not of `==`.]
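A minimal demonstration of point 2 above (assuming CPython, though the `==`/`is` distinction is standard Python semantics):

```python
a_int = 1
a_float = 1.0

# == compares numeric value across int/float, so this is True:
print(a_float == a_int)  # True

# The operands nevertheless have different types:
print(type(a_int), type(a_float))  # <class 'int'> <class 'float'>

# Identity is a separate question: two distinct objects of distinct
# types are never the same object, so `is` is False here.
print(a_float is a_int)  # False
```

Using variables (rather than bare literals) keeps CPython from emitting its `SyntaxWarning` about `is` with a literal.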

orSolocate commented 2 years ago

Edit: actually, I don't think casting should be the suggestion here. The comparison would still be `False` after casting...

orSolocate commented 2 years ago

I can implement it if there is value in it for Pylint. @Pierre-Sassoulas, does anyone else like the idea?

Pierre-Sassoulas commented 2 years ago

I'm hesitant about this one because it seems like a lot of effort with potentially a lot of false positives where the user knew what they were doing. There are also a lot of cases that are already handled by things like flake8's E711 or pylint's `literal-comparison`, which is why I added the discussion label to see what others think about it.

DanielNoord commented 2 years ago

I'm having similar thoughts as @Pierre-Sassoulas. I wonder how often this would catch actual bad code compared to the number of edge cases that could turn into false positives.

orSolocate commented 2 years ago

Could you explain, with an example, why you think there will be false positives?

The implementation can be restricted to `if` visits only. It can also be reduced to work only on built-in types or inferable names, and otherwise do nothing. This will increase false negatives and decrease false positives.
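To make the restricted idea concrete: a real Pylint checker would be built on astroid's inference, but a rough stdlib-only sketch of the narrow, literals-only variant can be written with `ast` (the function name `find_is_type_mismatches` is hypothetical, not part of any Pylint API):

```python
import ast

def find_is_type_mismatches(source: str):
    """Flag `is`/`is not` comparisons whose operands are literals of different types.

    Deliberately narrow: only ast.Constant operands are considered, so names,
    calls, etc. are skipped. No inference means fewer false positives at the
    cost of more false negatives, as suggested above.
    """
    issues = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # Flatten chained comparisons into (op, left, right) triples.
        operands = [node.left, *node.comparators]
        for op, left, right in zip(node.ops, operands, operands[1:]):
            if not isinstance(op, (ast.Is, ast.IsNot)):
                continue
            if (isinstance(left, ast.Constant)
                    and isinstance(right, ast.Constant)
                    and type(left.value) is not type(right.value)):
                issues.append((node.lineno,
                               type(left.value).__name__,
                               type(right.value).__name__))
    return issues

print(find_is_type_mismatches("condition = '5' is 4"))   # [(1, 'str', 'int')]
print(find_is_type_mismatches("result = a is b"))        # [] (names are skipped)
```

Extending this to inferable names is where astroid (and the false-positive risk discussed below) comes in.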

DanielNoord commented 2 years ago

I can't think of any right now; however, I wonder how many actual messages would be raised. How often do people use this pattern when they shouldn't? And how often do they use it when they should? I'm not totally convinced the balance is right between 1) catching actual errors and 2) the risk of emitting a number of false positives.

orSolocate commented 2 years ago

@DanielNoord Well, the value of catching such an error is great, because if you detect it inside an `if` comparison you detect unreachable code. I also have some doubts about (1) (catching actual errors), although maybe after implementing it and running it on the primer source code (or on Pylint itself) we will be surprised. Maybe I am missing something, but I still don't quite get the big false-positive risk. Is it difficult for Pylint to infer a type for a name (e.g. using pytype or similar)? You can try to infer, and if that fails, just skip. As far as I know, when astroid's inference resolves an object it is mostly correct, so if inference succeeds you know its type.

The only example I can think of that relates to false positives is something like:

a = 256
b = 256
print(a is b)
# True

but it is not related to the checker I suggested at all, since both operands are of the same type.

DanielNoord commented 2 years ago

@orSolocate Perhaps running over primer will give us some actual data to talk about. The risk of false positives is inherent to requiring inference: there is always a risk of astroid getting something wrong. For a checker that I think won't catch many actual problems, I don't think it's worth that risk. But perhaps data from primer might convince me.


To expand a bit on what I mean by "won't catch many actual problems": compare this checker with the `unspecified-encoding` warning. It's completely possible for your program to work and pass all your tests without an encoding in any `open` call; however, when another user runs the program in a different environment, problems might suddenly appear. I believe that's a very useful check, since it warns about a potential unknown problem. For this checker, though: how often will people have this in their code without noticing that something is broken? I know we have many other checkers that would also be noticed if you just test your own program, but this feels like a very niche issue that doesn't make it to production code very often. But I would like to be proven wrong if we can actually find repositories in which this would be raised!
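The environment-dependence behind `unspecified-encoding` can be made visible directly: when `open` is called without `encoding=`, CPython falls back to the locale's preferred encoding, which differs between systems (a sketch of the mechanism, not of the checker itself):

```python
import locale

# open() without encoding= uses the locale's preferred encoding.
# This varies by environment, which is exactly why code that passes
# tests locally can break for a user elsewhere.
preferred = locale.getpreferredencoding(False)
print(preferred)  # e.g. 'UTF-8' on most Linux systems, 'cp1252' on many Windows setups
```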

Pierre-Sassoulas commented 2 years ago

I'm adding the "needs decision" and "PR / investigation" labels, as seeing the effect on the primer would add valuable input to the decision.

orSolocate commented 2 years ago

I will try to check it when I have the time (let's keep this issue open).