python / cpython

The Python programming language
https://www.python.org

Are built-in comparisons allowed to return numbers instead of booleans? #109791

Open ByteEater-pl opened 11 months ago

ByteEater-pl commented 11 months ago

I've found a surprising sentence in the Python documentation under Truth Value Testing:

Operations and built-in functions that have a Boolean result always return 0 or False for false and 1 or True for true, unless otherwise stated.

Relational operators seem to have no excepting statements, so IIUC they could return 0 and 1 instead of False and True on values of some built-in types (e.g. 7 < 3), even nondeterministically.

Thus, in order to satisfy a specification requiring of my code to produce only values True and False or for defensive programming (whenever that's important), or for logging purposes and others involving stringification (which seems to be the only difference guaranteed to be observable between bool values and corresponding standard numbers), it seems I should wrap logical expressions in calls to bool.
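As a minimal sketch of the stringification point above (the variable names are illustrative, and the observed results are those of CPython, not something the documentation quoted here guarantees):

```python
# A built-in comparison and the number that compares equal to its result.
flag = 7 < 3
as_int = int(flag)

# Equal as numbers...
same_value = (flag == as_int)

# ...but distinguishable once converted to text, e.g. for logging.
flag_text = str(flag)
int_text = str(as_int)
print(same_value, flag_text, int_text)

# Wrapping in bool() normalizes any truthy/falsy value to True or False.
normalized = bool(as_int)
print(type(normalized).__name__)
```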

On the other hand, in PEP 285 – Adding a bool type I've found the following statements:

All built-in operations that conceptually return a Boolean result will be changed to return False or True instead of 0 or 1; for example, comparisons, the “not” operator, and predicates like isinstance().

All built-in operations that are defined to return a Boolean result will be changed to return False or True instead of 0 or 1. In particular, this affects comparisons (<, <=, ==, !=, >, >=, is, is not, in, not in), the unary operator ‘not’, the built-in functions callable(), hasattr(), isinstance() and issubclass(), the dict method has_key(), the string and unicode methods endswith(), isalnum(), isalpha(), isdigit(), islower(), isspace(), istitle(), isupper(), and startswith(), the unicode methods isdecimal() and isnumeric(), and the ‘closed’ attribute of file objects. The predicates in the operator module are also changed to return a bool, including operator.truth().

The only thing that changes is the preferred values to represent truth values when returned or assigned explicitly. Previously, these preferred truth values were 0 and 1; the PEP changes the preferred values to False and True, and changes built-in operations to return these preferred values.

However, PEPs seem to be less authoritative than the documentation (of which the language and library references are the main parts) and there are numerous deviations from PEPs (many of them mentioned explicitly). So until the team updates it, the stronger guarantees of the PEP aren't to be trusted, I think.

This is probably just a documentation issue and the fix I suggest is to amend the documentation with those guarantees.

My question about it at SO

pochmann commented 11 months ago

Are comparisons allowed to return numbers instead of booleans?

From their documentation:

By convention, False and True are returned for a successful comparison. However, these methods can return any value

ByteEater-pl commented 11 months ago

Yea, I also came across this fragment, @pochmann. Indeed, they can return any value, so the programmer implementing them on a type isn't required to return True or False, but is advised to by this guideline ("convention"). That's not the issue. It's about whether built-in relations can return four values (True, False, 1 and 0, notwithstanding the fact that there can be many 1s and 0s which compare as different in terms of is, at the implementation's discretion) or just two (True and False).
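The distinction drawn here can be sketched as follows. `Loud` is a hypothetical class invented for illustration; the last lines only report what CPython does for a built-in comparison, which is the behavior whose documented guarantee is in question:

```python
class Loud:
    """Hypothetical type whose __lt__ ignores the True/False convention."""

    def __init__(self, value):
        self.value = value

    def __lt__(self, other):
        # Rich comparison methods may return any object; here, a string.
        return "yes" if self.value < other.value else ""


user_defined = Loud(1) < Loud(2)   # the string "yes", not True
builtin_result = 7 < 3             # comparison between built-in ints

print(repr(user_defined))
print(type(builtin_result).__name__)
```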

ByteEater-pl commented 11 months ago

I've added "built-in" to the title to make it clearer.

pochmann commented 11 months ago

Ah, sorry. Although the doc says "Operations and built-in functions". So "Operations", not only "built-in operations" like the PEP. Maybe that at least explains the discrepancy...

AA-Turner commented 11 months ago

Thus, in order to satisfy a specification requiring of my code to produce only values True and False or for defensive programming (whenever that's important), or for logging purposes and others involving stringification (which seems to be the only difference guaranteed to be observable between bool values and corresponding standard numbers), it seems I should wrap logical expressions in calls to bool.

If you want absolute certainty that you are operating on True or False then this is the right thing to do. However given Python's general approach to truth testing, I'd only really be convinced by the string representation argument.

Whilst we could update the documentation, it would require verifying that every instance of __bool__ complied, and might impact upon third-party packages due to 1/0 no longer being 'allowed'.

which seems to be the only difference guaranteed to be observable between bool values and corresponding standard numbers

There are several differences, most notably the type!

A

ByteEater-pl commented 11 months ago

it would require verifying that every instance of __bool__ complied

Only in the standard library.

might impact upon third-party packages due to 1/0 no longer being 'allowed'

How so? Third-party packages can still produce whatever truthy and falsy values they like, including 1 and 0, and even '#t' and ().

There are several differences, most notably the type!

Interesting! What are those? (By the type I assume you mean isinstance(x, bool); that's obvious, what I meant were differences not naturally following from the setup of bool as a subclass of int with just two instances.)
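For reference, a few observable differences between True and 1 can be sketched like this. It is an illustrative list rather than an exhaustive one, and arguably all of them follow from bool being a subclass of int with its own representation:

```python
import json

one = 1

equal_as_numbers = (True == one)          # True
same_type = (type(True) is type(one))     # False: bool versus int
is_int_subclass = isinstance(True, int)   # True

# Text representations differ, including in serializers such as json.
reprs = (repr(True), repr(one))
as_json = json.dumps([True, one])

# Arithmetic on bools yields plain ints.
total = True + True

print(equal_as_numbers, same_type, is_int_subclass)
print(reprs, as_json, total)
```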

JamesParrott commented 11 months ago

In numpy, comparison operators on arrays return arrays of booleans. Python lets the user overload operators however they want, for whatever is best for their application. If they also want to support boolean casting, e.g. for if statements, they can optionally define __bool__ on the return type.
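A pure-Python sketch of the same idea, without depending on NumPy. `Vec` and `Mask` are hypothetical names, and the choice to make a mask truthy only when every element matches is an assumption of this sketch, not a description of NumPy's behavior:

```python
class Mask:
    """Hypothetical result type holding one bool per compared element."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Sketch-only policy: truthy when all elements compared equal.
        return all(self.flags)


class Vec:
    """Hypothetical container with an element-wise == operator."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Return a Mask instead of a single True/False.
        return Mask(a == b for a, b in zip(self.items, other.items))


mask = Vec([1, 2, 3]) == Vec([1, 0, 3])
print(mask.flags)

if mask:
    verdict = "all equal"
else:
    verdict = "some differ"
print(verdict)
```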

ByteEater-pl commented 11 months ago

@JamesParrott, thanks, I've learnt something new from your comment!

I'm replying to it, though, mainly to make sure it's just for context and not a result of misunderstanding the issue I've raised. I'm all for doing such stuff as NumPy and keeping it working. My beef is with language built-ins (reachable through syntax or the standard library) being specified with too much latitude and still, despite the introduction of bool, allowed to return 0 and 1 instead of False and True, at the implementation's discretion. The PEP said it'd be fixed. Time to follow through.

In other words, little should be expected from programmers (the bare minimum for the language facilities to make sense of it), but programmers should be given the strongest guarantees possible from the language (including the standard library) that don't require sacrificing (too much) other tenets.

JamesParrott commented 11 months ago

Yep it's just for wider interest, to show a mainstream usage of a non-boolean return value of == etc.

I'm not sure what your concern is, but under the hood, the results of expressions in if statements (and assert statements) have their __bool__ method called, if it's there. Implementations on native types (e.g. int.__bool__) can define what is considered Truthy and Falsey in the language.
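A small sketch of that mechanism, with hypothetical class names. It also shows that, in CPython 3, a `__bool__` method that returns an int rather than a bool is rejected with a TypeError when truth-tested:

```python
class Tracked:
    """Hypothetical object that counts how often its truth value is asked for."""

    def __init__(self, truth):
        self.truth = truth
        self.calls = 0

    def __bool__(self):
        self.calls += 1
        return self.truth


class ReturnsInt:
    """Hypothetical object whose __bool__ returns 1 instead of True."""

    def __bool__(self):
        return 1


t = Tracked(True)
if t:                      # the if statement calls t.__bool__()
    branch = "taken"
else:
    branch = "skipped"
print(branch, t.calls)

try:
    bool(ReturnsInt())
    outcome = "accepted"
except TypeError:
    outcome = "rejected"
print(outcome)
```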

If you need a strict typesafe guaranteed result from an expression, then explicitly calling bool on it is a simple, efficient, readable solution.

ByteEater-pl commented 11 months ago

simple, efficient, readable

All true. But it's in no way obvious that it's needed. I bet more than 95% of Python programmers would say that (2 < 3) is True, definitely.

And as for "efficient", it's still a function call.

Another disadvantage is it no longer being simple when the argument is to be abstracted. Instead of 0 .__lt__ you need e.g. lambda x: bool(0 < x) (with extra parentheses around it in many contexts).
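The abstraction cost can be sketched as below. The variable names are illustrative, and `functools.partial` with `operator.lt` is shown only as one possible alternative spelling. Note that the bound method is not a drop-in replacement for the operator anyway: in CPython, `int.__lt__` returns NotImplemented for a float argument, whereas `0 < 1.5` falls back to the float's reflected comparison.

```python
import operator
from functools import partial

positive_raw = (0).__lt__                    # same bound method as 0 .__lt__
positive_strict = lambda x: bool(0 < x)      # the wrapped form from the comment
positive_partial = partial(operator.lt, 0)   # operator.lt(0, x) means 0 < x

print(positive_raw(5), positive_strict(5), positive_partial(5))

# With a float argument the three spellings no longer agree.
raw_with_float = positive_raw(1.5)
strict_with_float = positive_strict(1.5)
print(raw_with_float, strict_with_float)
```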

For completeness (getting closer to it, as in Python it most of the time seems unreachable, ain't it? 😜), another (much smaller) con is that the solution requires bool's value to not have been adversely modified.
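A sketch of that last caveat, assuming the name `bool` has been rebound at module level. The helper `normalize` is hypothetical, and going through the `builtins` module only helps as long as `builtins.bool` itself has not been replaced:

```python
import builtins


def normalize(value):
    # Look the type up on the builtins module instead of the bare name.
    return builtins.bool(value)


bool = lambda x: 1            # rebinding that shadows the built-in name

shadowed = bool(0 < 5)        # calls the lambda, so this is the int 1
safe = normalize(0 < 5)       # goes through builtins.bool
double_not = not not (0 < 5)  # operator-based, independent of the name bool

del bool                      # remove the shadowing binding again

print(repr(shadowed), repr(safe), repr(double_not))
```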