Open jorisvandenbossche opened 3 years ago
@jorisvandenbossche What's the argument for disallowing at the moment? To me it seems more natural / Pythonic to allow this operation since booleans are essentially ints.
I am not sure what the historic reasons are.
Maybe because those operations were regarded as not that useful (although I would say it's up to the user to decide that), or as potentially confusing because they don't have a "boolean" interpretation, but only a numerical one (e.g. `+` and `*` still are a boolean operation resulting in booleans, and e.g. `-` raises an error in numpy about not being supported for booleans).
It's only for division and power that the booleans are interpreted as numeric values (and currently in pandas the user needs to be explicit about such casting).
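To make the distinction above concrete, here is a minimal sketch of how plain numpy treats arithmetic on boolean arrays. The array names are made up for illustration, and the divisor and exponent are all `True` only to avoid dividing by zero.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

# `+` and `*` stay boolean: they act as logical OR and logical AND.
added = a + b
multiplied = a * b

# `-` is rejected by numpy for boolean arrays.
try:
    a - b
    subtract_error = None
except TypeError as exc:
    subtract_error = exc

# Division and power reinterpret the booleans as numbers,
# so the result no longer has a boolean dtype.
ones = np.array([True, True, True])
divided = a / ones
powered = a ** ones

print(added.dtype, multiplied.dtype, divided.dtype, powered.dtype)
print(type(subtract_error).__name__)
```

The printed dtypes show which operators keep a boolean result and which ones switch to a numeric one.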
@jorisvandenbossche Yeah, I can see the argument of "why would anyone do this?" but if it's easy to allow and makes for greater consistency with numpy and Python more generally, I personally like the idea of allowing this rather than making a special case here.
Conditional on the Series behavior (which I wouldn't object to deprecating), I lean towards having BooleanArray behave like Series, i.e. raising here.
Current thought here: BooleanArray (and IntegerArray and FloatingArray) ops should be wrappers around their core.ops.array_ops counterparts. This will allow for pushing more logic down from BooleanArray/NumericArray into BaseMaskedArray, which in turn will make it easier to extend BaseMaskedArray to wrap arbitrary dtypes.
> or as potentially confusing because they don't have a "boolean" interpretation
FWIW, booleans are a canonical identification for GF(2) so I would argue that these operations are all well-defined and have a unique interpretation.
We don't have an official policy on this, but in general pandas is more averse to overflows than numpy, which corresponds to not treating arithmetic as modular.
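For readers unfamiliar with the GF(2) reading, a small sketch may help. It assumes the usual identification of `False`/`True` with 0/1, under which addition mod 2 is XOR and multiplication mod 2 is AND. The variable names are illustrative only. Note that numpy's own `+` on boolean arrays is a logical OR, so it does not match GF(2) addition for `True + True`.

```python
import numpy as np

a = np.array([True, True, False, False])
b = np.array([True, False, True, False])

# GF(2) arithmetic expressed with boolean operators.
gf2_sum = a ^ b        # addition mod 2 is XOR
gf2_product = a & b    # multiplication mod 2 is AND

# Cross-check against integer arithmetic reduced mod 2.
int_sum_mod2 = (a.astype(int) + b.astype(int)) % 2
int_product_mod2 = (a.astype(int) * b.astype(int)) % 2

# numpy's `+` on booleans is logical OR, which differs for True + True.
numpy_sum = a + b

print(gf2_sum.tolist(), numpy_sum.tolist())
```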
Currently, for the plain `bool` dtype we explicitly check for some operations and raise an error, while those actually work in numpy. For example:

This is done for the division and power operations (`not_allowed={"/", "//", "**"}`): https://github.com/pandas-dev/pandas/blob/934cad6ab61b867c6ae54941c5cd87340d44b80a/pandas/core/computation/expressions.py#L215-L218
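The difference between numpy and the plain `bool` dtype in pandas can be sketched as follows. This is an illustration under assumptions, not the original example from the issue: the array names are invented, and the exact behavior may depend on the pandas version in use.

```python
import numpy as np
import pandas as pd

arr1 = np.array([True, False, True])
arr2 = np.array([True, True, True])  # all True, so no division by zero

# Plain numpy: dividing two boolean arrays works and gives floats.
numpy_result = arr1 / arr2

# pandas: the same operation on two bool Series is expected to raise.
try:
    pd.Series(arr1) / pd.Series(arr2)
    pandas_raised = False
except NotImplementedError:
    pandas_raised = True

# Only bool-with-bool is disallowed; a non-boolean operand is accepted.
mixed_result = pd.Series(arr1) / 1

print(numpy_result, pandas_raised, mixed_result.tolist())
```

Casting one operand explicitly, for example with `.astype(int)`, is the way to opt in to the numeric interpretation when both operands are boolean.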
For the nullable BooleanArray, for now we simply relied on the operations as defined by the underlying numpy bool array:
That's for the `BooleanArray`, but the check is currently done on the "array_op" level (but because it is done within expressions.py, we don't run that check for EAs, xref https://github.com/pandas-dev/pandas/pull/41161).

So questions:

- [...] `pd.Series(arr) / 1` does work, it's only disallowed if both operands are boolean)
- [...] `BooleanArray` level, and not only check it on the DataFrame/Series ops level in `array_ops.py`?