Open zfaee opened 1 year ago
Why would you expect to be able to? True / 2, False - 8, or 10 // -True don't mean anything, and if you explicitly want to treat True/False as 1/0 integers then you can just cast to make the intent clear. Otherwise automatic casting here seems much more likely to allow for the introduction of bugs (given that most math ops on bool are invalid) ;)
just run:
import polars as pl
print(True / 2, False - 8 , 10 // -True)
df = pl.DataFrame({'s': [True]})
s = pl.col('s')
print(df.with_columns(s_true_divide=s / 2, s_sub=s - 8, s_floordiv=s // 10))
output:
0.5 -8 -10
shape: (1, 4)
┌──────┬───────────────┬───────┬────────────┐
│ s    ┆ s_true_divide ┆ s_sub ┆ s_floordiv │
│ ---  ┆ ---           ┆ ---   ┆ ---        │
│ bool ┆ f64           ┆ i32   ┆ i32        │
╞══════╪═══════════════╪═══════╪════════════╡
│ true ┆ 0.5           ┆ -7    ┆ 0          │
└──────┴───────────────┴───────┴────────────┘
So we can see that these operations do work!
That is because (in Python) booleans are a subtype of integer. However... none of those results appear to mean anything ;)
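The subtype relationship mentioned here can be checked in plain Python. This short sketch uses only built-ins and repeats the three expressions from the earlier snippet:

```python
# bool is a subclass of int, so True and False take part in arithmetic as 1 and 0.
is_subtype = issubclass(bool, int)

# The same three expressions printed in the reproduction snippet.
results = (True / 2, False - 8, 10 // -True)

print(is_subtype)
print(results)
```

The results match the first line of the output shown earlier (0.5 -8 -10). Whether those values are meaningful is a separate question.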
This behavior of Series is inconsistent with DataFrame, native Python, and NumPy, which will confuse developers.
Moreover, a Series of bool dtype already supports arithmetic with float, so why not with int?
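For comparison, a small sketch of the NumPy behavior referred to above, using the two operations from the issue description (s + 1 and 1 * s). The array contents are made up for illustration.

```python
import numpy as np

# A boolean array standing in for the Boolean Series.
flags = np.array([True, False])

# NumPy promotes the booleans to integers when combined with a Python int.
plus_one = flags + 1
times_one = 1 * flags

print(plus_one.tolist(), times_one.tolist())
```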
~Supertypes have not been defined for Boolean and signed integers. I believe that's an oversight, and they have been added for the reverse, as well as for unsigned integers.~
The issue seems to be the implementation of arithmetic on Series. There is actually a test for this behavior that mentions "do we want this to work?": https://github.com/pola-rs/polars/blob/228c89a3b998a27d873110cd9462b45175a905d0/py-polars/tests/unit/series/test_series.py#L1603-L1637
I would say yes. The following also works just fine:
s1 = pl.Series([True, False])
s2 = pl.Series([5])
print(s1 + s2)
shape: (2,)
Series: '' [i64]
[
	6
	5
]
So a fix is needed in Series.__add__ and the related methods, as well as in Series._arithmetic.
I'd have said "no" and we should error elsewhere (because True + 8 or False * 0.12345 don't make any sense) but... perhaps this is the path of least resistance if it's already working in other contexts 😜
Checks
[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of Polars.
Reproducible example
Log output
Issue description
Arithmetic between a Series of dtype Boolean and an argument of type int fails ("cannot do arithmetic with series of dtype: Boolean and argument of type: int"), e.g. s + 1, 1 * s
Expected behavior
These operations are expected to be supported.
Installed versions