If you have comments or can explain your changes, please do so below
Opening as draft to get comments/guidance on the approach. Also, (1) Dask behaviour, (2) column name resulting from right arithmetics on Series in Polars, (3) leftover binary dunder methods need more thourough exploration on my side.
Another point, this PR should solve both of the issues in principle. I combined the two fixes because I originally thought that the second (#509) could naturally be resolved as a consequence of the first, but at the end that's not really the case (at least via the approach I followed). Perhaps I should separate the two?
Key points:
853:
- Tweak `validate_column_comparand` in `_pandas_like` so as to reindex `other` via the lhs series only when possible (which is what the error message was complaining about when outputting _ValueError: Length mismatch: Expected axis has 3 elements, new values have 1 elements_)
- Broadcast the lhs series via `maybe_broadcast_scalar_into_series` when relevant (i.e. when of different length wrt the previously validated/broadcasted rhs `other`)
509:
- Add optional `alias` parameter to `reuse_series_implementation` and assign it to `expr._output_names` (basically, the approach Marco was showing in last week's livestream) and pass it all along to be able to rename the output of rarithmetic ops as `"literal"` without incurring into _safety assertion_ errors.
Points that need - for sure - further exploration:
Dask behaviour related to both #853 and #509 (I had to xfailtests/frame/lit_test.py::test_lit_operationand tests/expr_and_series/arithmetic_test.py::test_right_arithmetic_expr); what happens here is that Dask's lit reasonably aligns the index with the native frame, but here a reduction takes place, which shrinks the index of the resulting frame causing a mismatch. Not sure this can be properly handled given that computations are deferred.
Right arithmetic operations on Series in Polars return an unnamed column. So, while
outputs "literal" as col name as reported in #509, 1 + pl.Series([1, 2, 3]) returns an unnamed series. This does prevent tests/expr_and_series/arithmetic_test.py::test_right_arithmetic_series to test for the output column name being "literal". Point is that, as we're generally reusing series implementation in expressions, this might need to be taken into consideration as on other backends we're kind of imposing output name to be "literal".
Analyse behaviour of the binary dunder methods that were momentaneously left apart.
What type of PR is this? (check all applicable)
Related issues
Checklist
If you have comments or can explain your changes, please do so below
Opening as draft to get comments/guidance on the approach. Also, (1)
Dask
behaviour, (2) column name resulting from right arithmetics on Series inPolars
, (3) leftover binary dunder methods need more thourough exploration on my side.Another point, this PR should solve both of the issues in principle. I combined the two fixes because I originally thought that the second (#509) could naturally be resolved as a consequence of the first, but at the end that's not really the case (at least via the approach I followed). Perhaps I should separate the two?
Key points:
853:
509:
Points that need - for sure - further exploration:
Dask
behaviour related toboth#853and #509(I had toxfail
tests/frame/lit_test.py::test_lit_operation
and); what happens here is thattests/expr_and_series/arithmetic_test.py::test_right_arithmetic_expr
Dask
'slit
reasonably aligns the index with the native frame, but here a reduction takes place, which shrinks the index of the resulting frame causing a mismatch. Not sure this can be properly handled given that computations are deferred.Polars
return an unnamed column. So, whileoutputs
"literal"
as col name as reported in #509,1 + pl.Series([1, 2, 3])
returns an unnamed series. This does preventtests/expr_and_series/arithmetic_test.py::test_right_arithmetic_series
to test for the output column name being"literal"
. Point is that, as we're generally reusing series implementation in expressions, this might need to be taken into consideration as on other backends we're kind of imposing output name to be"literal"
.