adamgreg opened this issue 1 month ago
It sort of reminds me of an "anti" join, or `update()` but with DIFFERENCE instead of UNION. (Not sure whether a `concat(..., how="anti")` would make sense?)

It seems like one would have to manually find the missing columns (the set difference) in this case and forward fill?
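```python
import polars as pl

default = pl.LazyFrame({'a': [3], 'b': [None]})
tests = [
    {'a': [1, 2], 'b': [1, 2]},
    {'b': [1, 2]},
    {'a': [1, 2]},
]
for test in tests:
    lf = pl.LazyFrame(test)
    # columns present in `default` but missing from `lf`
    names = default.collect_schema().keys() - lf.collect_schema().keys()
    result = (
        # horizontal concat pads the shorter (1-row) frame with nulls...
        pl.concat([lf, default.select(names)], how='horizontal')
        # ...so forward-filling repeats the default value on every row
        .with_columns(pl.col(names).forward_fill())
        .select('a', 'b')
        .collect()
    )
    print(result.to_dict(as_series=False))
# {'a': [1, 2], 'b': [1, 2]}
# {'a': [3, 3], 'b': [1, 2]}
# {'a': [1, 2], 'b': [None, None]}
```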
Thanks @cmdlineluser, that's very interesting. The real case is a little more complicated, because what is ultimately selected can be arbitrary passed-in expressions that may cut across multiple sources. I think Polars has better support for introspection and relaxed concatenation than when I wrote the original code, though, so I can probably treat this as an opportunity to simplify!
Checks
Reproducible example
Log output
Issue description
From Polars 1.7 onwards (reproduced in 1.9.0 and 1.7.0), the broadcasting behaviour of `LazyFrame.with_context()` has changed. Previously, you could use a single-row LF to provide "default" values for columns missing from the "main" LF, regardless of the main LF's height. Now an exception is raised about the difference in heights. Version 1.6.0 has no problem.

I rely upon this behaviour in an internal package I maintain. It has a function that needs to provide default values for input tables that may have missing columns, and also to add columns with repeated values taken from another DF.
`LazyFrame.with_context()` has worked well for this purpose so far. Since it's now deprecated, I'm keen to move to an alternative solution, but I'm not sure how. I don't think `concat(how="horizontal")` will work here, since columns may be duplicated and a single row needs to be broadcast.

Thanks for reading. I'm a huge fan of Polars, and have been evangelizing it more than ever since the API stabilized!
Expected behavior
Installed versions