Open mkleinbort opened 6 months ago
Let me understand, do you mean to make all elements null here? This is not consistent with my understanding of clear without arguments. 🤔
I think in the expression context it makes sense for the number of rows to be implicit. But to clarify the ask, it's not obvious how to "null" a column or group of columns in polars.
import polars as pl
import polars.selectors as cs

pl.DataFrame({
    'x1': [{'name': 'Alice'}],
    'x2': [2],
}).with_columns(
    cs.by_name('x1', 'x2').map_elements(lambda _: pl.lit(None)).name.suffix('_v1'),
    cs.by_name('x1', 'x2').add(pl.lit(None)).name.suffix('_v2'),
)
shape: (1, 6)
┌───────────┬─────┬────────┬────────┬───────────┬───────┐
│ x1        ┆ x2  ┆ x1_v1  ┆ x2_v1  ┆ x1_v2     ┆ x2_v2 │
│ ---       ┆ --- ┆ ---    ┆ ---    ┆ ---       ┆ ---   │
│ struct[1] ┆ i64 ┆ object ┆ object ┆ struct[1] ┆ i64   │
╞═══════════╪═════╪════════╪════════╪═══════════╪═══════╡
│ {"Alice"} ┆ 2   ┆ null   ┆ null   ┆ {null}    ┆ null  │
└───────────┴─────┴────────┴────────┴───────────┴───────┘
Doing map_elements does not preserve the type, and doing operations with null is not consistent in how it affects various types (e.g. adding pl.lit(None) does not work as I intended with structs, and crashes on list-type columns). Using .replace also does not work great.
Do you know a better way to tell polars to convert all values in a column (or columns) to null while keeping the schema unchanged?
How about pl.when(pl.col("a").is_null()).then(pl.col("a")).otherwise(None)? The schema should be the same as that of the truthy expr. I haven't thought carefully about the multi-column case, though.
I had used pl.when(False).then(pl.all()) for this, but it doesn't work with your example:
# PanicException: not implemented for dtype Object("object", Some(object-registry))
I had thought it would be handier if .clear() defaulted to the .height instead of 0 (but maybe it doesn't make sense for the LazyFrame case?)
I think the .clear on the expression side (e.g. pl.col('x').clear()) would have to evaluate to a full-length column of nulls, same as pl.lit(None).
Also, I think we can all agree: pl.when(False).then(pl.all()) is 10% genius and 90% a hack around a missing API.
Also, my current solution is:
columns_to_null = ['a','b','c']
df.with_columns(pl.lit(None, dtype=dtype).alias(c) for c,dtype in df.schema.items() if c in columns_to_null)
Description
I had the need to return an all-null version of a column without changing its name or data type.
There are a few ways of doing this in the expression API (e.g. multiplying by None, using map_elements, etc.), but I didn't see an obvious and idiomatic way to simply return nulls. For that use case, pl.Expr.clear() seems like the right solution. It could then be used in the usual places...