pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Add argument `value_drop_null` to `melt()` #15273

Open mgirlich opened 5 months ago

mgirlich commented 5 months ago

Description

After melting, it is quite common to drop the rows whose newly created value column is null. It would be nice to add an argument `value_drop_null` to `melt()` that does this.

Example

import polars as pl

df = pl.DataFrame(
    {
        "a": ["x", "y"],
        "b": [1, None],
    }
)

# defaults to False
melted_keep = df.melt("a", value_drop_null=False)
# ┌─────┬──────────┬───────┐
# │ a   ┆ variable ┆ value │
# │ --- ┆ ---      ┆ ---   │
# │ str ┆ str      ┆ i64   │
# ╞═════╪══════════╪═══════╡
# │ x   ┆ b        ┆ 1     │
# │ y   ┆ b        ┆ null  │
# └─────┴──────────┴───────┘

melted_drop = df.melt("a", value_drop_null=True)
# ┌─────┬──────────┬───────┐
# │ a   ┆ variable ┆ value │
# │ --- ┆ ---      ┆ ---   │
# │ str ┆ str      ┆ i64   │
# ╞═════╪══════════╪═══════╡
# │ x   ┆ b        ┆ 1     │
# └─────┴──────────┴───────┘

avimallu commented 5 months ago

Broadly a duplicate of https://github.com/pola-rs/polars/issues/8903.

I think this feature is also better addressed by Polars' focus on composability of functions (i.e. chaining a `drop_nulls()` call after the melt) than by adding a specialized argument, which keeps the API surface small. The argument would bring little performance improvement or reduction in typing.