pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Support for making all possible combinations of elements in a df, multiIndex from the cartesian product of multiple iterables? #17338

Open JacobElder opened 3 months ago

JacobElder commented 3 months ago

Description

Pandas has support for "Make a MultiIndex from the cartesian product of multiple iterables." This is accomplished with pd.MultiIndex.from_product(). It builds every possible combination of the elements of the iterables X and Y, which can then be used in a pandas DataFrame.

https://pandas.pydata.org/docs/reference/api/pandas.MultiIndex.from_product.html
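For reference, a minimal sketch of the pandas behaviour described above. The input values and the level names ("int", "char") are made up for illustration; to_frame(index=False) is used only to turn the resulting index into ordinary columns.

```python
import pandas as pd

# Build a MultiIndex holding every (int, char) pair from the two iterables.
# The values and the level names are illustrative assumptions.
index = pd.MultiIndex.from_product(
    [[1, 2, 3], ["a", "b"]],
    names=["int", "char"],
)

# Convert the index levels into regular DataFrame columns.
combos = index.to_frame(index=False)
print(combos)
```

With 3 and 2 input values the frame has 3 x 2 = 6 rows, one per combination.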

The tidyverse does this with purrr's cross() or tidyr's expand_grid().

https://purrr.tidyverse.org/reference/cross.html

As far as I can tell, Polars does not yet support this. If it is not already available, could support be added for generating a dataframe of all possible combinations of the listed elements? Thank you!

etiennebacher commented 3 months ago

Might be duplicate of https://github.com/pola-rs/polars/issues/9722

mcrumiller commented 3 months ago

Here is a function to do that with basic Series; it's easy to extend to other iterables.

from datetime import date
import polars as pl

def cross(series_list):
    """Generate a DataFrame consisting of all combinations of values in a list of Series."""
    lfs = [pl.LazyFrame(s) for s in series_list]
    out = lfs[0]
    for lf in lfs[1:]:
        out = out.join(lf, how="cross")
    return out.collect()

s1 = pl.Series("int", [1, 2, 3])
s2 = pl.Series("char", ["a", "b", "c", "d"])
s3 = pl.Series("date", [date(2024, 1, 1), date(2024, 1, 2)])

with pl.Config(tbl_rows=24):
    print(cross([s1, s2, s3]))

shape: (24, 3)
┌─────┬──────┬────────────┐
│ int ┆ char ┆ date       │
│ --- ┆ ---  ┆ ---        │
│ i64 ┆ str  ┆ date       │
╞═════╪══════╪════════════╡
│ 1   ┆ a    ┆ 2024-01-01 │
│ 1   ┆ a    ┆ 2024-01-02 │
│ 1   ┆ b    ┆ 2024-01-01 │
│ 1   ┆ b    ┆ 2024-01-02 │
│ 1   ┆ c    ┆ 2024-01-01 │
│ 1   ┆ c    ┆ 2024-01-02 │
│ 1   ┆ d    ┆ 2024-01-01 │
│ 1   ┆ d    ┆ 2024-01-02 │
│ 2   ┆ a    ┆ 2024-01-01 │
│ 2   ┆ a    ┆ 2024-01-02 │
│ 2   ┆ b    ┆ 2024-01-01 │
│ 2   ┆ b    ┆ 2024-01-02 │
│ 2   ┆ c    ┆ 2024-01-01 │
│ 2   ┆ c    ┆ 2024-01-02 │
│ 2   ┆ d    ┆ 2024-01-01 │
│ 2   ┆ d    ┆ 2024-01-02 │
│ 3   ┆ a    ┆ 2024-01-01 │
│ 3   ┆ a    ┆ 2024-01-02 │
│ 3   ┆ b    ┆ 2024-01-01 │
│ 3   ┆ b    ┆ 2024-01-02 │
│ 3   ┆ c    ┆ 2024-01-01 │
│ 3   ┆ c    ┆ 2024-01-02 │
│ 3   ┆ d    ┆ 2024-01-01 │
│ 3   ┆ d    ┆ 2024-01-02 │
└─────┴──────┴────────────┘