Open stevenlis opened 5 months ago
df.group_by('id').agg(pl.col('num').shuffle().head(n))
?
Polars can already do this with GroupBy.map_groups. For your reference, https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.dataframe.group_by.GroupBy.map_groups.html
import polars as pl

df = pl.DataFrame(
    {
        "id": [0, 1, 2, 3, 4],
        "color": ["red", "green", "green", "red", "red"],
        "shape": ["square", "triangle", "square", "triangle", "square"],
    }
)

# not recommended: map_groups calls a Python function once per group
df.group_by("color").map_groups(
    lambda group_df: group_df.sample(2)
)

# recommended: expression-only; keeps up to 2 random rows per color
df.filter(
    pl.int_range(pl.len()).shuffle(seed=42).over("color") < 2
)
Description
Polars has no dedicated method for sampling rows within each group after a group_by. pandas provides this as DataFrameGroupBy.sample:
https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.sample.html
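For comparison, a short sketch of the pandas call being referred to. The data and the `random_state` value are made up for illustration:

```python
import pandas as pd

# Toy data, for illustration only.
df = pd.DataFrame(
    {
        "id": [0, 1, 2, 3, 4],
        "color": ["red", "green", "green", "red", "red"],
        "shape": ["square", "triangle", "square", "triangle", "square"],
    }
)

# Draw 2 rows at random from each color group.
sampled = df.groupby("color").sample(n=2, random_state=42)
print(sampled)
```

Here every group has at least 2 rows. With the default `replace=False`, pandas raises if a group is smaller than `n`.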