pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Wrong behavior description in `polars.dataframe.group_by.GroupBy.__iter__` #17460

Closed: liufeimath closed this 2 months ago

liufeimath commented 3 months ago

Description

In this doc: https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.dataframe.group_by.GroupBy.__iter__.html

It says:

"If a single string was passed to `by`, the keys are a single value instead of a tuple."

This is not true in Polars 1.0.0. For example:

import polars as pl

df = pl.DataFrame({"foo": ["a", "a", "b"], "bar": [1, 2, 3]})
for name, data in df.group_by("foo"):
    print(name)  # the group key
    print(data)  # the rows belonging to that group

This prints:

('a',)
shape: (2, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 1   │
│ a   ┆ 2   │
└─────┴─────┘
('b',)
shape: (1, 2)
┌─────┬─────┐
│ foo ┆ bar │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ b   ┆ 3   │
└─────┴─────┘

Keys are still a tuple, not a single value.

Link

https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.dataframe.group_by.GroupBy.__iter__.html

henryharbeck commented 3 months ago

This was fixed in https://github.com/pola-rs/polars/pull/17383, so it will be resolved in the next release.

cmdlineluser commented 3 months ago

You can see the updated docs in the /dev/ section:

liufeimath commented 3 months ago

Thanks for the info! All good now.

nameexhaustion commented 2 months ago

Fixed by https://github.com/pola-rs/polars/pull/17383