pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

`next()` on GroupBy raises `AttributeError` object has no attribute `_current_index` #12868

Open cmdlineluser opened 11 months ago

cmdlineluser commented 11 months ago

Reproducible example

import polars as pl

next(pl.DataFrame().group_by(1))
# AttributeError: 'GroupBy' object has no attribute '_current_index'

Log output

No response

Issue description

Not sure whether next() is intended to work here. If it isn't, it seems like it should raise a TypeError instead.

Expected behavior

"Work" or raise a TypeError?

next([])
# TypeError: 'list' object is not an iterator

Installed versions

```
--------Version info---------
Polars:              0.19.19
Index type:          UInt32
Platform:            macOS-13.6.1-arm64-arm-64bit
Python:              3.11.6 (main, Nov 2 2023, 04:39:40) [Clang 14.0.0 (clang-1400.0.29.202)]

----Optional dependencies----
adbc_driver_manager:
cloudpickle:
connectorx:
deltalake:
fsspec:              2023.6.0
gevent:
matplotlib:
numpy:               1.26.2
openpyxl:
pandas:              2.0.3
pyarrow:             12.0.1
pydantic:
pyiceberg:
pyxlsb:
sqlalchemy:
xlsx2csv:
xlsxwriter:
```

deanm0000 commented 9 months ago

https://github.com/pola-rs/polars/blob/main/py-polars/polars/dataframe/group_by.py

"just" needs this treatment

cmdlineluser commented 9 months ago

Yeah, the machinery is there I think.

_current_index is created inside __iter__

https://github.com/pola-rs/polars/blob/e9a95b74cee01f533b90bdf72ac8a021d0d3fcc3/py-polars/polars/dataframe/group_by.py#L114

So it works if you manually call iter()

df = pl.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})

next(iter(df.group_by("a")))
# (1,
#  shape: (2, 2)
#  ┌─────┬─────┐
#  │ a   ┆ b   │
#  │ --- ┆ --- │
#  │ i64 ┆ i64 │
#  ╞═════╪═════╡
#  │ 1   ┆ 3   │
#  │ 1   ┆ 4   │
#  └─────┴─────┘)
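
The mechanism can be reproduced without Polars. The sketch below uses a hypothetical `ToyGroupBy` class (an assumption for illustration, not the actual Polars source) whose iteration state is only created inside `__iter__`, so a direct `next()` fails with `AttributeError` while `next(iter(...))` succeeds.

```python
class ToyGroupBy:
    """Hypothetical stand-in for GroupBy; not the Polars implementation."""

    def __init__(self, groups):
        self._groups = list(groups)

    def __iter__(self):
        # The iteration state is only created here, not in __init__.
        self._current_index = 0
        return self

    def __next__(self):
        # Fails with AttributeError if __iter__ was never called.
        if self._current_index >= len(self._groups):
            raise StopIteration
        group = self._groups[self._current_index]
        self._current_index += 1
        return group


# Calling next() directly skips __iter__, so the attribute is missing.
try:
    next(ToyGroupBy(["g1", "g2"]))
    direct_error = None
except AttributeError as exc:
    direct_error = type(exc).__name__

# Going through iter() creates the attribute first.
via_iter = next(iter(ToyGroupBy(["g1", "g2"])))
```

Because the class defines `__next__`, Python treats the object as an iterator and never gets the chance to raise its own `TypeError`.
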
deanm0000 commented 9 months ago

I think _current_index needs to be initialized in __init__; otherwise __next__ needs to check whether it exists and create it if not.
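
A rough sketch of the two options, using hypothetical toy classes (the names `ToyGroupByInit` and `ToyGroupByLazy` are assumptions for illustration and do not come from the Polars code base). Either variant would make a direct `next()` call "work" rather than raise.

```python
class ToyGroupByInit:
    """Option 1 (hypothetical): create the index in __init__."""

    def __init__(self, groups):
        self._groups = list(groups)
        self._current_index = 0  # state exists from construction

    def __iter__(self):
        self._current_index = 0  # restart on every new iteration
        return self

    def __next__(self):
        if self._current_index >= len(self._groups):
            raise StopIteration
        group = self._groups[self._current_index]
        self._current_index += 1
        return group


class ToyGroupByLazy:
    """Option 2 (hypothetical): create the index on demand in __next__."""

    def __init__(self, groups):
        self._groups = list(groups)

    def __iter__(self):
        self._current_index = 0
        return self

    def __next__(self):
        # Create the state if __iter__ was never called.
        if not hasattr(self, "_current_index"):
            self._current_index = 0
        if self._current_index >= len(self._groups):
            raise StopIteration
        group = self._groups[self._current_index]
        self._current_index += 1
        return group


first_init = next(ToyGroupByInit(["g1", "g2"]))
first_lazy = next(ToyGroupByLazy(["g1", "g2"]))
```

Both variants keep the iteration state on the group-by object itself, so two loops over the same object would share one index.
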

s-bidowaniec commented 6 months ago

I can try to fix it

stinodego commented 5 months ago

This should raise a TypeError.

ritchie46 commented 1 day ago

Instead of the try-except proposed, we should return an object of a new class GroupIter on __iter__. Then it is solved by the type system.
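
A minimal sketch of that idea, assuming a simplified list-backed design: only the name `GroupIter` comes from the comment above, while `ToyGroupBy` and everything inside both classes is hypothetical. Since the group-by object no longer defines `__next__`, Python itself raises `TypeError` on a direct `next()` call.

```python
class GroupIter:
    """Hypothetical iterator object; owns the iteration state."""

    def __init__(self, groups):
        self._groups = groups
        self._current_index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._current_index >= len(self._groups):
            raise StopIteration
        group = self._groups[self._current_index]
        self._current_index += 1
        return group


class ToyGroupBy:
    """Hypothetical iterable; has __iter__ but deliberately no __next__."""

    def __init__(self, groups):
        self._groups = list(groups)

    def __iter__(self):
        # Each call hands out a fresh, independent iterator.
        return GroupIter(self._groups)


gb = ToyGroupBy(["g1", "g2"])

# next() on the iterable is rejected by Python itself.
try:
    next(gb)
    message = ""
except TypeError as exc:
    message = str(exc)

first_pass = list(gb)
second_pass = list(gb)
```

Splitting the iterable from the iterator also means repeated or nested loops over the same object do not interfere with each other, because each one gets its own index.
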