pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

Discussion: Should iterating list series yield Python lists? #9069

Open jonashaag opened 1 year ago

jonashaag commented 1 year ago

Following up on the discussion at https://github.com/pola-rs/polars/pull/8501#issuecomment-1522821918 (cc @alexander-beedie).

I'd like to discuss what type iterating over a List or Array series should yield. Currently we have:

# Yields Python lists
iter(pl.Series(dtype=pl.Array))
# Yields pl.Series
iter(pl.Series(dtype=pl.List))
# Yields Python dicts
iter(pl.Series(dtype=pl.Struct))

I wonder what types users expect when they write a loop like for ... in list_series.
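To check what a given installation actually yields, something like the following minimal sketch may help. The helper name yielded_type_names is hypothetical (it is not part of polars), and the example series are assumed to be inferred as List and Struct dtypes. No expected output is shown because the result depends on the installed polars version.

```python
def yielded_type_names(iterable):
    """Return the sorted names of the types produced by iterating `iterable`."""
    return sorted({type(item).__name__ for item in iterable})


try:
    import polars as pl
except ImportError:  # polars is optional for this sketch
    pl = None

if pl is not None:
    # Nested Python lists are assumed to be inferred as a List dtype.
    list_series = pl.Series("a", [[1, 2], [3, 4]])
    # Python dicts are assumed to be inferred as a Struct dtype.
    struct_series = pl.Series("b", [{"x": 1}, {"x": 2}])

    print(list_series.dtype, yielded_type_names(list_series))
    print(struct_series.dtype, yielded_type_names(struct_series))
```

The helper works on any iterable, so the same check can be run against an Array series once one has been constructed for the installed version.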

mcrumiller commented 1 year ago

I think for the Array dtype we should return either tuples (they have slightly lower memory overhead and better access speed, and an iterated row isn't really meant to be modified) or Series. For the pl.List and pl.Struct dtypes, lists and dicts are the obvious choices.
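The memory and mutability points can be illustrated in plain Python. This is a rough sketch that assumes CPython: sys.getsizeof reports only the container's own size (not the elements), exact byte counts vary by interpreter version, and the helper try_modify is a hypothetical name used for illustration. It does not measure access speed.

```python
import sys

row_as_list = [1.0, 2.0, 3.0]
row_as_tuple = tuple(row_as_list)

# Size of the container object itself; the float elements are not included.
list_size = sys.getsizeof(row_as_list)
tuple_size = sys.getsizeof(row_as_tuple)
print(f"list: {list_size} bytes, tuple: {tuple_size} bytes")


def try_modify(row):
    """Return True if item assignment on `row` succeeds, False otherwise."""
    try:
        row[0] = 99.0
    except TypeError:
        # Tuples do not support item assignment.
        return False
    return True


print("list is modifiable:", try_modify(list(row_as_list)))
print("tuple is modifiable:", try_modify(row_as_tuple))
```

On CPython the tuple is the smaller of the two containers, and assigning into it raises TypeError, which matches the argument that iterated rows are not meant to be modified.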