narwhals-dev / narwhals

Lightweight and extensible compatibility layer between dataframe libraries!
https://narwhals-dev.github.io/narwhals/
MIT License
423 stars 76 forks

feat: add DataFrame.iter_rows #317

Closed Priyansh121096 closed 3 months ago

Priyansh121096 commented 3 months ago

If you have comments or can explain your changes, please do so below.

I've matched the API with https://docs.pola.rs/api/python/stable/reference/dataframe/api/polars.DataFrame.rows.html

Priyansh121096 commented 3 months ago

Thanks for your comments. I'll address them soon.

Priyansh121096 commented 3 months ago

Just got a question about this functionality - instead of DataFrame.rows, couldn't we just add DataFrame.iter_rows?

@MarcoGorelli I was going to propose something similar (you beat me to it 😄). I was thinking we could expose both rows and iter_rows since the polars API does as well.

One issue with this one is that I'm not sure whether pandas has a convenient equivalent when named=True. The rest of the cases are fine:

  1. polars + named=False = df.iter_rows(named=False)
  2. polars + named=True = df.iter_rows(named=True)
  3. pandas + named=False = df.itertuples(index=False, name=None)
  4. pandas + named=True = ?

Please let me know if you're aware of a pandas API which returns an iterator for iterating over rows as dictionaries.
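For reference, a minimal sketch of case 3 from the list above, assuming a small made-up frame (the column names `a` and `b` are just for illustration). With `name=None`, `itertuples` yields plain tuples rather than namedtuples:

```python
import pandas as pd

# Hypothetical example data, not taken from the PR.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Case 3: pandas + named=False.
# index=False drops the index from each row; name=None yields plain tuples.
rows = df.itertuples(index=False, name=None)

first = next(rows)  # rows are produced one at a time
rest = list(rows)   # remaining rows
```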

FBruzzesi commented 3 months ago

@Priyansh121096 I tried playing with the into argument of the .to_dict() method, sadly with no success.

I guess that 4. can become iter(df.to_dict("records"))?!

MarcoGorelli commented 3 months ago

How about using https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict (which, despite the underscore, is public):

something like (simplified)

def iter_rows(df):
    yield from (row._asdict() for row in df.itertuples(index=False))

Priyansh121096 commented 3 months ago

I guess that 4. can become iter(df.to_dict("records"))?!

I feel like this defeats the purpose of using iter_rows over rows though.
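To illustrate the concern, here is a rough sketch with a made-up frame: `to_dict(orient="records")` builds the whole list of dicts before `iter()` ever sees it, whereas a generator over `itertuples` produces one row at a time.

```python
import pandas as pd

# Hypothetical example data, not taken from the PR.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Eager: the full list of row dicts is materialized first,
# and iter() merely wraps the already-built list.
records = df.to_dict(orient="records")
eager = iter(records)

# Lazy: each dict is created only when the consumer asks for the next row.
lazy = (row._asdict() for row in df.itertuples(index=False))
first_lazy = next(lazy)
```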

How about using https://docs.python.org/3/library/collections.html#collections.somenamedtuple._asdict (which, despite the underscore, is public):

Amazing! This should work.
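Putting the pieces together, a pandas-side helper could look roughly like the sketch below. The function name `iter_rows_pandas` and the example frame are hypothetical, and this is not necessarily what ended up in narwhals. One caveat: `itertuples` renames columns that are not valid Python identifiers to positional names, so the `_asdict()` route only preserves keys for identifier-like column names.

```python
from __future__ import annotations

from typing import Any, Iterator

import pandas as pd


def iter_rows_pandas(df: pd.DataFrame, *, named: bool = False) -> Iterator[Any]:
    """Lazily yield rows as tuples (named=False) or dicts (named=True)."""
    if not named:
        # Plain tuples, index excluded.
        yield from df.itertuples(index=False, name=None)
        return
    for row in df.itertuples(index=False):
        # _asdict() is public namedtuple API despite the leading underscore.
        yield row._asdict()


# Hypothetical example data, not taken from the PR.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
as_tuples = list(iter_rows_pandas(df))
as_dicts = list(iter_rows_pandas(df, named=True))
```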

Priyansh121096 commented 3 months ago

@MarcoGorelli I'll raise another PR for iter_rows soon. Can we merge this one?

MarcoGorelli commented 3 months ago

thanks - tbh I'm not really sure about DataFrame.rows, I might even suggest deprecating it altogether in Polars itself

could we just repurpose this one for DataFrame.iter_rows please? sorry for not having thought about this straight away

Priyansh121096 commented 3 months ago

could we just repurpose this one for DataFrame.iter_rows please?

@MarcoGorelli pushed a change for this.

MarcoGorelli commented 3 months ago

@pre-commit.ci autofix