narwhals-dev / narwhals

Lightweight and extensible compatibility layer between Polars, pandas, cuDF, Modin, and more!
https://narwhals-dev.github.io/narwhals/
MIT License
218 stars 31 forks source link

feat: add `narwhals.lit` #321

Closed EdAbati closed 1 week ago

EdAbati commented 1 week ago

What type of PR is this? (check all applicable)

Related issues

Checklist

If you have comments or can explain your changes, please do so below.

EdAbati commented 1 week ago

Thank you both for the comments! :)

a. Passing a list into lit seems to work.

import polars as pl
import pandas as pd
import narwhals as nw

df = pd.DataFrame({"a": [0, 1]})
df_pl = pl.DataFrame(df)

@nw.narwhalify
def func(df):
    return df.with_columns(c=nw.lit([1, 2]))
>>> func(df_pl)
shape: (2, 2)
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ a   ┆ c         β”‚
β”‚ --- ┆ ---       β”‚
β”‚ i64 ┆ list[i64] β”‚
β•žβ•β•β•β•β•β•ͺ═══════════║
β”‚ 0   ┆ [1, 2]    β”‚
β”‚ 1   ┆ [1, 2]    β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

>>> func(df)
   a       c
0  0  [1, 2]
1  1  [1, 2]

The only "issue" is that we cannot specify the correct DType (i.e. nw.lit([1, 2], dtype=...) just yet, because we don't have a list dtype. Or am I missing something?

b. Passing a np.array makes pandas and polars behave differently at the moment. πŸ€”


@nw.narwhalify
def func(df):
    import numpy as np

    return df.with_columns(c=nw.lit(np.array([1, 2])))
>>> func(df_pl)
shape: (2, 2)
β”Œβ”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”
β”‚ a   ┆ c   β”‚
β”‚ --- ┆ --- β”‚
β”‚ i64 ┆ i64 β”‚
β•žβ•β•β•β•β•β•ͺ═════║
β”‚ 0   ┆ 1   β”‚
β”‚ 1   ┆ 2   β”‚
β””β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”˜

>>> func(df)
   a       c
0  0  [1, 2]
1  1  [1, 2]
MarcoGorelli commented 1 week ago

thanks for checking that carefully - I think we should disallow passing numpy arrays to lit? at least for now - if we just raise an error for now, we can always loosen the behaviour later if we want to, whilst preserving backwards compatibility πŸ˜‡

just yet, because we don't have a list dtype. Or am I missing something?

you're right, we don't :) largely because potential consumers of Narwhals probably wouldn't be using it, given that in pandas pre-pyarrow it would just be object dtype ... but we should implement it properly in Narwhals eventually, it's definitely worthwhile

EdAbati commented 1 week ago

how did you go about figuring this all out? did you read the "how it works" page? did you use a debugger / read the code very carefully?

Kind of both. I read the "how it work works" page out of curiosity, when I found out about the project. To contribute to this I read the code, searching for similar implementations and used the debugger.

I found the code easy to follow, also thanks to the typing. So πŸ‘ πŸ‘ to you and the rest of the team

The contributing guide and process (with the template issues and PRs) is also nice!

Maybe we can add some documentation about the structure of the unit tests. I am used to find the same module/submodule structure as in the library. Here it seems the tests cases in the same submodule are in more scripts. I was a bit unsure where to add mine πŸ˜…

Regarding variable names, the only thing I noticed is the one I mentioned above. I was unsure why we needed to use PandasSeries, before realising that series_from_iterables was returning a β€˜native series’.

MarcoGorelli commented 1 week ago

Maybe we can add some documentation about the structure of the unit tests. I am used to find the same module/submodule structure as in the library. Here it seems the tests cases in the same submodule are in more scripts. I was a bit unsure where to add mine πŸ˜…

most of the structure of the unit tests is quite odd. probably because I did it in a hurry as a weekend project. if you fancy improving the tests structure, that's very welcome - i've added you to the team so feel free to submit anything you think is an improvement and merge anything you think is ready

EdAbati commented 1 week ago

Thank you for the feedback and for adding me to the team! This is "lit" πŸ”₯ (sorry had to make the joke)