pola-rs / polars

Dataframes powered by a multithreaded, vectorized query engine, written in Rust
https://docs.pola.rs

[DOC] n_unique considers null values #7770

Open stevenlis opened 1 year ago

stevenlis commented 1 year ago

Problem description

It might be a good idea to mention in the docs that `n_unique` counts null as a distinct value.

https://pola-rs.github.io/polars/py-polars/html/reference/series/api/polars.Series.n_unique.html#polars.Series.n_unique

import polars as pl

print(
    pl.Series("a", [1, 2, 2, None]).n_unique()
)
# prints 3, because the null is counted as a distinct value

ritchie46 commented 1 year ago

Can you make a PR?

stevenlis commented 1 year ago

Would love to, but I have never made a PR before. Do I just have to change this one line? https://github.com/pola-rs/polars/blob/c6db4884c949d4920831374f3d0f7ae23bea988b/py-polars/polars/series/series.py#L4644

zundertj commented 1 year ago

I would add another example below this one, with a sentence above it stating that nulls are counted as a unique value. I think this is very important to stress, as I would not have expected this behaviour. I would also add it to the description, i.e. the line in the docstring that says:

Get unique elements in series.

becomes

Get unique elements in series. Includes `null` if present.

Other functions that could use this clarification:

I am not sure whether this also applies to DataFrame & GroupBy.
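To make the behaviour under discussion concrete, here is a minimal sketch using the Series from the original report. It assumes that the goal in some cases is to count only non-null values; combining `drop_nulls()` with `n_unique()` is one way to do that, not something proposed in this thread. The variable names are illustrative.

import polars as pl

s = pl.Series("a", [1, 2, 2, None])

# n_unique() counts the null as its own distinct value.
with_null = s.n_unique()

# Assumption: dropping nulls first yields the count of distinct non-null values.
without_null = s.drop_nulls().n_unique()

# null_count() reports how many nulls the Series holds.
null_count = s.null_count()

print(with_null, without_null, null_count)  # 3 2 1

A docstring example along these lines would show readers both the default behaviour and a way around it.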

stevenlis commented 1 year ago

https://github.com/pola-rs/polars/blob/master/py-polars/polars/functions/lazy.py#L873-L895

Do you have to manually change everything in the docstring, or just the code after `>>>`? For example, if I change the code to:

>>> df = pl.DataFrame({"a": [1, 8, None], "b": [4, 5, 2], "c": ["foo", "bar", "foo"]})
>>> df.select(pl.n_unique("a"))

Will the output automatically change to the following?

    shape: (1, 1)
    ┌─────┐
    │ a   │
    │ --- │
    │ u32 │
    ╞═════╡
    │ 3   │
    └─────┘

zundertj commented 1 year ago

It won't. The easiest approach is to paste in some output, even if it is wrong. Then run `make doctest`, and it will report a failure along the lines of "expected X, got Y". Verify that Y is what you want, and copy-paste it into the docstring.
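The workflow described above can be sketched with Python's standard `doctest` module. This is only a stand-in: the Polars `make doctest` target may run things differently, and `count_unique` and `run_doctest` are hypothetical helpers written for this illustration. The docstring deliberately contains a stale expected value so that the failure report shows what to copy.

import contextlib
import doctest
import io


def count_unique(values):
    """Count distinct values, treating None as a value of its own.

    The expected output below is deliberately wrong (stale).

    >>> count_unique([1, 2, 2, None])
    2
    """
    return len(set(values))


def run_doctest(func):
    """Run the examples in func's docstring and return doctest's report."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        doctest.run_docstring_examples(
            func, {func.__name__: func}, name=func.__name__
        )
    return buffer.getvalue()


# The report contains an "Expected:" block (the stale 2) and a "Got:" block
# (the actual 3). After checking that 3 is correct, paste it into the docstring.
report = run_doctest(count_unique)
print(report)

The point is that the expected output in a docstring is never regenerated automatically; the test run only tells you what the code actually produced.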

stevenlis commented 1 year ago

@zundertj Thanks. I will give it a try!