tomicapretto closed 2 years ago
In tidypolars I decided to implement .group_by() slightly differently than in the tidyverse: if a function can operate "by group", you use the by arg. So this is how you would do it in your example.
import tidypolars as tp
from tidypolars import col

path = (
    "https://gist.githubusercontent.com/netj/8836201/" +
    "raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv"
)
iris = tp.read_csv(path).rename(species = 'variety')

(
    iris
    .mutate(
        result = col("petal.width") + tp.mean(col("petal.width")),
        by = "species"
    )
)
shape: (150, 6)
┌──────────────┬─────────────┬──────────────┬─────────────┬───────────┬────────┐
│ sepal.length ┆ sepal.width ┆ petal.length ┆ petal.width ┆ species ┆ result │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str ┆ f64 │
╞══════════════╪═════════════╪══════════════╪═════════════╪═══════════╪════════╡
│ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ Setosa ┆ 0.446 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ Setosa ┆ 0.446 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ Setosa ┆ 0.446 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ Setosa ┆ 0.446 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 6.3 ┆ 2.5 ┆ 5.0 ┆ 1.9 ┆ Virginica ┆ 3.926 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 6.5 ┆ 3.0 ┆ 5.2 ┆ 2.0 ┆ Virginica ┆ 4.026 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 6.2 ┆ 3.4 ┆ 5.4 ┆ 2.3 ┆ Virginica ┆ 4.326 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 5.9 ┆ 3.0 ┆ 5.1 ┆ 1.8 ┆ Virginica ┆ 3.826 │
└──────────────┴─────────────┴──────────────┴─────────────┴───────────┴────────┘
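To make the by = "species" semantics concrete, here is a minimal pure-Python sketch of what a grouped mutate computes: each row gets its own value plus the mean of its group, not the mean of the whole column. The toy rows and the mutate_by_group helper are hypothetical and only illustrate the idea; this is not how tidypolars implements it.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy rows (not the real iris data): (species, petal_width).
rows = [
    ("Setosa", 0.2),
    ("Setosa", 0.4),
    ("Virginica", 1.8),
    ("Virginica", 2.2),
]


def mutate_by_group(rows):
    """Return (species, width, width + mean width of that species)."""
    # Collect the values belonging to each group.
    groups = defaultdict(list)
    for species, width in rows:
        groups[species].append(width)

    # One mean per group, computed only from that group's rows.
    group_means = {species: mean(widths) for species, widths in groups.items()}

    # Keep every input row and attach the per-group result.
    return [
        (species, width, width + group_means[species])
        for species, width in rows
    ]


result = mutate_by_group(rows)
for row in result:
    print(row)
```

The output keeps one row per input row, which is why the table above still has 150 rows after the grouped mutate.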
Lots of functions have the by arg so they can operate by group: mutate, filter, slice, summarize, etc.
Basically, if a function can operate "by group" in the tidyverse, you'll be able to use the by arg in tidypolars.
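As a rough illustration of what "operating by group" means for the other verbs, the sketch below works through summarize, filter, and slice on plain Python dicts. The data and the split_by helper are made up for this example; it shows the semantics only and makes no claim about tidypolars internals or exact signatures.

```python
from itertools import groupby
from operator import itemgetter
from statistics import mean

# Hypothetical toy data, not the iris dataset.
rows = [
    {"species": "Setosa", "width": 0.2},
    {"species": "Setosa", "width": 0.4},
    {"species": "Virginica", "width": 1.8},
    {"species": "Virginica", "width": 2.5},
    {"species": "Virginica", "width": 2.0},
]


def split_by(rows, key):
    """Split rows into {group value: [rows]}, keeping the original row order."""
    ordered = sorted(rows, key=itemgetter(key))  # sorted() is stable
    return {k: list(g) for k, g in groupby(ordered, key=itemgetter(key))}


groups = split_by(rows, "species")

# summarize by group: collapse each group to a single value.
summary = {k: mean([r["width"] for r in g]) for k, g in groups.items()}

# filter by group: keep rows above their own group's mean.
kept = [r for r in rows if r["width"] > summary[r["species"]]]

# slice by group: take the first row of every group.
first = [g[0] for g in groups.values()]
```

Note the difference in shape: summarize returns one value per group, while filter and slice return a subset of the original rows.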
Hope this helps! If you have any other questions let me know.
Excellent! Thanks a lot for the prompt and awesome response!
Saw your blog post and I'm glad tidypolars is working out for you!
Figured I would mention that tidypolars has a .drop_null() method. It works like the tidyverse's drop_na() or pandas .dropna(), though the .filter() approach you used works as well.
You can also use it to drop nulls from specific columns if you want.
# drop nulls from all columns
df.drop_null()
# drop nulls from "x" and "y"
df.drop_null('x', 'y')
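For readers unfamiliar with the two call shapes, here is a small pure-Python sketch of the behavior described: with no column names, a row is dropped if any column is null; with column names, only those columns are checked. The drop_null function below is a hypothetical stand-in written for illustration, with None playing the role of null; it is not the tidypolars method.

```python
# Hypothetical toy rows; None stands in for a null value.
rows = [
    {"x": 1, "y": "a", "z": None},
    {"x": None, "y": "b", "z": 3.0},
    {"x": 2, "y": None, "z": 4.0},
    {"x": 3, "y": "c", "z": 5.0},
]


def drop_null(rows, *cols):
    """Drop rows with a None in `cols`, or in any column if none are named."""
    def has_null(row):
        columns_to_check = cols or row.keys()
        return any(row[c] is None for c in columns_to_check)

    return [row for row in rows if not has_null(row)]


all_clean = drop_null(rows)           # checks x, y and z
xy_clean = drop_null(rows, "x", "y")  # ignores nulls in z
```

Here the first row survives drop_null(rows, "x", "y") even though z is null, because z is not among the checked columns.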
Awesome! I'll update the post!
First of all, I really like this package and I've started to use it a lot in my work. As a Pythonista whose first language is R, I really enjoy tidypolars.
In R, we can do something like the following
Since we have a group_by(Species) call, dplyr will subtract the mean that corresponds to each group in the mutate() operation (not the mean across all observations from all species).
As far as I understand, this is still not possible with tidypolars since we don't have a group_by function that behaves in a similar way to the one in dplyr. So my questions are
tidypolars now?
Again, thanks for the fantastic library!