machow / siuba

Python library for using dplyr like syntax with pandas and SQL
https://siuba.org
MIT License

summarize() can't use np.sqrt #399

Closed HuangHam closed 2 years ago

HuangHam commented 2 years ago

Hi! So glad to find a tidyverse equivalent in python. I encountered the following issue:

data = pd.concat([df_human, df_sim]) >> \
    group_by(_.subj, _.trial, _.split, _.agent, _.inequality) >> \
    summarize(reward = np.sqrt(np.mean(_.reward)))

Note that I want the square root of the mean of the variable named reward, but this gives me an error: invalid __array_struct__. The error doesn't show up for other np functions such as np.size, np.mean, or np.std, so I'm really confused.

machow commented 2 years ago

Hey--

from pandas import Series
import numpy as np

ser = Series([1,2,3])

# doesn't work when translated to siuba
np.sqrt(ser)

# use this
ser.pipe(np.sqrt)
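
For the grouped case in the question, the same .pipe idea can be sketched in plain pandas. This is only an illustration: the DataFrame, its values, and the names df and sqrt_mean are made up, and it uses pandas' own groupby rather than siuba verbs.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the reward table from the question.
df = pd.DataFrame({
    "subj": ["a", "a", "b", "b"],
    "reward": [1.0, 3.0, 16.0, 16.0],
})

# Take the mean reward per subject, then apply np.sqrt through .pipe
# instead of wrapping the whole expression in np.sqrt(...).
sqrt_mean = df.groupby("subj")["reward"].mean().pipe(np.sqrt)

# Back to a flat table with one row per subject.
result = sqrt_mean.reset_index()
```

Since .pipe(f) simply calls f on the object, the square root is applied to the already-aggregated Series.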

machow commented 2 years ago

I'll work on supporting calls like np.sqrt(_.some_col) using NumPy's dispatch mechanisms (though it might not be possible).
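
To make "dispatch mechanisms" concrete, here is a rough sketch of NumPy's __array_ufunc__ hook, which lets an object intercept ufunc calls such as np.sqrt. The LazyColumn class and the col helper are hypothetical names invented for this illustration; this is not siuba's actual implementation.

```python
import numpy as np


class LazyColumn:
    """Hypothetical stand-in for a lazy expression like _.some_col."""

    def __init__(self, func):
        # func maps a dict-like of columns to an array of values
        self._func = func

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # NumPy calls this hook instead of trying to coerce the object
        # to an array. Only plain calls like np.sqrt(x) are handled here.
        if method != "__call__":
            return NotImplemented

        def deferred(data):
            # Evaluate any lazy inputs against the data, then run the ufunc.
            args = [x(data) if isinstance(x, LazyColumn) else x for x in inputs]
            return ufunc(*args, **kwargs)

        return LazyColumn(deferred)

    def __call__(self, data):
        return self._func(data)


def col(name):
    # Hypothetical helper: a lazy reference to one column.
    return LazyColumn(lambda data: np.asarray(data[name]))


expr = np.sqrt(col("hp"))                 # builds a deferred expression
values = expr({"hp": [4.0, 9.0, 16.0]})   # evaluates it against some data
```

Note that this hook only covers ufuncs. Functions such as np.mean are not ufuncs and go through the separate __array_function__ protocol.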

machow commented 2 years ago

Fixed in version 0.2.3!

from siuba.data import mtcars
from siuba import _, mutate, group_by
import numpy as np

mtcars >> group_by(_.cyl) >> mutate(res = np.sqrt(np.mean(_.hp)))