kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

Select if / drop if #22

Closed bleearmstrong closed 6 years ago

bleearmstrong commented 7 years ago

This PR adds select_if and drop_if functionality to dfply. Usage is like:

df >> select_if(lambda col: mean(col) > 3)

This returns a dataframe containing only the columns of df whose mean is greater than 3. Only numeric columns can pass this test, since other columns don't have a mean. Any function that takes a column and returns a boolean should work, so you can combine conditions with and / or, e.g.

df >> select_if(lambda col: mean(col) > 3 or 'Ideal' in col.values)

This returns a dataframe containing the columns of df that either have a mean > 3 or contain the value 'Ideal'.
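For readers who want to see the column-wise semantics spelled out, here is a rough sketch in plain pandas. It is not the code from this PR: select_columns_if and drop_columns_if are hypothetical helper names, the sample data is made up, and treating a predicate that raises as "no match" is an assumption about how non-numeric columns should behave.

```python
import pandas as pd


def _matches(col, predicate):
    """Return True if predicate(col) is truthy; treat a failing predicate as no match."""
    try:
        return bool(predicate(col))
    except (TypeError, ValueError):
        # Assumption: e.g. taking the mean of a string column fails,
        # so that column simply does not match.
        return False


def select_columns_if(df, predicate):
    """Hypothetical helper: keep only the columns for which predicate holds."""
    keep = [name for name in df.columns if _matches(df[name], predicate)]
    return df[keep]


def drop_columns_if(df, predicate):
    """Hypothetical helper: drop the columns for which predicate holds."""
    keep = [name for name in df.columns if not _matches(df[name], predicate)]
    return df[keep]


# Made-up sample data with two numeric columns and one string column.
df = pd.DataFrame({
    "carat": [0.2, 0.3, 0.4],
    "depth": [61.5, 59.8, 62.4],
    "cut": ["Ideal", "Premium", "Good"],
})

selected = select_columns_if(df, lambda col: col.mean() > 3)
dropped = drop_columns_if(df, lambda col: col.mean() > 3)

print(list(selected.columns))
print(list(dropped.columns))
```

In this sketch a predicate that fails on a column counts as not matching, so an or clause placed after a failing mean is never evaluated for string columns. Whether the PR behaves the same way depends on its implementation.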

kieferk commented 6 years ago

I've added these functions and their associated tests manually to the v0.3.1 branch because of conflicts. I also made some modifications.