kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

Doc for masking on Null #18

Closed edlee123 closed 6 years ago

edlee123 commented 7 years ago

Hello, it would be very useful if the README examples showed how to mask on Null / None, and on not null.

kieferk commented 6 years ago

Hello there, sorry for the belated reply, I've been away for a while.

I recently pushed out a new version, and I don't remember whether masking was broken before, but here's an example of one way to do what you want using the mask function:

In [2]: d = diamonds >> group_by(X.cut) >> mutate(price_lag=lag(X.price)) >> head(2) >> select(X.cut, X.price_lag)

In [3]: d
Out[3]: 
          cut  price_lag
8        Fair        NaN
91       Fair      337.0
2        Good        NaN
4        Good      327.0
0       Ideal        NaN
11      Ideal      326.0
1     Premium        NaN
3     Premium      326.0
5   Very Good        NaN
6   Very Good      336.0

In [4]: d >> mask(X.price_lag.isnull())
Out[4]: 
         cut  price_lag
8       Fair        NaN
2       Good        NaN
0      Ideal        NaN
1    Premium        NaN
5  Very Good        NaN

In [5]: d >> mask(~X.price_lag.isnull())
Out[5]: 
          cut  price_lag
91       Fair      337.0
4        Good      327.0
11      Ideal      326.0
3     Premium      326.0
6   Very Good      336.0

You're right that it would probably be a good addition to the README. The mask function takes boolean arrays with one value per row and keeps the rows where the value is True, so any expression that evaluates to such an array should work.
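For readers who want to see what that boolean array looks like, here is a minimal sketch in plain pandas rather than dfply. The small frame `d` is a hand-built stand-in that copies a few rows from Out[3] above, and the variable names are only illustrative. The comments map each line back to the mask calls in the session; using `notnull()` inside a dfply `mask` is an assumption here and is not shown in the thread.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for part of Out[3]; the real frame comes from the diamonds pipeline.
d = pd.DataFrame(
    {
        "cut": ["Fair", "Fair", "Good", "Good"],
        "price_lag": [np.nan, 337.0, np.nan, 327.0],
    },
    index=[8, 91, 2, 4],
)

# Boolean Series with one entry per row: True where price_lag is missing.
is_missing = d["price_lag"].isnull()

# Plain-pandas counterparts of the two calls in the session:
#   d >> mask(X.price_lag.isnull())   corresponds to  d[is_missing]
#   d >> mask(~X.price_lag.isnull())  corresponds to  d[~is_missing]
null_rows = d[is_missing]
non_null_rows = d[~is_missing]

# In pandas, notnull() is the complement of isnull(), so this selects the same rows.
non_null_rows_alt = d[d["price_lag"].notnull()]
```

In this sketch, `null_rows` holds the rows indexed 8 and 2, and `non_null_rows` holds the rows indexed 91 and 4, mirroring the pattern in Out[4] and Out[5].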