wesm / pandas2

Design documents and code for the pandas 2.0 effort.
https://pandas-dev.github.io/pandas2/

monad-ish API #78

Open · opened by sleak-lbl 5 years ago

sleak-lbl commented 5 years ago

A pattern that has repeatedly caught me out is chaining operations like df['myfield'].notnull().unique(). The mistake is that notnull() returns a boolean mask rather than a slice of the DataFrame or Series, so .unique() gives the unique mask values (True/False) instead of the unique non-null values of myfield.

If most operations returned a DataFrame/Series of the same type and dtype as their input, and the ones that don't were more obvious, the learning curve would probably be gentler and some user-code errors could be avoided. For example:

df['myfield'].notnull()     # returns a Series of the same dtype as df['myfield'], with the N/A rows dropped
df['myfield'].is_notnull()  # returns a Series of dtype bool, for use as a mask
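To make the pitfall concrete, here is a minimal sketch using today's pandas behaviour. The DataFrame and its values are made up for illustration; only notnull(), unique() and boolean indexing come from pandas itself.

```python
import numpy as np
import pandas as pd

# Illustrative data: a column with some missing entries
df = pd.DataFrame({"myfield": ["a", None, "b", "a", np.nan]})

# notnull() returns a boolean mask of the same length, not the non-null values
mask = df["myfield"].notnull()
print(mask.dtype)  # bool

# Chaining .unique() therefore yields the unique *mask* values
print(mask.unique())  # [ True False]

# Applying the mask first gives what was actually intended
print(df["myfield"][mask].unique())  # ['a' 'b']
```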

"monad-ish" because operations generally return an object of the same type.

Unfortunately, this would be a breaking API change, and the affected call sites would be hard to find in user code.
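One way to experiment with the proposed naming convention without touching the existing API is a custom Series accessor. The sketch below uses pandas' real pd.api.extensions.register_series_accessor hook, but the accessor name proposed and its notnull()/is_notnull() methods are hypothetical, modelled on the example above.

```python
import pandas as pd

# Hypothetical accessor illustrating the proposal; "proposed" is an arbitrary name
@pd.api.extensions.register_series_accessor("proposed")
class ProposedAccessor:
    def __init__(self, series):
        self._s = series

    def notnull(self):
        # Same type and dtype as the input, with missing rows dropped
        return self._s.dropna()

    def is_notnull(self):
        # The is_ prefix signals a boolean mask rather than filtered data
        return self._s.notna()


s = pd.Series([1.0, None, 3.0])
values = s.proposed.notnull()    # float64 Series: 1.0, 3.0
mask = s.proposed.is_notnull()   # bool Series: True, False, True
```

Because it lives under a separate accessor, this sketch leaves the built-in Series.notnull() unchanged.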

sleak-lbl commented 5 years ago

Having just hit "submit", I realized that dropna() is a better fit for this case than building a mask with notnull(). But I still think method names that clearly signal when the return type differs from the input type would be valuable.
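For comparison, a minimal sketch of the dropna() approach on the same illustrative data. dropna() returns a Series of the same dtype with the missing rows removed, so .unique() chains as originally intended.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"myfield": ["a", None, "b", "a", np.nan]})

# dropna() keeps the Series type and dtype, dropping only the N/A rows
non_null = df["myfield"].dropna()
result = non_null.unique()
print(result)  # ['a' 'b']
```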