Closed ericmjl closed 6 years ago
Hi @ericmjl, great to meet you too!
There are two ways you could expose pyjanitor methods to users:
The recommended way is to add them underneath an accessor object. This would look like:
import pandas as pd
import janitor
df = pd.DataFrame(...)
df = df.janitor.clean_names()
df = df.janitor.remove_empty()
When you import janitor, it registers/attaches the .janitor accessor to the pandas DataFrame. All the janitor methods live underneath this accessor. This keeps the janitor methods self-contained. It also means that every DataFrame in the namespace will have the janitor accessor.
To add an accessor and methods:
import pandas_flavor
@pandas_flavor.register_dataframe_accessor('janitor')
class JanitorAccessor(object):
    def __init__(self, df):
        self.df = df
    def clean_names(self):
        ...
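For intuition about when the accessor gets attached, here is a minimal pure-Python sketch of the pattern. ToyFrame, register_accessor, and _Accessor are hypothetical stand-ins written for illustration; they are not pandas or pandas_flavor code, and the real implementation may differ in its details. The cleaning rule inside clean_names is also just an assumption.

```python
class ToyFrame:
    """Hypothetical stand-in for pandas.DataFrame (columns only)."""

    def __init__(self, columns):
        self.columns = list(columns)


class _Accessor:
    """Descriptor that builds the accessor object on attribute access."""

    def __init__(self, accessor_cls):
        self._accessor_cls = accessor_cls

    def __get__(self, obj, owner):
        if obj is None:
            # Accessed on the class itself, e.g. ToyFrame.janitor.
            return self._accessor_cls
        # Accessed on an instance: wrap that instance in the accessor.
        return self._accessor_cls(obj)


def register_accessor(name):
    """Attach an accessor class to ToyFrame under `name`."""

    def decorator(accessor_cls):
        # This runs once, when the decorated class is defined,
        # which in a package means at import time.
        setattr(ToyFrame, name, _Accessor(accessor_cls))
        return accessor_cls

    return decorator


@register_accessor("janitor")
class JanitorAccessor:
    def __init__(self, df):
        self.df = df

    def clean_names(self):
        # Illustrative rule: lowercase and replace spaces with underscores.
        cleaned = [c.strip().lower().replace(" ", "_") for c in self.df.columns]
        return ToyFrame(cleaned)


df = ToyFrame(["First Name", "Last Name"])
cleaned = df.janitor.clean_names()
print(cleaned.columns)  # ['first_name', 'last_name']
```

The point of the sketch is that the registration is a side effect of defining the class, so every ToyFrame created afterwards sees the accessor without any further setup.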
Your second option is to add methods directly to the DataFrame. This would allow you to chain commands like in your example above. The methods are added to the DataFrame object itself, before initialization.
This would look like:
import pandas as pd
import janitor
df = pd.DataFrame(...).clean_names().remove_empty()
To add methods, simply write them as functions and register them with the DataFrame:
import pandas_flavor
@pandas_flavor.register_dataframe_method
def clean_names(df):
    ...
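As a rough illustration of how direct registration enables chaining, the sketch below attaches plain functions to a toy class. ToyFrame and register_method are hypothetical names, and the bodies of clean_names and remove_empty are assumed cleaning rules for the example only; none of this is the pandas_flavor implementation.

```python
class ToyFrame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, columns, rows):
        self.columns = list(columns)
        self.rows = [tuple(r) for r in rows]


def register_method(func):
    """Attach `func` to ToyFrame as a method named after the function."""
    # Runs once, when the decorated function is defined.
    setattr(ToyFrame, func.__name__, func)
    # Return the function so its module-level name stays usable.
    return func


@register_method
def clean_names(df):
    # Illustrative rule: lowercase and replace spaces with underscores.
    new_columns = [c.lower().replace(" ", "_") for c in df.columns]
    return ToyFrame(new_columns, df.rows)


@register_method
def remove_empty(df):
    # Illustrative rule: drop rows where every value is None.
    kept = [r for r in df.rows if any(v is not None for v in r)]
    return ToyFrame(df.columns, kept)


raw = ToyFrame(["First Name", "Age"], [("Ada", 36), (None, None)])
result = raw.clean_names().remove_empty()
print(result.columns, result.rows)  # ['first_name', 'age'] [('Ada', 36)]
```

Because each function takes the frame as its first argument and returns a new frame, the registered methods chain naturally.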
Does this help answer your question?
The part that I was missing was that I just had to import janitor, and do nothing with it afterwards :smile:. Thanks for clarifying!
One thing that does happen with pyjanitor, though, is that upon decoration, my functions (which all return a dataframe) now return None, which makes them untestable. I think I know what's going on (there is no return statement when registering a function); is this hypothesis correct? If so, would it make sense to put in a PR to return the original function as well, or will this break the functionality of pandas_flavor?
Ah, you're totally right! There should be return statements inside the inner function of the register_dataframe_method and register_series_method decorators. This won't break functionality and should allow you to run tests.
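The general Python behavior behind this can be shown with a small standalone sketch. It is a generic illustration of one way the symptom can arise, not the pandas_flavor source; registry, register_without_return, and register_with_return are hypothetical names.

```python
registry = {}


def register_without_return(func):
    """Register `func`, but forget to hand it back."""
    registry[func.__name__] = func
    # No return statement: the decorator implicitly returns None,
    # so the decorated name is rebound to None.


def register_with_return(func):
    """Register `func` and return it unchanged."""
    registry[func.__name__] = func
    return func


@register_without_return
def broken(values):
    return [v for v in values if v is not None]


@register_with_return
def fixed(values):
    return [v for v in values if v is not None]


print(broken)                # None
print(fixed([1, None, 2]))   # [1, 2]
```

In this sketch the registered copy of broken still works through the registry, but the module-level name is None, so a test that imports and calls it directly fails. Returning the original function from the decorator keeps both paths usable.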
We need to add a return method statement after these lines:
https://github.com/Zsailer/pandas_flavor/blob/bb892346dbe42c04725f0182c79e401496211bda/pandas_flavor/register.py#L31-L32
and
If you'd like to put in a PR, that would be great! Otherwise, I can do it later today.
Thanks!
I'm on it!
Hey @Zsailer, great to meet you at SciPy 2018!
I think pandas_flavor is what I'd like to switch over to in pyjanitor, where I simply register functions as a pandas accessor rather than subclass the entire dataframe outright.
There is something a bit magical about how pandas_flavor works though. With subclassing, everything is quite transparent - I subclass pandas DataFrames, then have the users wrap their existing dataframe inside a Janitor dataframe, following which, all of the data cleaning methods are available:
Say I decorated the Janitor functions as pandas accessors. What would things look like for an end-user? Would it be like the following?
I guess I'm just wondering, where and when does a decorated function get exposed up to pandas?
Thanks again for putting this out!