kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

mutate does not work with pandas.to_datetime #27

Closed Make42 closed 7 years ago

Make42 commented 7 years ago

I have a DataFrame for which hub2['time'] = pd.to_datetime(hub2.timestamp) works, but when I write hub2 >> mutate(time=pd.to_datetime(X.timestamp)) I get this error:

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "[...]/lib/python2.7/site-packages/pandas/util/decorators.py", line 91, in wrapper
    return func(*args, **kwargs)
  File "[...]/lib/python2.7/site-packages/pandas/tseries/tools.py", line 419, in to_datetime
    elif isinstance(arg, ABCSeries):
  File "[...]/lib/python2.7/site-packages/pandas/types/generic.py", line 9, in _check
    return getattr(inst, attr, '_typ') in comp
TypeError: __nonzero__ should return bool or int, returned Call

Why is that?

kieferk commented 7 years ago

Hello - sorry it's been a long time since I checked these issues!

The reason this doesn't work is that pd.to_datetime tries to evaluate X.timestamp immediately, before mutate ever runs. At that point X.timestamp is still a symbolic placeholder rather than a real column, so pd.to_datetime doesn't know what to do with it and raises an error.
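To make the order of evaluation concrete, here is a toy sketch in plain Python. It does not use dfply or pandas: Placeholder, eager_parse, and this mutate are hypothetical stand-ins invented for illustration. It only shows that Python evaluates the inner call first, so the eager function receives the placeholder and fails before the outer function is reached.

```python
class Placeholder:
    """Hypothetical stand-in for a symbolic object like X: it records
    attribute names but holds no data."""

    def __init__(self, path=()):
        self.path = path

    def __getattr__(self, name):
        # Only called for attributes that do not exist; returns a new placeholder.
        return Placeholder(self.path + (name,))


calls = []  # records which functions actually ran, in order


def eager_parse(values):
    """Stand-in for an eager function: it needs real data right away."""
    calls.append("eager_parse got " + type(values).__name__)
    return [int(v) for v in values]  # a Placeholder is not iterable


def mutate(**columns):
    """Stand-in for the outer verb; never reached in this example."""
    calls.append("mutate")
    return columns


X = Placeholder()
error = None

try:
    # Arguments are evaluated first, so eager_parse(X.timestamp) runs
    # before mutate is called at all.
    mutate(time=eager_parse(X.timestamp))
except TypeError as exc:
    error = exc

print(calls)
print(type(error).__name__)
```

The exact exception in the real case depends on the pandas and dfply versions; the sketch only reproduces the ordering problem.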

The error will be different now in v0.3.0, but the reason is the same. If you want to use a custom window function like that in the piping syntax, you need to make a version of it that can "delay itself" when it encounters symbolic arguments.

Luckily this is quite easy: you just need to decorate a function with the @make_symbolic decorator, like so:

import pandas as pd
from dfply import *  # provides make_symbolic, mutate and X

@make_symbolic
def to_datetime(series):
    return pd.to_datetime(series)

This version of the function can be used inside the pipe:

hub2 >> mutate(time=to_datetime(X.timestamp))

I actually used this as one of the examples in the new readme documentation, because it's perfect for explaining the use of @make_symbolic for custom window functions. Thanks for that!

Hope this helps and clears up the issue for you. Cheers
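For readers wondering what "delay itself" means in practice, the following is a minimal sketch of the idea in plain Python. It is not dfply's implementation: Symbol, Delayed, delay_on_symbols, parse_dates, and this dictionary-based mutate are all hypothetical names made up for the illustration. The decorated function runs immediately on real data, but returns a recorded call when it sees a symbolic argument.

```python
import functools
from datetime import datetime


class Symbol:
    """Hypothetical placeholder that remembers a column name."""

    def __init__(self, column):
        self.column = column

    def resolve(self, table):
        return table[self.column]


class Delayed:
    """A recorded function call that runs once a table is available."""

    def __init__(self, func, args):
        self.func = func
        self.args = args

    def resolve(self, table):
        real_args = [
            a.resolve(table) if isinstance(a, (Symbol, Delayed)) else a
            for a in self.args
        ]
        return self.func(*real_args)


def delay_on_symbols(func):
    """Toy analogue of a make_symbolic-style decorator (an assumption,
    not the library's code)."""

    @functools.wraps(func)
    def wrapper(*args):
        if any(isinstance(a, (Symbol, Delayed)) for a in args):
            return Delayed(func, args)  # postpone: no data yet
        return func(*args)  # real data: run right away

    return wrapper


@delay_on_symbols
def parse_dates(values):
    return [datetime.fromisoformat(v) for v in values]


def mutate(table, **columns):
    """Toy verb: resolves delayed values against the table, returns a copy."""
    out = dict(table)
    for name, value in columns.items():
        if isinstance(value, (Symbol, Delayed)):
            value = value.resolve(table)
        out[name] = value
    return out


table = {"timestamp": ["2017-01-02", "2017-03-04"]}
pending = parse_dates(Symbol("timestamp"))  # a Delayed object, nothing parsed yet
result = mutate(table, time=pending)        # parsing happens here
print(result["time"])
```

The design point is that the decision to wait is made inside the wrapper, so the same function still works as usual when called directly on real data.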