kieferk / dfply

dplyr-style piping operations for pandas dataframes
GNU General Public License v3.0

mask does not work in 0.2.4 properly #33

Closed Make42 closed 6 years ago

Make42 commented 6 years ago

The line

series = signals.loc[(signals.type == sig_type) & (signals.part_number == sig_partnr), 'value']

works in my code, but the line

series = signals >> mask(X.type == sig_type, X.part_number == sig_partnr) >> select('value')

results in the error

Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "[..]/dfply/base.py", line 45, in __rrshift__
    result = self.function(other_copy)
  File "[..]/dfply/base.py", line 52, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "[..]/dfply/base.py", line 112, in __call__
    return self.function(*args, **kwargs)
  File "[..]/dfply/base.py", line 179, in __call__
    evaluation = self.call_action(args, kwargs)
  File "[..]/dfply/base.py", line 253, in call_action
    return symbolic.to_callable(symbolic_function)(args[0])
  File "[..]/pandas_ply/symbolic.py", line 204, in <lambda>
    return lambda *args, **kwargs: obj._eval(dict(enumerate(args), **kwargs))
  File "[..]/pandas_ply/symbolic.py", line 142, in _eval
    result = evaled_func(*evaled_args, **evaled_kwargs)
  File "[..]/dfply/subset.py", line 55, in mask
    mask = mask & arg
  File "[..]/pandas/core/ops.py", line 915, in wrapper
    self, other = _align_method_SERIES(self, other, align_asobject=True)
  File "[..]/pandas/core/ops.py", line 629, in _align_method_SERIES
    left, right = left.align(right, copy=False)
  File "[..]/pandas/core/series.py", line 2411, in align
    broadcast_axis=broadcast_axis)
  File "[..]/pandas/core/generic.py", line 4937, in align
    fill_axis=fill_axis)
  File "[..]/pandas/core/generic.py", line 5006, in _align_series
    return_indexers=True)
  File "[..]/pandas/core/indexes/range.py", line 441, in join
    sort)
  File "[..]/pandas/core/indexes/base.py", line 3024, in join
    return_indexers=return_indexers)
  File "[..]/pandas/core/indexes/datetimes.py", line 1069, in join
    return_indexers=return_indexers, sort=sort)
  File "[..]/pandas/core/indexes/base.py", line 3033, in join
    return this.join(other, how=how, return_indexers=return_indexers)
  File "[..]/pandas/core/indexes/base.py", line 3046, in join
    return_indexers=return_indexers)
  File "[..]/pandas/core/indexes/base.py", line 3127, in _join_non_unique
    sort=True)
  File "[..]/pandas/core/reshape/merge.py", line 982, in _get_join_indexers
    llab, rlab, shape = map(list, zip(* map(fkeys, left_keys, right_keys)))
  File "[..]/pandas/core/reshape/merge.py", line 1412, in _factorize_keys
    llab, rlab = _sort_labels(uniques, llab, rlab)
  File "[..]/pandas/core/reshape/merge.py", line 1438, in _sort_labels
    _, new_labels = algos.safe_sort(uniques, labels, na_sentinel=-1)
  File "[..]/pandas/core/algorithms.py", line 483, in safe_sort
    ordered = sort_mixed(values)
  File "[..]/pandas/core/algorithms.py", line 476, in sort_mixed
    nums = np.sort(values[~str_pos])
  File "[..]/numpy/core/fromnumeric.py", line 822, in sort
    a.sort(axis=axis, kind=kind, order=order)
  File "pandas/_libs/tslib.pyx", line 1080, in pandas._libs.tslib._Timestamp.__richcmp__ (pandas/_libs/tslib.c:20281)
TypeError: Cannot compare type 'Timestamp' with type 'int'

What is the reason? My dataframe looks like

                                                 part_number         type     value
timestamps                                                                         
2017-08-01 00:00:32.651504  91cb9fa3859f4d44853f6200616db619        power1 -0.001651
2017-08-01 00:00:32.652504  91cb9fa3859f4d44853f6200616db619        power2  0.005068
2017-08-01 00:00:32.653504  91cb9fa3859f4d44853f6200616db619        power1 -0.004536
2017-08-01 00:00:32.654504  91cb9fa3859f4d44853f6200616db619        power2 -0.000084
2017-08-01 00:00:32.655504  5535c560ece9415f8f6ad996f1c23f6e        power1 -0.001114
2017-08-01 00:00:32.656504  5535c560ece9415f8f6ad996f1c23f6e        power2 -0.005621
2017-08-01 00:00:32.657504  5535c560ece9415f8f6ad996f1c23f6e        power1 -0.000638
2017-08-01 00:00:32.658504  5535c560ece9415f8f6ad996f1c23f6e        power2 -0.006916
2017-08-01 00:00:32.659504  5535c560ece9415f8f6ad996f1c23f6e        power1  0.001549

where the index is a DatetimeIndex. I am using dfply version 0.2.4.
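For anyone who wants to experiment with this, here is a minimal sketch in plain pandas (no dfply) that rebuilds a few of the rows shown above and performs the same selection as the working `.loc` line. The traceback passes through `mask = mask & arg` and then through index `join` calls in `indexes/range.py` and `indexes/datetimes.py`, so one possible reading, not confirmed anywhere in this thread, is that a range-indexed accumulator was being aligned with the datetime-indexed condition. The helper `combine_conditions` is a hypothetical name introduced only for illustration; it is not part of dfply. It starts the accumulator on the frame's own index so that `&` never has to align two different index types.

```python
import pandas as pd

# Illustrative data only: the first five rows of the frame shown above.
index = pd.to_datetime(
    [
        "2017-08-01 00:00:32.651504",
        "2017-08-01 00:00:32.652504",
        "2017-08-01 00:00:32.653504",
        "2017-08-01 00:00:32.654504",
        "2017-08-01 00:00:32.655504",
    ]
)
index.name = "timestamps"

signals = pd.DataFrame(
    {
        "part_number": ["91cb9fa3859f4d44853f6200616db619"] * 4
        + ["5535c560ece9415f8f6ad996f1c23f6e"],
        "type": ["power1", "power2", "power1", "power2", "power1"],
        "value": [-0.001651, 0.005068, -0.004536, -0.000084, -0.001114],
    },
    index=index,
)

sig_type = "power1"
sig_partnr = "91cb9fa3859f4d44853f6200616db619"


def combine_conditions(df, *conditions):
    """Hypothetical helper: AND boolean Series together on df's own index."""
    # Start from an all-True Series that shares the frame's index, so every
    # `&` below combines two Series with identical indexes.
    combined = pd.Series(True, index=df.index)
    for cond in conditions:
        combined = combined & cond
    return combined


keep = combine_conditions(
    signals,
    signals["type"] == sig_type,
    signals["part_number"] == sig_partnr,
)
series = signals.loc[keep, "value"]
print(series)
```

The result should match what the `.loc` line with the two conditions joined by `&` returns, with the DatetimeIndex preserved.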

sharpe5 commented 6 years ago

Can you try updating to the latest version of dfply which is 0.3.1?

It has had a major refactor for the better, and this error may have disappeared in the process.

kieferk commented 6 years ago

@Make42 Yes, as @sharpe5 says, please try this code on the new v0.3.x of the package. It has many improvements and bug fixes that make it more robust. Unfortunately, I can't replicate your error without the data you're using, but if the issue persists in the new version I will look into it.
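Before retesting, it can help to confirm which dfply version is actually installed in the environment. The following is a small sketch using only the standard library; `version_tuple` and `installed_version` are hypothetical helper names, and the comparison assumes purely numeric dotted versions such as 0.2.4 or 0.3.1 (it would not handle suffixes like `rc1`).

```python
from importlib import metadata


def version_tuple(version):
    """Turn a plain dotted version such as '0.3.1' into (0, 3, 1)."""
    return tuple(int(part) for part in version.split("."))


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


current = installed_version("dfply")
if current is None:
    print("dfply is not installed")
elif version_tuple(current) < version_tuple("0.3.1"):
    print(f"dfply {current} is older than 0.3.1; consider upgrading")
else:
    print(f"dfply {current} is 0.3.1 or newer")
```

If the installed version turns out to be older, `pip install --upgrade dfply` is the usual way to move to the latest release.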

Make42 commented 6 years ago

I tested it. Issue is resolved. Even some other issues I just encountered during tests are resolved with 0.3.1. Thank you very much!