holoviz / holoviews

With Holoviews, your data visualizes itself.
BSD 3-Clause "New" or "Revised" License

Resample and Collapse not working on DynamicMap NdOverlay (timeseries) #4143

Open rtbs-dev opened 4 years ago

rtbs-dev commented 4 years ago

ALL software version info (this library, plus any other relevant software, e.g. bokeh, python, notebook, OS, browser, etc.)


CPython 3.7.5
IPython 7.10.2

jupyter 1.0.0
pandas 0.25.3
bokeh 1.4.0
holoviews 1.12.7

compiler   : GCC 7.3.0
system     : Linux
release    : 5.2.13-126.current
machine    : x86_64
processor  : 
CPU cores  : 8
interpreter: 64bit
Git hash   :

Description of expected behavior and the observed behavior

holoviews.operation.timeseries.resample should accept arbitrary functions to operate on the underlying data, but currently passing any value to its function keyword argument throws a TypeError:

TypeError                                 Traceback (most recent call last)
<ipython-input-4-ffef6624afdd> in <module>
----> 1 dmap.apply(resample, rule='M', function=pd.np.sum) # TypeError

TypeError: __call__() got multiple values for argument 'function'
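For reference, this is roughly what resample(rule='M', function=...) is expected to compute: samples bucketed by calendar month, with each bucket reduced by an arbitrary function. The sketch below is stdlib-only and uses a hypothetical helper (resample_monthly) and made-up data. It is not HoloViews' implementation, and it keys months by (year, month) rather than by month-end timestamps as the pandas 'M' rule does.

```python
from datetime import date, timedelta
from itertools import groupby
from statistics import mean
from typing import Callable, Iterable, List, Tuple


def resample_monthly(
    points: Iterable[Tuple[date, float]],
    function: Callable[[List[float]], float] = mean,
) -> List[Tuple[Tuple[int, int], float]]:
    """Hypothetical helper: bucket (date, value) pairs by calendar month and
    reduce each bucket with `function`. Mimics the intent of
    resample(rule='M', function=...), not HoloViews' implementation."""
    # groupby only merges adjacent items, so sort by date first.
    ordered = sorted(points, key=lambda p: p[0])
    return [
        (month, function([value for _, value in group]))
        for month, group in groupby(ordered, key=lambda p: (p[0].year, p[0].month))
    ]


# Made-up daily 0/1 samples straddling the January/February boundary.
start = date(2019, 1, 30)
daily = [(start + timedelta(days=i), v) for i, v in enumerate([1, 0, 1, 1, 0])]

print(resample_monthly(daily, function=sum))  # [((2019, 1), 1), ((2019, 2), 2)]
print(resample_monthly(daily))                # default reducer is the mean
```

Passing function=sum here plays the role of function=pd.np.sum in the report; any callable that reduces a list of values works.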

Similarly, holoviews.operation.collapse appears to have a bug, although it does not raise any exception. Instead, it only plots the first layer of the NdOverlay (tagA), ignoring the aggregation altogether (see below).
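The expected behavior of collapse on an NdOverlay is a pointwise reduction across the overlaid curves, so the result should differ from the tagA curve on its own. A minimal stdlib sketch, assuming all curves share the same x-values; collapse_overlay is a hypothetical helper standing in for the operation, not the HoloViews API:

```python
from typing import Callable, List, Mapping, Sequence


def collapse_overlay(
    layers: Mapping[str, Sequence[float]],
    fn: Callable[[List[float]], float] = sum,
) -> List[float]:
    """Hypothetical helper: reduce aligned curves sample-by-sample with `fn`.
    A sketch of the expected collapse(..., fn=...) result, not HoloViews' code."""
    if len({len(ys) for ys in layers.values()}) != 1:
        raise ValueError("layers must be non-empty and share the same x-values")
    # zip(*...) yields one tuple per x-value, holding that sample from every layer.
    return [fn(list(samples)) for samples in zip(*layers.values())]


overlay = {"tagA": [1, 0, 1], "tagB": [0, 0, 1]}
print(collapse_overlay(overlay))  # [1, 0, 2]  (summed across tags)
print(overlay["tagA"])            # [1, 0, 1]  (the single curve the report gets back)
```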

Complete, minimal, self-contained example code that reproduces the issue

# In[1]:
import pandas as pd
from functools import partial
import holoviews as hv
from holoviews import opts
from holoviews.operation.timeseries import resample
from holoviews.operation import collapse
hv.extension('bokeh', 'matplotlib')

# In[2]:
tags = ['tagA','tagB']
nobserv = 500

data = pd.np.random.binomial(
    1, .5,  # trials, prob
    size=(nobserv, len(tags)),
)

index = pd.date_range(

df = pd.DataFrame(data, columns=tags, index=index)

# In[3]:
options = [
    opts.Curve(width=600, height=200),
]

def tag_timeseries(df, tag):
    return hv.Curve(df[tag], name=tag)

dmap = (
        partial(tag_timeseries, df), 

resamp_dmap = dmap.apply(resample, rule='M')  # works

# In[4]:
dmap.apply(resample, rule='M', function=pd.np.sum) # TypeError

# In[5]:
resamp_dmap.apply(collapse).opts(options) # only returns original tagA curve
philippjfr commented 4 years ago

I think the problem here is that apply uses the function keyword internally, so it clashes with resample's own function parameter.
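The clash described above is ordinary Python argument binding: if a wrapper's __call__ takes the callable to apply as a parameter named function and also forwards **kwargs, then passing function=... meant for the inner operation binds that name twice. The sketch below reproduces the same TypeError with hypothetical classes (ApplyLike, ApplyPositionalOnly) and a hypothetical fake_resample; it does not show HoloViews' actual signatures. Making the parameter positional-only is one way such a wrapper could let function= pass through, although that syntax needs Python 3.8+ and the report uses 3.7.5.

```python
class ApplyLike:
    """Hypothetical wrapper whose callable parameter is literally named `function`."""

    def __call__(self, function, **kwargs):
        return function(**kwargs)


class ApplyPositionalOnly:
    """Same wrapper, but `function` is positional-only (Python 3.8+), so a
    `function=` keyword lands in **kwargs instead of clashing."""

    def __call__(self, function, /, **kwargs):
        return function(**kwargs)


def fake_resample(rule="M", function=None):
    # Hypothetical operation that, like resample, has its own `function` parameter.
    return (rule, function)


try:
    ApplyLike()(fake_resample, rule="M", function=sum)
except TypeError as err:
    print(err)  # ...__call__() got multiple values for argument 'function'

print(ApplyPositionalOnly()(fake_resample, rule="M", function=sum))
# ('M', <built-in function sum>)
```

Renaming the wrapper's parameter to something unlikely to collide would avoid the clash on older Python versions as well.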

rtbs-dev commented 4 years ago

@philippjfr I'm not sure if that is directly related (unless apply is being called automatically in other situations):

For instance, this works (i.e. manually creating the NdOverlay):

    tag_timeseries(df, 'tagA')*\
    tag_timeseries(df, 'tagB'), 
    rule='M'  # works

and this throws the same error:

    tag_timeseries(df, 'tagA')*\
    tag_timeseries(df, 'tagB'), 
    function=pd.np.sum  # TypeError
rtbs-dev commented 4 years ago

In the same vein, calling collapse on the above (seemingly) does nothing and just returns hv.Curve.tagA:

    tag_timeseries(df, 'tagA')*\
    tag_timeseries(df, 'tagB'), 
#     function=pd.np.sum  # TypeError
), fn=pd.np.sum)

I would also note that it is somewhat confusing to have differently named arguments (function in resample vs. fn in collapse) that both serve to pass a function to an operation.