holoviz / holoviews

With Holoviews, your data visualizes itself.
https://holoviews.org
BSD 3-Clause "New" or "Revised" License

Resample and Collapse not working on DynamicMap NdOverlay (timeseries) #4143

Open rtbs-dev opened 4 years ago

rtbs-dev commented 4 years ago

ALL software version info (this library, plus any other relevant software, e.g. bokeh, python, notebook, OS, browser, etc.)

watermark:

CPython 3.7.5
IPython 7.10.2

jupyter 1.0.0
pandas 0.25.3
bokeh 1.4.0
holoviews 1.12.7

compiler   : GCC 7.3.0
system     : Linux
release    : 5.2.13-126.current
machine    : x86_64
processor  : 
CPU cores  : 8
interpreter: 64bit
Git hash   :

Description of expected behavior and the observed behavior

holoviews.operation.timeseries.resample should accept arbitrary functions to apply to the underlying data, but passing any value to its function keyword argument currently raises a TypeError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-ffef6624afdd> in <module>
----> 1 dmap.apply(resample, rule='M', function=pd.np.sum) # TypeError

TypeError: __call__() got multiple values for argument 'function'
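
Until the function keyword clash is resolved, one possible workaround is to do the aggregation in pandas before the data reaches HoloViews. The sketch below is an assumption-laden stand-in, not a HoloViews API: monthly_sum is a hypothetical helper, and it uses the month-start alias 'MS' (bins labelled by the first day of each month) rather than the issue's rule='M', which labels bins by month end.

```python
import numpy as np
import pandas as pd


def monthly_sum(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: sum every column into calendar-month bins."""
    # 'MS' labels each bin by its month start; rule='M' in the issue labels by month end.
    return frame.resample('MS').sum()


# Small deterministic frame shaped like the issue's df (one column per tag).
index = pd.date_range('2020-01-30', periods=4, freq='D', name='timestamp')
df = pd.DataFrame(np.ones((4, 2), dtype=int), columns=['tagA', 'tagB'], index=index)

monthly = monthly_sum(df)
print(monthly)
```

The aggregated frame could then be passed to tag_timeseries from the repro (for example partial(tag_timeseries, monthly_sum(df))), so the DynamicMap is built from data that is already resampled and summed.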

Similarly, the holoviews.operation.collapse operation appears to have a bug, although it does not raise an exception: it only plots the first layer of the NdOverlay and ignores the summation altogether (see below).
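
For reference, the result expected from collapse(..., fn=np.sum) on the two tag layers is their element-wise sum at each timestamp. A minimal pandas/NumPy sketch of that expected output, assuming the layers share one index (as the tag columns of df do); collapse_columns is a hypothetical helper for comparison, not how HoloViews implements collapse.

```python
import numpy as np
import pandas as pd


def collapse_columns(frame: pd.DataFrame, fn=np.sum) -> pd.Series:
    """Hypothetical helper: reduce across columns (layers) at each index value."""
    # NumPy reductions such as np.sum and np.mean accept an axis argument;
    # axis=1 combines the tag columns row by row.
    return pd.Series(fn(frame.to_numpy(), axis=1), index=frame.index, name='collapsed')


index = pd.date_range('2020-01-01', periods=3, freq='D', name='timestamp')
df = pd.DataFrame({'tagA': [1, 0, 1], 'tagB': [1, 1, 0]}, index=index)

collapsed = collapse_columns(df[['tagA', 'tagB']])
print(collapsed)
```

A correct collapse should produce one curve with these summed values, rather than the unchanged tagA curve.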

Complete, minimal, self-contained example code that reproduces the issue

# In[1]:
import pandas as pd
from functools import partial
import holoviews as hv
from holoviews import opts
from holoviews.operation.timeseries import resample
from holoviews.operation import collapse
hv.extension('bokeh', 'matplotlib')

# In[2]:
tags = ['tagA','tagB']
nobserv = 500

data = pd.np.random.binomial(
    1, .5, # trials, prob 
    size=(nobserv, len(tags))
)

index = pd.date_range(
    end=pd.Timestamp.today(), 
    periods=nobserv, 
    freq='D',
    name='timestamp'
)

df = pd.DataFrame(data, columns=tags, index=index)
df.head()

# In[3]:
options = [
    opts.Curve(width=600, height=200)
]

def tag_timeseries(df, tag):
    return hv.Curve(df[tag], name=tag)

dmap = (
    hv.DynamicMap(
        partial(tag_timeseries, df), 
        kdims='tag'
    )
    .redim.values(tag=tags)
    .overlay('tag')
    .opts(options)
)

resamp_dmap = dmap.apply(resample, rule='M')  # works
resamp_dmap

# In[4]:
dmap.apply(resample, rule='M', function=pd.np.sum) # TypeError

# In[5]:
resamp_dmap.apply(collapse).opts(options)  # only returns original tagA curve

philippjfr commented 4 years ago

I think the problem here is that apply uses the function keyword internally, so it clashes with resample's own function parameter.
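
To make the clash concrete, here is a pure-Python toy model; ToyApply and toy_resample are hypothetical names, not HoloViews classes. It assumes a call method whose first parameter is named function and which forwards **kwargs to the callable it applies. Passing function=... then binds that parameter twice, giving the same kind of TypeError as the traceback above. In the toy, pre-binding the operation's keywords with functools.partial avoids the collision; this has not been verified against the real DynamicMap.apply.

```python
from functools import partial
from typing import Any, Callable


class ToyApply:
    """Hypothetical stand-in for an accessor whose first parameter is named `function`."""

    def __init__(self, obj: Any) -> None:
        self.obj = obj

    def __call__(self, function: Callable[..., Any], **kwargs: Any) -> Any:
        # Any remaining keyword arguments are forwarded to the applied callable.
        return function(self.obj, **kwargs)


def toy_resample(data, rule: str = 'M', function: Callable[..., Any] = sum) -> Any:
    """Hypothetical operation that also accepts a `function` keyword (rule is unused here)."""
    return function(data)


apply = ToyApply([1, 2, 3])

message = ''
try:
    # toy_resample binds to `function` positionally, and function=max binds it again.
    apply(toy_resample, rule='M', function=max)
except TypeError as err:
    message = str(err)
print(message)

# Binding the operation's own keywords first means apply never sees function=...
result = apply(partial(toy_resample, rule='M', function=max))
print(result)
```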

rtbs-dev commented 4 years ago

@philippjfr I'm not sure that's directly related (unless apply is also being called internally in other situations):

For instance, this works (i.e. building the overlay manually with *, which produces an Overlay rather than an NdOverlay):

resample(
    tag_timeseries(df, 'tagA') * tag_timeseries(df, 'tagB'),
    rule='M'  # works
)

And this raises the same error:

resample(
    tag_timeseries(df, 'tagA') * tag_timeseries(df, 'tagB'),
    rule='M',
    function=pd.np.sum  # TypeError
)

rtbs-dev commented 4 years ago

In the same vein, calling collapse on the result above (seemingly) does nothing and just returns hv.Curve.tagA:

collapse(resample(
    tag_timeseries(df, 'tagA') * tag_timeseries(df, 'tagB'),
    rule='M',
    # function=pd.np.sum  # TypeError
), fn=pd.np.sum)

I would also note that it's somewhat confusing to have differently named arguments (function vs. fn) that serve the same purpose: passing a function that an operation applies to elements.