Open mrocklin opened 6 years ago
The current plot method is entirely special-cased to a time-series x-axis and quantitative y values. It would be nice to generalize this.
I definitely think this is a good idea, and it shouldn't be too difficult to build something like this on top of HoloViews. HoloViews will automatically switch between cases 1. and 2. depending on what you are plotting, e.g. a Curve plot will use CDS.stream while a box-whisker plot will use CDS.update. If you tell me what types of plot you'd want to expose and narrow down the API a bit I'd be happy to work on this.
I imagine the API would be modeled on the plotting API in pandas itself, so maybe it would also make sense to write an extensible plotting interface like what's being discussed for pandas in https://github.com/pandas-dev/pandas/issues/14130. That could be handled in a similar way to what @TomAugspurger has put together in a fork, which allows registering different plotting engines (see here). This would also allow building these components without integrating them with streamz immediately, although in the long run you probably do want to ship a default plotting interface.
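A minimal sketch of what "registering different plotting engines" could look like, in plain Python. Every name here (`register_engine`, `plot`, `describe_plot`, the `engine` keyword) is hypothetical and chosen for illustration; none of it is the API from the pandas issue or the fork mentioned above.

```python
from typing import Callable, Dict

# Hypothetical registry: engine name -> function that builds a plot.
_PLOT_ENGINES: Dict[str, Callable[..., object]] = {}


def register_engine(name: str, plot_func: Callable[..., object]) -> None:
    """Make a plotting backend available under the given name."""
    _PLOT_ENGINES[name] = plot_func


def plot(data, kind: str = "line", engine: str = "default", **kwargs):
    """Dispatch a pandas-style plot call to the selected engine."""
    try:
        plot_func = _PLOT_ENGINES[engine]
    except KeyError:
        raise ValueError(
            f"unknown plotting engine {engine!r}; "
            f"registered: {sorted(_PLOT_ENGINES)}"
        ) from None
    return plot_func(data, kind=kind, **kwargs)


def describe_plot(data, kind, **kwargs):
    """Stand-in backend that only describes the plot it would draw."""
    return f"{kind} plot of {len(data)} points"


register_engine("default", describe_plot)
```

With something like this, a HoloViews-backed engine could live in its own package and register itself at import time, while the dataframe library only ships the dispatch function and one default engine.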
If you tell me what types of plot you'd want to expose and narrow down the API a bit I'd be happy to work on this.
Honestly I may not know enough to produce a sensible API here and would appreciate any collaboration. The current pandas.DataFrame.plot API seems sensible to me. Perhaps those options translate nicely to HoloViews' chart types?
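As a rough illustration of that translation, here is one possible mapping from a few pandas `kind` values to HoloViews element names. The mapping and the helper `element_name_for` are assumptions made for this sketch, not something either library defines; the element names are kept as strings so the snippet runs without HoloViews installed.

```python
# Assumed mapping from pandas DataFrame.plot kinds to HoloViews element names.
PANDAS_KIND_TO_ELEMENT = {
    "line": "Curve",
    "scatter": "Scatter",
    "bar": "Bars",
    "box": "BoxWhisker",
    "hist": "Histogram",
    "area": "Area",
}


def element_name_for(kind: str = "line") -> str:
    """Return the HoloViews element name assumed to match a pandas plot kind."""
    try:
        return PANDAS_KIND_TO_ELEMENT[kind]
    except KeyError:
        raise ValueError(f"no element mapped for plot kind {kind!r}") from None
```

An implementation could then look the element up on the `holoviews` module, for example with `getattr(hv, element_name_for(kind))`.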
The pandas plot API is generally very matplotlib centric because that's what it was based on, but I think it's definitely a good starting point. Since the pandas issue about allowing different plotting engines, there was a push to start writing a bokeh implementation (which hasn't gotten very far), and I spent one evening writing a HoloViews based prototype (note it's very unpolished). Writing a similar prototype for streamz shouldn't be too difficult and might be a fun weekend project for myself, but I can't promise how soon I'd get to it.
The pandas plot API is generally very matplotlib centric because that's what it was based on, but I think it's definitely a good starting point.
Yeah, from my perspective as a user this API does fit my brain well.
Writing a similar prototype for streamz shouldn't be too difficult and might be a fun weekend project for myself, but I can't promise how soon I'd get to it.
Well, not to rush you, but if that weekend happened to be this weekend then this would get shown off in a talk at PyData NYC.
Well, not to rush you, but if that weekend happened to be this weekend then this would get shown off in a talk at PyData NYC.
Ha, well that's a good incentive, and producing something demo-able without exposing all the different options for styling and so on shouldn't be too much effort. I'll let you know. What form should the prototype take: a PR, or should I simply monkeypatch a Plot class onto StreamingDataFrame.plot for now?
If you're game then I would probably submit a PR adding a holoviews.py file into streamz/dataframe/ and adding a plot method to Frame (streaming) and Frames (updating) respectively.
Note that the dataframe code has undergone a fair bit of churn since you looked at it last.
I guess you're right that this would have to be a plot object to get the same pandas API.
An interesting alternative for the updating case is just to use pandas.DataFrame.plot and render that image to the notebook cell. It's not terribly efficient but isn't that bad either.
I was digging through some old code where I was trying to unpick a related Stack Overflow question and found this example of streaming from a dataframe into HoloViews:
from tornado import gen
from tornado.ioloop import PeriodicCallback
from holoviews.streams import Buffer
import holoviews as hv
hv.extension('bokeh')
import numpy as np
import pandas as pd

df = pd.DataFrame({'x': range(1000), 'y': np.sin(range(1000))})
rowcount = 0
maxrows = 1000
dfbuffer = Buffer(np.zeros((0, 2)), length=20)

@gen.coroutine
def g():
    global rowcount
    item = df[['x', 'y']].iloc[rowcount].values.tolist()
    dfbuffer.send(np.array([item]))
    rowcount += 1
    if rowcount >= maxrows:
        cbdf.stop()
        # How can we get the thing to stop?

cbdf = PeriodicCallback(g, 500)
cbdf.start()
hv.DynamicMap(hv.Curve, streams=[dfbuffer]).opts(padding=0.1, width=600, color='green')
I don't think I ever worked out how to create a streamz streaming dataframe to connect to and stream into the chart, though.
@philippjfr Hi, I would like to show my stream data in a HoloViews table. I am using a callback function and wrote the code below:
self.buffer7.send(np.array([[scored.index[-1], scored.iloc[:, 5].mean()]]))  # table
dmap7 = hv.DynamicMap(hv.Table, streams=[op.buffer7])
pn.panel(dmap7).show()
I can plot the data fine, but the table's data does not show up in the browser. Could you please tell me what the problem is? Thank you.
Thanks for your question @majidam20. I'd recommend filing a post on our Discourse describing in detail what you tried and what exactly isn't working.
The streamz.dataframe module could use a good plot method. This has two requirements:
As far as I can tell there are two options here:
I believe that both libraries are filling out their capabilities here.
There are two different situations to cover.
1. column_data_source.update(new_df) call. This corresponds to the streamz.dataframe.core.Frames class.
2. column_data_source.stream(new_df, n) call. This corresponds to the streamz.dataframe.core.Frame class.
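The difference between the two situations can be sketched with plain-Python stand-ins. The classes `UpdatingColumns` and `StreamingColumns` below are hypothetical and are not Bokeh's data source; they only mimic the assumed semantics: an update replaces everything the plot shows, while a stream appends new rows and keeps at most the last `n`.

```python
from collections import deque


class UpdatingColumns:
    """Stand-in for the 'update' situation: each batch replaces the last."""

    def __init__(self):
        self.data = {}

    def update(self, new):
        # Discard whatever was shown before and keep only the new batch.
        self.data = {name: list(values) for name, values in new.items()}


class StreamingColumns:
    """Stand-in for the 'stream' situation: append rows, keep the last n."""

    def __init__(self, columns, n):
        # deque(maxlen=n) silently drops the oldest rows once n is exceeded.
        self.data = {name: deque(maxlen=n) for name in columns}

    def stream(self, new):
        for name, values in new.items():
            self.data[name].extend(values)


updating = UpdatingColumns()
updating.update({"x": [0, 1], "y": [5, 6]})
updating.update({"x": [2, 3], "y": [7, 8]})       # first batch is gone

streaming = StreamingColumns(["x", "y"], n=3)
streaming.stream({"x": [0, 1], "y": [5, 6]})
streaming.stream({"x": [2, 3], "y": [7, 8]})      # oldest row rolls off
```

In the terms used above, the first class plays the role of the Frames (updating) case and the second the Frame (streaming) case.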