python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Streaming dataframe visualization #126

Open mrocklin opened 6 years ago

mrocklin commented 6 years ago

The streamz.dataframe module could use a good plot method. This has two requirements:

  1. Relatively automatic selection of plot styles
  2. Support for updating results, either a full update of all data or an incremental update of new data / removal of old data

As far as I can tell there are two options here:

  1. Holoviews backed by Bokeh cc @philippjfr
  2. Altair, see https://github.com/altair-viz/altair/issues/435

I believe that both libraries are filling out their capabilities here.

There are two different situations to cover.

  1. Updating plots, where we get a new dataset every update. In Bokeh-parlance this would presumably trigger a column_data_source.update(new_df) call. This corresponds to the streamz.dataframe.core.Frames class
  2. Streaming plots, where we get a few more rows of the dataset. In Bokeh-parlance this would presumably trigger a column_data_source.stream(new_df, n) call. This corresponds to the streamz.dataframe.core.Frame class.
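
The difference between the two situations can be sketched in plain Python. This is only an illustration of the semantics: `RollingSource` and its methods are hypothetical stand-ins, not Bokeh's `ColumnDataSource`, and the `rollover` argument is assumed to play the role of the `n` mentioned above.

```python
class RollingSource:
    """Hypothetical stand-in for a plot data source (not a Bokeh class)."""

    def __init__(self, columns):
        self.data = {name: [] for name in columns}

    def update(self, new_data):
        """Situation 1: replace every column with a fresh dataset."""
        self._check_columns(new_data)
        self.data = {name: list(values) for name, values in new_data.items()}

    def stream(self, new_data, rollover=None):
        """Situation 2: append new rows, keeping at most `rollover` rows."""
        self._check_columns(new_data)
        for name, values in new_data.items():
            column = self.data[name] + list(values)
            if rollover is not None and len(column) > rollover:
                # Drop the oldest rows so only the last `rollover` remain
                column = column[len(column) - rollover:]
            self.data[name] = column

    def _check_columns(self, new_data):
        if set(new_data) != set(self.data):
            raise ValueError("new data must have the same columns as the source")


source = RollingSource(["x", "y"])
source.stream({"x": [0, 1], "y": [10, 11]}, rollover=3)
source.stream({"x": [2, 3], "y": [12, 13]}, rollover=3)
streamed = dict(source.data)  # window of the three most recent rows

source.update({"x": [7], "y": [70]})
updated = dict(source.data)  # previous rows are gone entirely
```

In the streaming case only the new rows need to travel to the plot, while the updating case resends the whole dataset each time.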

mrocklin commented 6 years ago

The current plot method is entirely special-cased to a time-series x-axis and quantitative y values. It would be nice to generalize this.

philippjfr commented 6 years ago

I definitely think this is a good idea, and shouldn't be too difficult to build something like this on top of HoloViews. HoloViews will automatically switch between cases 1. and 2. depending on what you are plotting, e.g. a Curve plot will use CDS.stream while a box-whisker plot will use CDS.update. If you tell me what types of plot you'd want to expose and narrow down the API a bit I'd be happy to work on this.

I imagine the API would be modeled on the plotting API in pandas itself so maybe it would also make sense to write an extensible plotting interface like what's being discussed for pandas in https://github.com/pandas-dev/pandas/issues/14130. That could be handled in a similar way as what @TomAugspurger has put together in a fork, which allows registering different plotting engines (see here). This would also allow building these components without integrating them with streamz immediately, although in the long run you probably do want to ship a default plotting interface.
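
As a rough illustration of what registering plotting engines could look like, here is a minimal sketch. All names (`register_engine`, `plot`, the toy `"text"` engine) are hypothetical and are not taken from pandas, streamz, or the fork mentioned above.

```python
# Hypothetical registry mapping an engine name to a plotting function
_ENGINES = {}


def register_engine(name):
    """Decorator that records a plotting backend under `name`."""
    def decorator(func):
        _ENGINES[name] = func
        return func
    return decorator


@register_engine("text")
def text_engine(data, kind="line", **options):
    # Toy backend: describe the plot instead of drawing it
    return f"{kind} plot of {len(data)} rows"


def plot(data, engine="text", **options):
    """Dispatch to whichever backend was registered as `engine`."""
    try:
        backend = _ENGINES[engine]
    except KeyError:
        raise ValueError(f"unknown plotting engine: {engine!r}") from None
    return backend(data, **options)


description = plot([1, 2, 3], kind="bar")
```

With this shape, a HoloViews-based engine could live in a separate package and register itself on import, which is the decoupling described above.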

mrocklin commented 6 years ago

> If you tell me what types of plot you'd want to expose and narrow down the API a bit I'd be happy to work on this.

Honestly I may not know enough to produce a sensible API here and would appreciate any collaboration. The current pandas.DataFrame.plot API seems sensible to me. Perhaps those options translate nicely to Holoviews' chart types?

philippjfr commented 6 years ago

The pandas plot API is generally very matplotlib-centric because that's what it was based on, but I think it's definitely a good starting point. Since the pandas issue about allowing different plotting engines was opened, there has been a push to start writing a Bokeh implementation (which hasn't gotten very far), and I spent one evening writing a HoloViews-based prototype (note it's very unpolished). Writing a similar prototype for streamz shouldn't be too difficult and might be a fun weekend project for myself, but I can't promise how soon I'd get to it.

mrocklin commented 6 years ago

> The pandas plot API is generally very matplotlib centric because that's what it was based on, but I think it's definitely a good starting point.

Yeah, from my perspective as a user this API does fit my brain well

> Writing a similar prototype for streamz shouldn't be too difficult and might be a fun weekend project for myself, but I can't promise how soon I'd get to it.

Well, not to rush you, but if that weekend happened to be this weekend then this would get shown off in a talk at PyData NYC.

philippjfr commented 6 years ago

> Well, not to rush you, but if that weekend happened to be this weekend then this would get shown off in a talk at PyData NYC.

Ha, well that's a good incentive and producing something demo-able without exposing all the different options for styling and so on shouldn't be too much effort. I'll let you know. What form should the prototype take, a PR or should I simply monkeypatch a Plot class onto StreamingDataFrame.plot for now?

mrocklin commented 6 years ago

If you're game then I would probably submit a PR that adds a holoviews.py file to streamz/dataframe/ and adds a plot method to both Frame (streaming) and Frames (updating).

Note that the dataframe code has undergone a fair bit of churn since you looked at it last.

mrocklin commented 6 years ago

I guess you're right that this would have to be a plot object to get the same pandas API
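
A minimal sketch of why a plot object is needed: pandas lets you write both `df.plot(kind="bar")` and `df.plot.bar()`, so `plot` has to be a callable object that also carries one method per plot kind. `PlotAccessor` and `ToyFrame` below are hypothetical names for illustration, not streamz or pandas classes, and the "plot" is just a dictionary describing the request.

```python
class PlotAccessor:
    """Hypothetical callable accessor that mimics the shape of DataFrame.plot."""

    kinds = ("line", "bar", "scatter")

    def __init__(self, frame):
        self._frame = frame

    def __call__(self, kind="line", **options):
        if kind not in self.kinds:
            raise ValueError(f"unsupported plot kind: {kind!r}")
        # A real implementation would build a chart here
        return {"kind": kind, "columns": list(self._frame.columns), "options": options}

    def line(self, **options):
        return self(kind="line", **options)

    def bar(self, **options):
        return self(kind="bar", **options)

    def scatter(self, **options):
        return self(kind="scatter", **options)


class ToyFrame:
    """Stand-in for a streaming dataframe; only holds column names."""

    def __init__(self, columns):
        self.columns = columns

    @property
    def plot(self):
        return PlotAccessor(self)


frame = ToyFrame(["x", "y"])
called = frame.plot()                  # frame.plot(...) style
method = frame.plot.line()             # frame.plot.line(...) style
bar = frame.plot.bar(width=600)
```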

mrocklin commented 6 years ago

An interesting alternative for the updating case is just to use pandas.DataFrame.plot and render that image to the notebook cell. It's not terribly efficient but isn't that bad either.

psychemedia commented 4 years ago

I was digging through some old code where I was trying to unpick a related Stack Overflow question and found this example of streaming from a dataframe into HoloViews:

from tornado import gen
from tornado.ioloop import PeriodicCallback

import numpy as np
import pandas as pd

import holoviews as hv
from holoviews.streams import Buffer

hv.extension('bokeh')

# Source data to stream: a sine curve sampled at integer x values
df = pd.DataFrame({'x': range(1000), 'y': np.sin(range(1000))})

rowcount = 0
maxrows = 1000

# Buffer that keeps only the 20 most recent (x, y) rows
dfbuffer = Buffer(np.zeros((0, 2)), length=20)

@gen.coroutine
def g():
    # Send the next dataframe row to the buffer
    global rowcount
    item = df[['x', 'y']].iloc[rowcount].values.tolist()
    dfbuffer.send(np.array([item]))
    rowcount += 1

    # Stop the periodic callback once every row has been sent
    if rowcount >= maxrows:
        cbdf.stop()

# How can we get the thing to stop?

# Call g every 500 ms
cbdf = PeriodicCallback(g, 500)
cbdf.start()
hv.DynamicMap(hv.Curve, streams=[dfbuffer]).opts(padding=0.1, width=600, color='green')

I don't think I ever worked out how to create a streamz streaming dataframe to connect to and stream into the chart, though.

majidam20 commented 4 years ago

@philippjfr Hi, I would like to show my stream data in a HoloViews table. I am using a callback function and wrote the code below:

self.buffer7.send(np.array([[scored.index[-1], scored.iloc[:, 5].mean()]]))  # table

dmap7 = hv.DynamicMap(hv.Table, streams=[op.buffer7])

pn.panel(dmap7).show()

I can plot the data fine, but the table's data does not show up in the web page. Could you please tell me what the problem is? Thank you.

philippjfr commented 4 years ago

Thanks for your question @majidam20. I'd recommend filing a post on our Discourse describing in detail what you tried and what exactly isn't working.