holoviz / hvplot

A high-level plotting API for pandas, dask, xarray, and networkx built on HoloViews
https://hvplot.holoviz.org
BSD 3-Clause "New" or "Revised" License
1.08k stars, 105 forks

Plot of streamz plots all data so far #199

Open martindurant opened 5 years ago

martindurant commented 5 years ago

In the example code below, the resulting stream only ever contains two entries, but the hvplot output only ever appends data, so the table keeps growing.

import hvplot.streamz  # registers the .hvplot accessor on streamz dataframes
import streamz.dataframe
s = streamz.dataframe.Random(freq='1s', interval='3s')

tail = s.tail(2)  # tail always has two elements
table = tail.hvplot(kind='table')

This does not happen if the plot occurs on a window object as opposed to a raw dataframe.

[Screenshot: Screen Shot 2019-04-21 at 22 42 11]

Also note that the index is missing from the table (it shows correctly as the x-axis in a line plot).

philippjfr commented 5 years ago

I'm very open to suggestions here, but this is expected behavior: streaming plots currently have a default backlog that buffers the last N rows. We should definitely have an option to disable that, or even consider turning it off by default. I don't think there is currently any way to display just the latest chunk.

The dropping of the index in the table case is definitely a bug.

martindurant commented 5 years ago

this is expected behavior

I am suggesting this is not right. There is a difference between the "current value" of a dataframe, which can be completely new each time, and what you get out of a window, which is based on incremental updates. The former should not use incremental updates at all; that part is already handled within streamz.
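
The distinction can be illustrated with a small stand-alone model. This is only a sketch in plain Python, not HoloViews or streamz code: `snapshots`, `replace_display`, and `append_display` are hypothetical names, and the backlog of 1000 rows is an assumed value chosen for illustration.

```python
from collections import deque


def snapshots(n_updates, tail=2):
    """Yield successive 'current values': the last `tail` rows seen so far."""
    rows = []
    for i in range(n_updates):
        rows.append(i)
        yield rows[-tail:]


def replace_display(stream):
    """Treat each snapshot as the complete new state of the display."""
    shown = []
    for snap in stream:
        shown = list(snap)
    return shown


def append_display(stream, backlog=1000):
    """Append each snapshot to a bounded buffer (assumed backlog size)."""
    shown = deque(maxlen=backlog)
    for snap in stream:
        shown.extend(snap)
    return list(shown)


replaced = replace_display(snapshots(10))
appended = append_display(snapshots(10))
print(len(replaced), len(appended))
```

With replace semantics the display never holds more than the two rows of the latest snapshot, while append semantics keeps re-adding rows from overlapping snapshots until the backlog limit is reached.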

philippjfr commented 5 years ago

I am suggesting this is not right

I do agree with that, and I'll consider this a bug, but I'm not yet entirely clear on what to do about it.

By default, we can definitely disable the buffer/cache in HoloViews so that it treats each chunk as is. What I don't quite see is how we can ensure that, when a certain window is requested, we send only the incremental update to the browser and not the whole set of data. Basically, there are currently two places where history can accumulate, and I'm wondering a) whether HoloViews can know what was specified on the streamz dataframe and b) whether it makes sense for HoloViews to still have a separate accumulation buffer just for display.

philippjfr commented 4 years ago

Internally, hvPlot uses the _stream_type attribute to determine whether to update the data entirely or to stream new data. This seems to work okay for things like aggregations, where the data is updated entirely. What I don't quite understand about your suggestion is how I would even go about using streamz to keep a buffer of historical values. This is likely just due to my lack of knowledge about streamz, but let's say I want to do something very simple, like displaying the last 100 values output by:

from streamz.dataframe import Random
data = Random(interval='200ms', freq='50ms')

How would I go about that using just streamz without using the inbuilt buffering in hvplot/holoviews?

martindurant commented 4 years ago

I’ll get back to you, but honestly, it would be much more useful to me if you could spend time looking into the xrviz slowness within the intake gui. I haven’t tried with the latest releases, perhaps there’s a chance that the problem has gone away? I am totally unequipped to debug what is slowing the browser down (I don’t think it’s on the python side).

philippjfr commented 4 years ago

it would be much more useful to me if you could spend time looking into the xrviz slowness within the intake gui.

I've asked Julia to take a look. It's a weird issue because I can't reproduce it in a standalone setup. I'll take another look if Julia can't make sense of it. The only thing I can think of doing is removing parts of the xrviz GUI until it speeds up again, which will hopefully tell me what exactly is causing the slowdown.