python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.23k stars 147 forks source link

Is streamz not maintained anymore? What happened to cuStreamz? #476

Open Daniel451 opened 3 months ago

Daniel451 commented 3 months ago

As the title suggests I would be interested in knowing why there is no active development happening anymore? Is streamz deprecated? Not maintained anymore?

Also, does anyone know what happened to cuStreamz? (NVIDIA's GPU-accelerated streamz fork within their rapids.ai framework)

bparbhu commented 3 months ago

Yeah, I'm also wondering about this. I'd be happy to help with this, I actually like the framework very much.

bparbhu commented 3 months ago

Let's update it then I guess.

martindurant commented 3 months ago

Hi! maintainer here. Thanks for the interest. Indeed, there has not been much activity at all in this project. I try to answer usage questions and fix anything broken, and I can make releases. However, there just hasn't been demand for new development. The niche streamz tried to fill seems to be one that few want (not that I have tried marketing).

@rjzamora , who at nvidia can speak to the status of cuStreamz?

rjzamora commented 3 months ago

I think custreamz is still maintained within the cudf repo, but I don’t think the project is very active. @jdye64 can probably provide a more accurate status :)

MarcSkovMadsen commented 1 month ago

Hi. Panel and HoloViz ecosystem maintainer here.

I'm on constant doubt on whether I/ we should still be recommending/ promoting examples with Streamz library in our docs. For example here https://panel.holoviz.org/reference/panes/Streamz.html and here https://holoviews.org/user_guide/Streaming_Data.html.

When trying to use Streamz for my work related use cases I find it hard as its not a very active library. And most of the functionality (if not all?) of Streamz can now be replaced by Param Reactive Expressions and Param Generators for which I can find support in the community.

Any thoughts here? Would it be better for the Python ecosystem in general to continue using/ recommending Streamz? Or is it a project that should not be used any more?


In https://github.com/holoviz/panel/pull/6980 I suggest recommending to use Panel/ Param native features instead of Streamz. But that is my personal recommendation based on my experience using Streamz.

image

Daniel451 commented 1 month ago

When trying to use Streamz for my work related use cases I find it hard as its not a very active library.

Yes.

And most of the functionality (if not all?) of Streamz can now be replaced by Param Reactive Expressions and Param Generators for which I can find support in the community.

Is that so? I do not want to go off-topic here, thus I will put the code in spoiler-boxes but I think Param only covers a subset of Streamz. I did not test the following code, might very well contain bugs (just quickly drafted something that came to mind where the two libraries would be similar).

If we were to build some reactive updating mechanism to dynamically change a parameter in a visualization task, then an implementation using param could look somewhat like this:

import param import panel as pn import matplotlib.pyplot as plt class ReactivePlot(param.Parameterized): frequency = param.Number(1, bounds=(0.1, 10)) @param.depends('frequency') def view(self): fig, ax = plt.subplots() t = np.linspace(0, 1, 100) y = np.sin(2 * np.pi * self.frequency * t) ax.plot(t, y) return fig reactive_plot = ReactivePlot() app = pn.Column(reactive_plot.param, reactive_plot.view) app.show()

Whereas one might achieve something similar with streamz somewhat like this:

import numpy as np import matplotlib.pyplot as plt from streamz import Stream import panel as pn source = Stream() def update_plot(frequency): fig, ax = plt.subplots() t = np.linspace(0, 1, 100) y = np.sin(2 * np.pi * frequency * t) ax.plot(t, y) plt.close(fig) return fig plot_stream = source.map(update_plot) def set_frequency(event): source.emit(event.new) slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1) slider.param.watch(set_frequency, 'value') pn.panel(plot_stream.latest()).servable() pn.Row(slider).servable() pn.serve()

In both cases, the crucial behavior to achieve would be having "something dynamic that gets updated throughout the process". This could be dynamic/reactive parameters using param or simply values that can flow at will through a pipeline that trigger a callback.

However, at least in my understanding @MarcSkovMadsen, both libraries also differ a lot. param mostly focuses on (dynamic) parameters and even specifying "types" (like "this is a number; interval is ..."). streamz does not specify or type anything. You can certainly filter or guard inputs or outputs at any moment within the stream but those would be functions (just like everything else) to implement within the stream.

In my opinion, param focuses on the actual data, types, ... and their behavior whereas streamz focus is not the data "running through the pipes" but rather having an easy and functional framework to build the "pipes" together to one Stream (computational graph, network, ... whatever you like to call it). What actually flows and happens inside the stream, is solely up to the functions that were chained.

Moreover, I lack to see where reactive behavior, parameter validation, lazy evaluation, ... would be available in streamz and, the other way around, I do not see where param would offer the ability for flow control, splitting and joining streams of data, batching & collecting, parallelization, ...

martindurant commented 1 month ago

I think @Daniel451 's outline comparison of the libraries if fair. (@philippjfr surely has thoughts on whether param can be a complete replacement, with long experience with both)

As far as this library goes, indeed there has been no development, and there are no plans, as I said above. It always seemed like a great idea, but without quite enough of a demonstrative use case to get users interested. If your end use case is really connecting with panel or other viz/frontend, maybe param makes a lot of sense. If you actually want to do general purpose async event management with complex branching/aggregation logic in python, maybe you would end up writing something like streamz specialised to the particular use case.

MarcSkovMadsen commented 1 month ago

Thx for taking the time @Daniel451. I also think your comparison is very fair.

For completenes I was thinking about and referring to param.rx/ panel.rx. Your example would look something like

`pn.rx` ```python import numpy as np import matplotlib.pyplot as plt import panel as pn source = pn.rx(1) # same as param.rx(1) def update_plot(frequency): fig, ax = plt.subplots() t = np.linspace(0, 1, 100) y = np.sin(2 * np.pi * frequency * t) ax.plot(t, y) plt.close(fig) return fig plot_stream = source.rx.pipe(update_plot) def set_frequency(event): source.rx.value = event.new slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1) slider.param.watch(set_frequency, 'value') pn.Column(plot_stream, slider).servable() ```

Here is an alternative version that is more .rx like

`pn.rx` - more .rx like ```python import numpy as np import matplotlib.pyplot as plt import panel as pn slider = pn.widgets.FloatSlider(name='Frequency', start=0.1, end=10, value=1) frequency = slider.rx() t = np.linspace(0, 1, 100) y = np.sin(2 * np.pi * frequency * t) def update_plot(y): fig, ax = plt.subplots() ax.plot(t, y) plt.close(fig) return fig plot_rx = pn.rx(update_plot) plot_stream = plot_rx(y) # could also have been y.rx.pipe(update_plot) pn.panel(plot_stream).servable() ```

In my view param.rx can play a role very similar to streamz.Stream. Param does not contain all the more advanced functionality for streams that Streamz contain. But is also what in my experience is hard to use and in my experience its often either not needed or is simpler to understand (also for a team) when implemented directly in the project.

What is especially hard for me to understand is whether Streamz provides any kind of performance optimizations built in? For example when displaying rolling windows of a DataFrame and streaming one row at a time. That would be a big advantage of Streamz over DIY via param.rx for specific use cases.

For completenes. Here is a streaming version based on a generator function.

`pn.rx` - streaming generator ```python import numpy as np import matplotlib.pyplot as plt from time import sleep import panel as pn def source(): while True: for i in range(0,10): yield i sleep(0.25) frequency = pn.rx(source) t = np.linspace(0, 1, 100) y = np.sin(2 * np.pi * frequency * t) def update_plot(y): fig, ax = plt.subplots() ax.plot(t, y) plt.close(fig) return fig plot_rx = pn.rx(update_plot) plot_stream = plot_rx(y) # could also have been y.rx.pipe(update_plot) pn.panel(plot_stream).servable() ```
martindurant commented 1 month ago

whether Streamz provides any kind of performance optimizations

The plotting itself is handled by the panel/hv stack, so streamz only passes the data on. Streamz can handle stateful dataframes, dataframe updates and per-row also; and hv allows for the updating of in-browser data one row at a time without replacing previous data. You should get decent performance either way (within what is possible from the jupyter comm/websocket implementation).