gdmcbain opened 1 year ago
Thank you for having a look at this repository! It has been quite some time since I or anyone else contributed here, unfortunately. I don't remember where the environment file or conda/ dir might have gone.
The requirements are pretty minimal, however, and you should be able to get up and running. There are optional connectors to pandas, dask, hvplot and some more niche things, but you can play without those.
OK, thanks. I have indeed been able to get up and running. A somewhat stripped-back Dockerfile &c. (minus Kafka, conda, wget, …) is in https://github.com/gdmcbain/streamz/tree/469-quickstart. With it, I was able to run the first couple of JupyterLab notebooks I looked at (iterators_and_streamz, fibonacci). Very nice. Thank you!
I do think that streamz is cool and could be very useful, but it doesn't fit into most people's conception of data processing. Let us know if you do something interesting with it!
What I've got in mind (and thank you, @amotl, for introducing your application in #470, which could be very useful too) is numerical simulation of dynamical systems, something along the lines of
I've been doing this for a while, first using itertools and then itertoolz, but, exactly as addressed in "Why not Python generator expressions?" (i.e., the raison d'être of Streamz), this quickly becomes cumbersome, especially when building complex pipelines.
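For readers unfamiliar with the generator approach, here is a minimal sketch of a difference-equation simulation as a lazy iterator, using only the standard library. The `iterate` helper is written out by hand (it is similar in spirit to `iterate` in toolz), and the logistic map with r = 2.5 is an assumed stand-in for a real model. The second half shows where it gets awkward: every branch of the pipeline needs its own `itertools.tee` copy, and each copy has to be consumed explicitly.

```python
from itertools import islice, tee


def iterate(f, x0):
    """Yield x0, f(x0), f(f(x0)), ... indefinitely (a hand-written helper)."""
    x = x0
    while True:
        yield x
        x = f(x)


def logistic(r):
    """Return the logistic map x -> r * x * (1 - x); an assumed example model."""
    return lambda x: r * x * (1.0 - x)


# The whole trajectory is a lazy, pull-based generator; take the first 50 states.
head = islice(iterate(logistic(2.5), 0.1), 50)

# Branching: tee() gives two independent iterators over the same states.
branch_a, branch_b = tee(head)
states = list(branch_a)  # branch 1: record the trajectory

# Branch 2 needs consecutive pairs, which means another tee() and a manual offset.
b1, b2 = tee(branch_b)
next(b2, None)
increments = [y - x for x, y in zip(b1, b2)]  # step-to-step changes
```

With r = 2.5 the trajectory settles at the fixed point 1 - 1/r = 0.6. Adding a third or fourth consumer means more `tee` calls and more bookkeeping, which is the kind of pipeline complexity Streamz is meant to handle.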
I got the idea of looking at Streamz from the stream interface of Scikit FiniDiff (which is itself reasonably active and, I think, uses streamz in a fairly integral way).
Interesting, thank you. Do I understand correctly that you don't use realtime events (from some external stimulus) at all, but push events into the stream yourself? In that case, streamz is providing a handy visualisable branching/ pipelining solution, right?
No realtime stimuli, no; it's all offline simulation. There are sometimes external stimuli within the model, though. In the language of dynamical systems, some systems are autonomous, which means that they just evolve according to their own internal law; mathematically, their differential or difference equation doesn't explicitly involve time, say f(x, dx/dt) = 0. The others are nonautonomous, say f(t, x, dx/dt) = 0.
The distinction is a bit blurry because one can always take time t to be just another degree of freedom in x which evolves at constant unit rate, but if the structure of the model is to represent a real physical system with inputs or excitation, the distinction can be meaningful.
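The remark about treating t as one more degree of freedom can be made concrete. The sketch below assumes explicit Euler stepping and a made-up forced-decay model dx/dt = -x + cos(t). It integrates the nonautonomous form directly, then the autonomised form, where the state is y = (t, x) and dy/dt = (1, -x + cos(t)) depends only on y, with no separate time argument.

```python
import math


def forced_decay(t, x):
    """Hypothetical nonautonomous model: dx/dt = -x + cos(t)."""
    return -x + math.cos(t)


def euler(rhs, t0, x0, h, n):
    """Explicit Euler for dx/dt = rhs(t, x); returns the list of (t, x) pairs."""
    t, x = t0, x0
    out = [(t, x)]
    for _ in range(n):
        x = x + h * rhs(t, x)
        t = t + h
        out.append((t, x))
    return out


def autonomised(rhs):
    """Append time to the state: y = (t, x), dy/dt = (1, rhs(t, x))."""
    def g(y):
        t, x = y
        return (1.0, rhs(t, x))  # time advances at constant unit rate
    return g


def euler_autonomous(g, y0, h, n):
    """Explicit Euler for dy/dt = g(y) with a tuple-valued state."""
    y = y0
    out = [y]
    for _ in range(n):
        y = tuple(yi + h * gi for yi, gi in zip(y, g(y)))
        out.append(y)
    return out


nonauto = euler(forced_decay, 0.0, 1.0, 0.01, 500)
auto = euler_autonomous(autonomised(forced_decay), (0.0, 1.0), 0.01, 500)
```

Because the time component advances by exactly h each step, both versions produce identical trajectories; the difference lies only in how the model is structured, which is why the distinction matters mainly when the model has explicit inputs or excitation.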
So yes, Python generators are a pretty good fit, but what I think might be even better is, as you say:
a handy visualisable branching/ pipelining solution
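To illustrate the push style with branching, here is a toy push-based node written from scratch. It is hypothetical and not streamz's implementation, although streamz's `Stream` has methods with similar roles (`emit`, `map`, `sink`). One source fans out to two branches: one records the raw states, the other computes the energy of an assumed harmonic oscillator advanced by symplectic Euler.

```python
class Node:
    """Hypothetical minimal push-based stream node (a toy, not streamz)."""

    def __init__(self, func=None):
        self.func = func        # transformation applied to each incoming value
        self.downstreams = []   # nodes that receive this node's output

    def map(self, func):
        """Attach a new downstream node that applies func to each value."""
        child = Node(func)
        self.downstreams.append(child)
        return child

    def sink(self, func):
        """Attach a side-effecting consumer; values pass through unchanged."""
        def side_effect(x):
            func(x)
            return x
        return self.map(side_effect)

    def emit(self, x):
        """Push one value through this node and on to every branch."""
        y = x if self.func is None else self.func(x)
        for d in self.downstreams:
            d.emit(y)


source = Node()
recorded, energies = [], []
source.sink(recorded.append)  # branch 1: raw (x, v) states
source.map(lambda s: 0.5 * (s[0] ** 2 + s[1] ** 2)).sink(energies.append)  # branch 2

# Assumed model: unit harmonic oscillator stepped by symplectic Euler.
x, v, h = 1.0, 0.0, 0.1
for _ in range(100):
    v -= h * x
    x += h * v
    source.emit((x, v))  # push each new state into the pipeline
```

Each `emit` call pushes one state through every branch immediately, so adding another consumer is just another `map` or `sink` attached to `source`, with no `tee` bookkeeping.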
Having cloned this repo (master at b4f0450586), I failed to run the Quickstart; viz., running
I get
Looking inside
https://github.com/python-streamz/streamz/blob/b4f0450586f5de40a2cd1232270db7d86fc00176/docker/build.sh#L3
and then
https://github.com/python-streamz/streamz/blob/b4f0450586f5de40a2cd1232270db7d86fc00176/Dockerfile#L19
I'm guessing that's because there's no conda/ subdirectory in the repo? I see that this conda/ subdirectory is also referred to in the contributing guidelines for this issue tracker.