python-streamz / streamz

Real-time stream processing for python
https://streamz.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License

Quickstart lacks conda/environments/streamz_dev.yml #469

Open gdmcbain opened 10 months ago

gdmcbain commented 10 months ago

Having cloned this repo (master at b4f0450586), I failed to run the Quickstart; viz., running

cd ~/src/streamz
sh docker/build.sh

I got

EnvironmentFileNotFound: '/streamz/conda/environments/streamz_dev.yml' file not found.

Looking inside

https://github.com/python-streamz/streamz/blob/b4f0450586f5de40a2cd1232270db7d86fc00176/docker/build.sh#L3

and then

https://github.com/python-streamz/streamz/blob/b4f0450586f5de40a2cd1232270db7d86fc00176/Dockerfile#L19

I'm guessing that's because there's no conda/ subdirectory in the repo?

I see that this conda/ subdirectory is also referred to in the contributing guidelines for this issue tracker.

martindurant commented 10 months ago

Thank you for having a look at this repository! Unfortunately, it has been quite some time since I or anyone else has contributed here. I don't remember where the environment file or conda/ dir might have gone.

The requirements are pretty minimal, however, and you should be able to get up and running. There are optional connectors to pandas, dask, hvplot and some more niche things, but you can play without those.

gdmcbain commented 10 months ago

OK, thanks. I have indeed been able to get up and running. A somewhat stripped-back Dockerfile etc. (minus Kafka, conda, wget, …) is in https://github.com/gdmcbain/streamz/tree/469-quickstart. With it, I was able to run the first couple of JupyterLab notebooks I looked at (iterators_and_streamz, fibonacci). Very nice. Thank you!

martindurant commented 10 months ago

I do think that streamz is cool and could be very useful, but it doesn't fit into most people's conception of data processing. Let us know if you do something interesting with it!

gdmcbain commented 10 months ago

What I've got in mind (and thank you, @amotl, for introducing your application in #470, that could be very useful too) is numerical simulation of dynamical systems, something along the lines of

I've been doing this for a while, first with itertools and then with itertoolz, but, exactly as addressed in "Why not Python generator expressions?" (i.e., the raison d'être of Streamz):

this quickly becomes cumbersome, especially when building complex pipelines.
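To make the generator-based style concrete, here is a minimal sketch using only the standard library's `itertools` (not itertoolz). The logistic map is a hypothetical stand-in for a dynamical system. Feeding one trajectory to two consumers already requires `itertools.tee`, and each further branch or offset adds more of that bookkeeping.

```python
import itertools


def logistic_map(x0, r):
    """Hypothetical example system: yields x_0, x_1, ... with x_{n+1} = r * x_n * (1 - x_n)."""
    x = x0
    while True:
        yield x
        x = r * x * (1 - x)


# A finite slice of the (infinite) trajectory.
trajectory = itertools.islice(logistic_map(0.25, 2.0), 4)

# Two consumers need two independent copies, so the iterator must be tee'd.
for_states, for_increments = itertools.tee(trajectory, 2)
states = list(for_states)

# Successive differences need yet another tee, plus manual offsetting.
current, following = itertools.tee(for_increments, 2)
next(following, None)
increments = [b - a for a, b in zip(current, following)]

print(states)      # [0.25, 0.375, 0.46875, 0.498046875]
print(increments)  # [0.125, 0.09375, 0.029296875]
```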

I got the idea of looking at Streamz from the stream interface of Scikit FiniDiff (which is itself reasonably active and, I think, uses streamz in a fairly integral way).

martindurant commented 10 months ago

Interesting, thank you. Do I understand correctly that you don't use real-time events (from some external stimulus) at all, but push events into the stream yourself? In that case, streamz is providing a handy visualisable branching/ pipelining solution, right?

gdmcbain commented 10 months ago

No real-time stimuli, no; it's all offline simulation. There are sometimes external stimuli, though. In the language of dynamical systems, some systems are autonomous, which means that they just evolve according to their own internal law; mathematically, their differential or difference equation doesn't explicitly involve time, say f(x, dx/dt) = 0. The others are nonautonomous, say f(t, x, dx/dt) = 0.
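The autonomous/nonautonomous distinction can be seen in a small sketch. It assumes the equation can be solved explicitly as dx/dt = g(t, x) and uses explicit Euler as a hypothetical integrator. The autonomous right-hand side simply ignores `t`, while the forced one depends on it through a hypothetical excitation sin(t).

```python
import math
from itertools import islice


def euler(rhs, t0, x0, dt):
    """Explicit Euler for dx/dt = rhs(t, x); yields (t, x) pairs forever."""
    t, x = t0, x0
    while True:
        yield t, x
        x = x + dt * rhs(t, x)
        t = t + dt


def autonomous(t, x):
    # Right-hand side does not depend on t: dx/dt = -x.
    return -x


def forced(t, x):
    # Right-hand side depends explicitly on t: dx/dt = -x + sin(t).
    return -x + math.sin(t)


free = list(islice(euler(autonomous, 0.0, 1.0, 0.5), 3))
driven = list(islice(euler(forced, 0.0, 1.0, 0.5), 3))

print(free)    # [(0.0, 1.0), (0.5, 0.5), (1.0, 0.25)]
print(driven)  # same first two points (sin(0) = 0), then the forcing shows up
```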

The distinction is a bit blurry because one can always take time t to be just another degree of freedom in x, one that evolves at a constant unit rate; but if the model is structured to represent a real physical system with inputs or excitation, the distinction can be meaningful.
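Written out, the "time as another degree of freedom" trick takes a nonautonomous system f(t, x, dx/dt) = 0 and introduces a new independent variable τ and an augmented state y = (t, x) in which t advances at unit rate. The resulting system F(y, dy/dτ) = 0 involves τ only through y and its derivative, so it is autonomous.

```latex
y(\tau) = \begin{pmatrix} t(\tau) \\ x(\tau) \end{pmatrix},
\qquad
F\!\left(y, \frac{dy}{d\tau}\right)
  = \begin{pmatrix}
      \dfrac{dt}{d\tau} - 1 \\[1ex]
      f\!\left(t, x, \dfrac{dx}{d\tau}\right)
    \end{pmatrix}
  = 0 .
```

Because the first component forces dt/dτ = 1, dx/dτ coincides with dx/dt, and the second component is exactly the original equation.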

So yes, Python generators are a pretty good fit, but what I think might be even better is, as you say:

a handy visualisable branching/ pipelining solution