matz / streem

prototype of stream based programming language
MIT License

Stream control structure #16

Open niryuu opened 9 years ago

niryuu commented 9 years ago

As I have seen, one of the key points of Streem is streaming data flows in concurrent situations. The FizzBuzz example shows one simple stream ([1..100]->FizzBuzz->STDOUT). But if we write complicated concurrent programs, we have to manage complicated relations between processes. For example:

Streem has a fascinating syntax like the UNIX pipe. But it already has functions and if statements, and we can implement process control structures using these. Besides this, we could also implement them by extending the pipe syntax. So it will be important to decide which control structures to assign to the pipe; it will affect usability and expressiveness. What do you think?

matz commented 9 years ago

I think switching and generating processes can be made implicit using the pipe syntax. At least I'd like to experiment with how far we can go without explicit process control.

We will have pipeline operations such as

some of them might have special operators (I have + and & in mind).
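
As a purely illustrative aside (not Streem's implementation or syntax), the plumbing that a merge-style pipeline operation implies can be sketched with Go channels standing in for streams. The `emit` and `merge` helpers below are made-up names, and reading `&` as a fan-in merge is only an assumption:

```go
// Sketch only: Go channels as a stand-in for Streem streams.
// emit and merge are hypothetical helpers, not Streem API.
package main

import (
	"fmt"
	"sync"
)

// emit turns a slice into a stream, running its producer concurrently.
func emit(values []int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for _, v := range values {
			out <- v
		}
	}()
	return out
}

// merge fans several input streams into one, closing the output when all
// inputs are exhausted -- roughly what an "&"-style merge might mean.
func merge(ins ...<-chan int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, in := range ins {
		wg.Add(1)
		go func(in <-chan int) {
			defer wg.Done()
			for v := range in {
				out <- v
			}
		}(in)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

func main() {
	a := emit([]int{1, 2, 3})
	b := emit([]int{10, 20, 30})
	for v := range merge(a, b) {
		fmt.Println(v) // interleaving order is nondeterministic
	}
}
```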

oleksandr commented 9 years ago

Maybe something from Flow-Based Programming (FBP) could be useful here? UNIX pipes are quite limited, even together with GNU parallel. Multiple input/output streams and named ports for blocks would reveal a lot of possibilities. Just a suggestion, as I don't know the initial intent behind designing the streem language...
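
As a rough sketch of the named-ports idea -- not anything Streem or a specific FBP runtime provides -- a block with one input port and two output ports could be modelled with Go channels like this; the `Splitter` block and its port names are hypothetical:

```go
// Sketch only: an FBP-style block with named ports, expressed as a Go struct
// whose fields are channels. Block and port names are made up for illustration.
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Splitter has one named input port and two named output ports.
type Splitter struct {
	In    <-chan string // input port
	Match chan<- string // lines containing the pattern
	Rest  chan<- string // everything else
}

// Run consumes In and routes each line to Match or Rest.
func (s Splitter) Run(pattern string) {
	for line := range s.In {
		if strings.Contains(line, pattern) {
			s.Match <- line
		} else {
			s.Rest <- line
		}
	}
	close(s.Match)
	close(s.Rest)
}

func main() {
	in := make(chan string)
	match := make(chan string)
	rest := make(chan string)

	s := Splitter{In: in, Match: match, Rest: rest}
	go s.Run("error")

	go func() {
		in <- "error: disk full"
		in <- "all good"
		close(in)
	}()

	// Drain both output ports concurrently.
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		for l := range rest {
			fmt.Println("rest:", l)
		}
	}()
	for l := range match {
		fmt.Println("match:", l)
	}
	wg.Wait()
}
```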

matz commented 9 years ago

@oleksandr thank you for the info. I will investigate.

ekg commented 9 years ago

@oleksandr Actually, you can handle multiple input and output streams with the shell. You can tee to multiple named pipes, for instance, or pee one input to multiple subshells defined by other commands. Merging is also straightforward, as you "just" need one process that can open and sort/zip/mix a variety of input files (or named pipes).

I'm interested in what streem could bring to the table that isn't already well supported, in concurrent and parallel ways, by POSIX-compatible shells (given a handful of utilities like those found in moreutils and GNU parallel). I'm not saying the situation is ideal, but UNIX shells like bash and zsh are lacking surprisingly little, and I use them in my work all the time to implement concurrent, multiprocessor stream-processing pipelines.

oleksandr commented 9 years ago

@ekg Indeed, this is all possible. What I meant by "limited" is rather inefficient resource usage in certain cases, and the DX (developer experience), which first of all involves readability and the learning curve. Let me elaborate a bit on those.

  1. A simple stream transformation running as a process can be too heavy - running it as a native or green thread would be more appropriate. In UNIX pipes each "block" is executed as a process (which involves allocating an address space, etc.). That is the price to pay for being able to use any executable as a block - a kind of language-agnostic dataflow programming. Other dataflow systems map their "blocks" in different ways. I don't know what kind of async execution @matz has in mind for streem, as it's in progress, but I would also be interested in what it could bring to shells (see the sketch after this list).
  2. A relatively complex UNIX pipe is hard to comprehend compared to a visual representation of a flow or an FBP DSL. I believe these kinds of systems should exploit visual programming as much as possible. Referring to Bret Victor's 2013 talk: we're still coding instead of directly manipulating data, we store code in text files instead of a spatial representation, and we still use a sequential programming model instead of thinking in terms of concurrency. If the dataflow/stream approach only targets the latter, the first two points are not addressed. And to me they are very important for creating a nice DX and providing a moderate learning curve.
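
A minimal sketch of point 1, with Go goroutines standing in for green threads; the `stage` helper and the 300-stage pipeline are arbitrary illustrations, not streem internals:

```go
// Sketch only: the same "block per stage" topology as a shell pipeline,
// but with each block as a lightweight goroutine instead of an OS process.
package main

import "fmt"

// stage wraps a per-item transformation in its own goroutine,
// the moral equivalent of one process per pipe segment -- but far cheaper.
func stage(in <-chan int, f func(int) int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- f(v)
		}
	}()
	return out
}

func main() {
	src := make(chan int)
	go func() {
		defer close(src)
		for i := 1; i <= 5; i++ {
			src <- i
		}
	}()

	// Chain a few hundred trivial stages; spawning this many OS processes
	// (as a shell pipeline would) would be dramatically more expensive.
	var out <-chan int = src
	for i := 0; i < 300; i++ {
		out = stage(out, func(v int) int { return v + 1 })
	}

	for v := range out {
		fmt.Println(v) // 301, 302, ..., 305
	}
}
```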
bver commented 9 years ago

Not sure if this is the right thread, but: I think Streem could be great for scripting code blocks in the form of simple independent agents running concurrently. IMHO, limiting the communication of such agents to the UNIX pipes model would restrict its potential. We can imagine more messaging patterns here -- and I guess the zip, mix, cat operators are steps in the right direction. Good inspiration (FBP aside) can be found e.g. in 0MQ, which provides a nice set of communication patterns: http://zguide.zeromq.org/page:all#toc32

oleksandr commented 9 years ago

@bver Handling connections between blocks is the second important aspect, after executing a block. In FBP they talk about a connection as a bounded buffer with configurable capacity. 0MQ in particular only has the HWM (high-water mark) notion, and queueing behaviour depends on the socket type. It would be interesting to see how this could be specified in the streem DSL.
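
For illustration only, a connection as a bounded buffer can be sketched with a Go buffered channel; the capacity of 4 and the producer/consumer pacing are arbitrary assumptions, not something specified by FBP, 0MQ, or streem:

```go
// Sketch only: an FBP-style bounded connection modelled as a buffered channel.
package main

import (
	"fmt"
	"time"
)

func main() {
	// The "connection" between two blocks: at most 4 packets in flight.
	conn := make(chan int, 4)

	// Fast producer block.
	go func() {
		defer close(conn)
		for i := 0; i < 10; i++ {
			conn <- i
			fmt.Println("sent", i)
		}
	}()

	// Slow consumer block.
	for v := range conn {
		time.Sleep(50 * time.Millisecond)
		fmt.Println("received", v)
	}
}
```

A send on a full `conn` blocks the producer, which gives the backpressure behaviour a fixed-capacity connection is meant to provide.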

P.S. Here's an experiment with FBP + 0MQ we have been playing with for the last couple of months: https://github.com/cascades-fbp/cascades - it supports both the FBP DSL and the JSON format from the NoFlo guys.