Closed chrisdickinson closed 7 years ago
Well... can we blame them? ;-) ... I'm not too familiar with the issues they've been having, is there a list or summary somewhere that outlines the main issues?
Perhaps @phated or one of the others could weigh in here?
Some things we have been dealing with are: error management, having a stream that is sometimes-writeable/sometimes-transform depending on how it is being used, unpipe management. Chris is well-versed in the issues because I need his help untangling the problems much of the time.
Yeah, I've seen similar problems elsewhere. Definitely should get this fixed once and for all... streams lead to... sadness...
Node.js Steams are the worst except for all other streaming implementations :(
The big problems I see with the state of streams as it stands are that:
.destroy
, .flush
, and 'close'
are variously implemented between subclasses of streams and concrete implementations of streams, like net.Socket
..read / 'readable'
, or may explode with 'data'
, etc.This list is non-exhaustive, but I think we can make definite progress if we:
@mikeal: (obligatory: WHATWG streams have some neat solutions to the above — for example, while the ReadableStreamReader
feels a bit Java-y at first glance, it ends up being a nice way to ensure that only one public API is exposed to the world at a time.)
From this comment:
So, unhandledRejection solves the problem of errors going unhandled but the really hard part about propogation is what you do during splits. This happens in promises and in streams (when streams are piped to multiple outputs) and I've never seen a good solution to this. You either pick a direction to propogate, potentially not sending to the side that has a handler, or you potentially hit two handlers on each side of the split. We've been trying to figure this out since the first standardized Stream implementation landed in Node.js and haven't found a more reliable pattern than requiring people to handle errors in the stream that generates it.
I think Promises (and to a lesser degree, EventEmitter
) show you can punt on this sort of issue and still end up with a workable API. Promise splitting is the same sort of problem as callback splitting — that is, the problem of creating a sort of "spur route" in the dependence graph of events. The most actionable advice I've seen is to avoid creating those spur routes at all cost, and when they come up, to explicitly tie them back to the main dependence graph somehow so that their completion is observable. It's a stumbling block, to be sure, but it's one that comes up all the time in async programming, promises or no.
In that respect, unhandledRejection
fills the same sort of gap that EventEmitter
's "too many listeners" warning does — where a user is doing something that is probably dangerous, we warn them. I think Streams might be able to get away with the same sort of behavior (if not outright piggyback on unhandledRejection
.)
Regardless of streams
being a problem, especially it's documentation and developer education, I highly doubt that Gulp
is as an important consumer as it used to be.
Being a user of build systems since years, but having dropped any around 01/2015, I have quite some experience with all of them. I chose Gulp
because of streams.
My point being is, that is largely a problem of implementation. I tried to hack on Gulp
and found it quite convoluted. The development stalled around Q1 2015 significantly. So my insight is that Gulp
falls into the "bad implementation" category. Which still is bad for the API, since it is likely to be blamed. However I wouldn't fall out of the usual improvement cycle.
I guess part of this is we could bring back up the idea of some sort of error forwarding method or forwarding of unhandled error events
A good stream API should be:
At Sage, we are using streams extensively in a large node.js project (~200 klocs not including dependencies). We are not using the native node.js API, though, but the ez-streams API instead.
The ez-streams library/API is based on the following principles:
(*) high level API does not try to shoehorn everything into streams and pipes. Instead, it singles out operations like map
, filter
, transform
, tee
, etc.
reader.read(cb)
: async read. End of stream is signalled by returning undefined
through cb
.reader.stop(arg)
: informs upstream that consumer stopped reading.writer.write(cb, data)
: async write. Writing undefined
ends the writer. Returns an error through the callback if downstream cannot accept the write (for example because it has been prematurely closed).That's all you need to know to implement streams: reader = 2 methods, writer = 1 method.
Notes:
encoding
parameter). Streams can be streams of anything. Only undefined
is reserved.The decorator transforms an implementation of the simple core API into a stream with a rich API. This API is a natural extension of the JS array API.
A typical stream chain looks like:
reader.transform(parser).filter(filter).map(mapper).transform(formatter).pipe(cb, writer);
The high level API provides two types of methods:
transform
, filter
, map
, skip
, limit
, while
, until
, tee
, ...pipe
, forEach
, reduce
, every
, some
, ...(*) or a writer into another writer, with a simple API trick (the pre
property).
Evaluation is always lazy (nothing happens until a reducer starts to pull from a chain).
Error handling is simple, robust and systematic: all reducers take a continuation callback parameter. Any uncaught error occurring in a chain ends up in the reducer's callback at the end of the chain.
The reducer callback also allows you to reliably handle the end of a chain, and for example, ensure that writers have been flushed before continuing.
The monadic methods (transform
, filter
, map
above) don't take a continuation callback but the functions that you pass to them (parser
, filter
, mapper
, formatter
above) do take one. So you can use asynchronous APIs in all steps of a chain. For example, you can query a database in a filter
or mapper
function.
A reader chain can be forked. In this case all emerging chains must be terminated by a reducer. The branching methods (tee
, dup
) let you control how errors are propagated. For example, the API allows you to tee to a spy and close the spy without impacting processing in the main chain.
ez-streams sits side by side with node.js streams and provides a simple API to transform back and forth:
ezReader = ezNode.reader(nodeReadable)
.nodeReadable = ezReader.nodify()
.Same for writers. There is also a method (nodeTransform
) to inject a duplex stream into a chain.
This is safe. Node streams are not hacked.
A better API is possible.
Hacking around the current stream API/implementation won't solve the problem. It will just make things more complex. What we need is a fresh API which coexists with the current API and can be transformed back and forth.
The new API should be continuation-based and high-level/monadic rather than event-driven and low-level.
ez-streams can be a source of inspiration. I'm not expecting it to be taken verbatim (it could be easily adapted for promises) but I think that the design principles should be taken into consideration in a future iteration on a streaming API for node.
I know that I'm probably being loud with this as I've already tried to influence the stream API design several times, without much success. I'm posting again because the problems are obviously still around (otherwise this thread would not exist) while, on the other hand, we have been very successful with the ez-streams approach at Sage. Maybe the time has come...
errare humanum est, perseverare diabolicum
@eljefedelrodeodeljefe Gulp has been downloaded around 1.5 million times in the last month — it is very much an important consumer of streams. In the course of building an ecosystem around Gulp they've uncovered a lot of edge cases in Node's streams implementation and have always been forthcoming about getting those bugs fixed in core where possible. We should trust that the pain points they've highlighted are valid, not accuse them of bad implementation (remember: they're watching this thread, and were kind enough to bring this switch to our attention.)
@bjouhier While ez-stream has nice properties, suggesting an entirely different stream system is a non-starter. We need to figure out where we're at and take concrete steps to improve streams within the existing Node API.
@calvinmetcalf I think that would be an excellent idea. IIRC, I think @mcollina also proposed standardizing .destroy
in streams — these would both be excellent improvements, though I'd like to make sure we thoroughly document how we expect these features to work with existing streams of various versions and types before accepting an implementation.
Error handling is consistently surprising for users.
For the record, I created a ticket about this. https://github.com/nodejs/node/issues/4491
I agree on @chrisdickinson in https://github.com/nodejs/readable-stream/issues/181#issuecomment-174750994.
I think we should list the issues and kick off a stream-wg discussion around those.
What am I seeing are:
1) difficulty of implementation: streams are advanced stuff right now, while they should be simple/basic
2) error propagation: pipe does the wrong thing, and pump is needed most of the time
3) missing destroy()
, and false safety in respect to memory or file descriptors leak.
4) relying too much on userland for core operations without telling users (see pump, duplexify, reduplexer, through2, from2, ....)
The situation is currently a mess, but a mess that is working extremely well in production. The goal should be to reduce the friction and lower the steep learning curve of streams.
Anything else?
@mcollina I think you're spot on.
My additions would be:
nodejs/readable-stream
vs nodejs/node
)I think particularly in terms of documentation improvements could be made. @chrisdickinson has done great work in streams notes and streams charts; I think he was onto something there.
I agree with @mcollina @yoshuawuyts. Having a .destroy()
method and a ._flush(cb)
on writable streams would be great. At this point its kinda hard to change a lot of other things without breaking a lot of stuff but we could look into fixing .pipe's behaviour in regards to error handling as well. @chrisdickinson what about a WG meeting to discuss this?
:+1: for having a meeting. But I think we should set an aged to discuss all points first.
IMHO: changing pipe behavior, destroy, _flush and docs are all important topics that we need to address.
I am not much aware of the Node.js stream semantics. So ignore if this is a pretty noob question.
Does our implementation adhere to a specification? Promises had similar problems and when A+ came in the dust kinda settled down a bit. Similarly streams also have multiple specifications. The one that I know of is reactive streams.
@thefourtheye there are really no promise A+ style thing and to complicate matters when people talk about 'streams' they talk about 2 different things
reactive streams often really mean the first one and node streams are the second kind.
reactive streams often really mean the first one and node streams are the second kind.
Isn't the other way around?
Just jumping in as someone who is new to but deep diving this stuff. I'm happy to lend a hand improving the documentation story around streams as improvements come. It's not a trivial subject for new comers and the docs are lacking around them, exacerbated by what @mcolina says about userlad dependencies for potentially core features.
@thefourtheye why don't you open up a new issue to discuss the differences between streams, we probably don't want to add to much extra noise to this one.
@chrisdickinson my point was here, that we just shouldn't get too nervous that a potentially stalling, while merited repo has some problems. However I think they and others have issues @mcollina described as streams
being unnecessary advanced. I believe this to be a thing of computer science history and education though, which reminds me of the following Star Trek
dialogue, between the engineer and the inventor. Because streams
are as awesome and usable as is the warp drive
.
LAFORGE: Doctor! COCHRANE: Yeah. LAFORGE: Would you mind taking a look at this? COCHRANE: Yeah. LAFORGE: I've tried to reconstruct the
intermix chamberstream as control flow thingy from what I remember at school. Tell me if I got it right. COCHRANE: School? You learned about this in school? LAFORGE: Oh yeah.'Basic Warp Design'asynchronous control flow design with streams is a required course at the Academy. The first chapter is called'Zefram Cochrane'Node.js. COCHRANE: Well, it looks like you got it right. BARCLAY: Commander. This is what we're thinking of using to replace thedamagedancientwarp plasma conduitprocedural software at hand. (Geordi uses his ocular implant to examineathespiral of copper tubingasync traces) LAFORGE: Yeah. Yeah, that's good, but you need toreinforce this copper tubing with a nano-polymeruse streams even more at line 50.
I'm :+1: on a meeting, discussing next steps in terms of WG work would be super useful. However, I'd like us to avoid deciding on a technical direction in-meeting until we have a concrete proposal in some form or another. Maybe figuring out what form that takes can be part of the agenda?
I'm down with that
On Sat, Jan 30, 2016 at 2:30 AM Chris Dickinson notifications@github.com wrote:
I'm [image: :+1:] on a meeting, discussing next steps in terms of WG work would be super useful. However, I'd like us to avoid deciding on a technical direction in-meeting until we have a concrete proposal in some form or another. Maybe figuring out what form that takes can be part of the agenda?
— Reply to this email directly or view it on GitHub https://github.com/nodejs/readable-stream/issues/181#issuecomment-177093786 .
I think it may be useful to not solve everything here. Solve flowing data in a pipeable way (with optional backpressure); try to avoid catering to very many edge-cases.
Just adding a little point: https://github.com/mcollina/split2/pull/8#issuecomment-180734274
@stevemao whatwg streams !== node streams; node streams haven't been specced by a committee - they have an implementation and documentation to support it managed by the streams working group
@yoshuawuyts @stevemao just wanna reiterate that the issue linked isn't really streams error handling related. more generic error handling.
@yoshuawuyts I realised it so I removed my comment.
The reason I linked that is because it is relevant on how we need to fix streams. We should really kick off a meeting. How about the week between 22/02 and 26/02?
Fundamentally, since it is user-facing, documentation is specification. For node streams, it seems sorely wanting for clarity.
I am a newbie; however, the way Stream.pipe is used suggest streams are FIFO queues. Yet the documentation has notions of writable, readable, and duplex Streams, as though a stream is either one or the other. Clearly, pipe makes it all of them.
Perhaps the first steps to cleaning this up should be an elaboration of the unit tests to further crystallize known functionality and patterns of use as a basis for user land documentation.
I have become increasingly aware that things are hard only because the documentation is bad. The documentation is bad when the implementation is bad. The implementation is bad when the design is bad. The design is bad when the specification is bad. The specification is usually bad for most of the pieces in versions 0 and.1, reducing onwards, not surprisingly. It would seem it's time for streams.
@matteocollina the week of Feb 22 seems good we should make another issue with an agenda and precise timing
@gdennie streams may be readable, writable, or both (duplex) , some streams that are both readable and writable do act as fifo queues and we call those transform streams.
@calvinmetcalf It seems writable streams are also readable streams since all streams implement pipe.
Hi @gdennie, I personally have a lot of problems working with stream. So we need to improve it for sure.
I have become increasingly aware that things are hard only because the documentation is bad. The documentation is bad when the implementation is bad. The implementation is bad when the design is bad. The design is bad when the specification is bad. The specification is usually bad for most of the pieces in versions 0 and.1, reducing onwards, not surprisingly. It would seem it's time for streams.
I don't think this tells too much what we should do to improve it but it sounds like a complaint to me. If you could help and give more constructive suggestions that would be great :) Thanks.
@gdennie writable streams do NOT have a pipe method and are not readable, if you read documentation that suggests or implies the other case it's wrong and we can fix it if you can point us to it.
One good thing came out of this https://www.npmjs.com/package/cloneable-readable. I'm closing, but feel free to reopen if it's still a relevant topic.
It might be worthwhile to look into the conversation going on here — Gulp is a pretty heavy user of Node streams, and it looks like we're not serving them well at present. In particular, I'd like to pose the following questions:
/cc @nodejs/streams