nodejs / readable-stream

Node-core streams for userland
https://nodejs.org/api/stream.html
Other
1.03k stars 225 forks source link

Discuss: Gulp looking to move off of Node Streams #181

Closed chrisdickinson closed 7 years ago

chrisdickinson commented 8 years ago

It might be worthwhile to look into the conversation going on here — Gulp is a pretty heavy user of Node streams, and it looks like we're not serving them well at present. In particular, I'd like to pose the following questions:

/cc @nodejs/streams

jasnell commented 8 years ago

Well... can we blame them? ;-) ... I'm not too familiar with the issues they've been having, is there a list or summary somewhere that outlines the main issues?

Perhaps @phated or one of the others could weigh in here?

phated commented 8 years ago

Some things we have been dealing with are: error management, having a stream that is sometimes-writeable/sometimes-transform depending on how it is being used, unpipe management. Chris is well-versed in the issues because I need his help untangling the problems much of the time.

jasnell commented 8 years ago

Yeah, I've seen similar problems elsewhere. Definitely should get this fixed once and for all... streams lead to... sadness...

mikeal commented 8 years ago

Node.js Steams are the worst except for all other streaming implementations :(

chrisdickinson commented 8 years ago

The big problems I see with the state of streams as it stands are that:

  1. The biggest problem: the current behavior of streams is undefined. We don't have a solid model of how streams interact with streams of other versions or origins, so it's very hard to feel comfortable making changes to how they work. This may overlap with @nodejs/documentation work.
  2. There are certain events that are de-facto part of streams by practice, but aren't fully realized in the streams API — for example, .destroy, .flush, and 'close' are variously implemented between subclasses of streams and concrete implementations of streams, like net.Socket.
  3. Error handling is consistently surprising for users.
  4. Flowing mode vs. non-flowing mode is a false dichotomy — there are three modes a stream can be in, with varying degree of flow-i-ness.
    • In general, we expose two different public APIs that are exclusive: streams using pipe will explode when used with .read / 'readable', or may explode with 'data', etc.
  5. The inheritance model lends itself to some confusing code.
  6. Common operations – like nesting streams to form a duplex pipeline — are left out of the public API, and end up re-implemented in userland, where points 1-5 cause them to exhibit varying forms of bad behavior.

This list is non-exhaustive, but I think we can make definite progress if we:

@mikeal: (obligatory: WHATWG streams have some neat solutions to the above — for example, while the ReadableStreamReader feels a bit Java-y at first glance, it ends up being a nice way to ensure that only one public API is exposed to the world at a time.)

chrisdickinson commented 8 years ago

From this comment:

So, unhandledRejection solves the problem of errors going unhandled but the really hard part about propogation is what you do during splits. This happens in promises and in streams (when streams are piped to multiple outputs) and I've never seen a good solution to this. You either pick a direction to propogate, potentially not sending to the side that has a handler, or you potentially hit two handlers on each side of the split. We've been trying to figure this out since the first standardized Stream implementation landed in Node.js and haven't found a more reliable pattern than requiring people to handle errors in the stream that generates it.

I think Promises (and to a lesser degree, EventEmitter) show you can punt on this sort of issue and still end up with a workable API. Promise splitting is the same sort of problem as callback splitting — that is, the problem of creating a sort of "spur route" in the dependence graph of events. The most actionable advice I've seen is to avoid creating those spur routes at all cost, and when they come up, to explicitly tie them back to the main dependence graph somehow so that their completion is observable. It's a stumbling block, to be sure, but it's one that comes up all the time in async programming, promises or no.

In that respect, unhandledRejection fills the same sort of gap that EventEmitter's "too many listeners" warning does — where a user is doing something that is probably dangerous, we warn them. I think Streams might be able to get away with the same sort of behavior (if not outright piggyback on unhandledRejection.)

eljefedelrodeodeljefe commented 8 years ago

Regardless of streams being a problem, especially it's documentation and developer education, I highly doubt that Gulp is as an important consumer as it used to be.

Being a user of build systems since years, but having dropped any around 01/2015, I have quite some experience with all of them. I chose Gulp because of streams.

My point being is, that is largely a problem of implementation. I tried to hack on Gulp and found it quite convoluted. The development stalled around Q1 2015 significantly. So my insight is that Gulp falls into the "bad implementation" category. Which still is bad for the API, since it is likely to be blamed. However I wouldn't fall out of the usual improvement cycle.

calvinmetcalf commented 8 years ago

I guess part of this is we could bring back up the idea of some sort of error forwarding method or forwarding of unhandled error events

bjouhier commented 8 years ago

A good stream API should be:

Our experience with streams

At Sage, we are using streams extensively in a large node.js project (~200 klocs not including dependencies). We are not using the native node.js API, though, but the ez-streams API instead.

The ez-streams library/API is based on the following principles:

(*) high level API does not try to shoehorn everything into streams and pipes. Instead, it singles out operations like map, filter, transform, tee, etc.

Core API

That's all you need to know to implement streams: reader = 2 methods, writer = 1 method.

Notes:

High-level API

The decorator transforms an implementation of the simple core API into a stream with a rich API. This API is a natural extension of the JS array API.

A typical stream chain looks like:

reader.transform(parser).filter(filter).map(mapper).transform(formatter).pipe(cb, writer);

The high level API provides two types of methods:

(*) or a writer into another writer, with a simple API trick (the pre property).

Evaluation is always lazy (nothing happens until a reducer starts to pull from a chain).

Error handling is simple, robust and systematic: all reducers take a continuation callback parameter. Any uncaught error occurring in a chain ends up in the reducer's callback at the end of the chain.

The reducer callback also allows you to reliably handle the end of a chain, and for example, ensure that writers have been flushed before continuing.

The monadic methods (transform, filter, map above) don't take a continuation callback but the functions that you pass to them (parser, filter, mapper, formatter above) do take one. So you can use asynchronous APIs in all steps of a chain. For example, you can query a database in a filter or mapper function.

A reader chain can be forked. In this case all emerging chains must be terminated by a reducer. The branching methods (tee, dup) let you control how errors are propagated. For example, the API allows you to tee to a spy and close the spy without impacting processing in the main chain.

Interop with node.js streams.

ez-streams sits side by side with node.js streams and provides a simple API to transform back and forth:

Same for writers. There is also a method (nodeTransform) to inject a duplex stream into a chain.

This is safe. Node streams are not hacked.

Morale

A better API is possible.

Hacking around the current stream API/implementation won't solve the problem. It will just make things more complex. What we need is a fresh API which coexists with the current API and can be transformed back and forth.

The new API should be continuation-based and high-level/monadic rather than event-driven and low-level.

ez-streams can be a source of inspiration. I'm not expecting it to be taken verbatim (it could be easily adapted for promises) but I think that the design principles should be taken into consideration in a future iteration on a streaming API for node.

I know that I'm probably being loud with this as I've already tried to influence the stream API design several times, without much success. I'm posting again because the problems are obviously still around (otherwise this thread would not exist) while, on the other hand, we have been very successful with the ez-streams approach at Sage. Maybe the time has come...

errare humanum est, perseverare diabolicum

chrisdickinson commented 8 years ago

@eljefedelrodeodeljefe Gulp has been downloaded around 1.5 million times in the last month — it is very much an important consumer of streams. In the course of building an ecosystem around Gulp they've uncovered a lot of edge cases in Node's streams implementation and have always been forthcoming about getting those bugs fixed in core where possible. We should trust that the pain points they've highlighted are valid, not accuse them of bad implementation (remember: they're watching this thread, and were kind enough to bring this switch to our attention.)

@bjouhier While ez-stream has nice properties, suggesting an entirely different stream system is a non-starter. We need to figure out where we're at and take concrete steps to improve streams within the existing Node API.

@calvinmetcalf I think that would be an excellent idea. IIRC, I think @mcollina also proposed standardizing .destroy in streams — these would both be excellent improvements, though I'd like to make sure we thoroughly document how we expect these features to work with existing streams of various versions and types before accepting an implementation.

stevemao commented 8 years ago

Error handling is consistently surprising for users.

For the record, I created a ticket about this. https://github.com/nodejs/node/issues/4491

mcollina commented 8 years ago

I agree on @chrisdickinson in https://github.com/nodejs/readable-stream/issues/181#issuecomment-174750994.

I think we should list the issues and kick off a stream-wg discussion around those.

What am I seeing are:

1) difficulty of implementation: streams are advanced stuff right now, while they should be simple/basic 2) error propagation: pipe does the wrong thing, and pump is needed most of the time 3) missing destroy(), and false safety in respect to memory or file descriptors leak. 4) relying too much on userland for core operations without telling users (see pump, duplexify, reduplexer, through2, from2, ....)

The situation is currently a mess, but a mess that is working extremely well in production. The goal should be to reduce the friction and lower the steep learning curve of streams.

Anything else?

yoshuawuyts commented 8 years ago

@mcollina I think you're spot on.

My additions would be:

I think particularly in terms of documentation improvements could be made. @chrisdickinson has done great work in streams notes and streams charts; I think he was onto something there.

mafintosh commented 8 years ago

I agree with @mcollina @yoshuawuyts. Having a .destroy() method and a ._flush(cb) on writable streams would be great. At this point its kinda hard to change a lot of other things without breaking a lot of stuff but we could look into fixing .pipe's behaviour in regards to error handling as well. @chrisdickinson what about a WG meeting to discuss this?

mcollina commented 8 years ago

:+1: for having a meeting. But I think we should set an aged to discuss all points first.

IMHO: changing pipe behavior, destroy, _flush and docs are all important topics that we need to address.

thefourtheye commented 8 years ago

I am not much aware of the Node.js stream semantics. So ignore if this is a pretty noob question.

Does our implementation adhere to a specification? Promises had similar problems and when A+ came in the dust kinda settled down a bit. Similarly streams also have multiple specifications. The one that I know of is reactive streams.

calvinmetcalf commented 8 years ago

@thefourtheye there are really no promise A+ style thing and to complicate matters when people talk about 'streams' they talk about 2 different things

  1. event streams aka push streams aka observables: these are streams of actions which are being pushed in from an external source, they can't be throttled
  2. pull streams aka streams with back pressure: streams that you pull from, upstream streams can be paused or throttled if downstream streams are slow in their processing.

reactive streams often really mean the first one and node streams are the second kind.

thefourtheye commented 8 years ago

reactive streams often really mean the first one and node streams are the second kind.

Isn't the other way around?

mcdonnelldean commented 8 years ago

Just jumping in as someone who is new to but deep diving this stuff. I'm happy to lend a hand improving the documentation story around streams as improvements come. It's not a trivial subject for new comers and the docs are lacking around them, exacerbated by what @mcolina says about userlad dependencies for potentially core features.

calvinmetcalf commented 8 years ago

@thefourtheye why don't you open up a new issue to discuss the differences between streams, we probably don't want to add to much extra noise to this one.

eljefedelrodeodeljefe commented 8 years ago

@chrisdickinson my point was here, that we just shouldn't get too nervous that a potentially stalling, while merited repo has some problems. However I think they and others have issues @mcollina described as streams being unnecessary advanced. I believe this to be a thing of computer science history and education though, which reminds me of the following Star Trek dialogue, between the engineer and the inventor. Because streams are as awesome and usable as is the warp drive.

LAFORGE: Doctor! COCHRANE: Yeah. LAFORGE: Would you mind taking a look at this? COCHRANE: Yeah. LAFORGE: I've tried to reconstruct the intermix chamber stream as control flow thingy from what I remember at school. Tell me if I got it right. COCHRANE: School? You learned about this in school? LAFORGE: Oh yeah. 'Basic Warp Design' asynchronous control flow design with streams is a required course at the Academy. The first chapter is called 'Zefram Cochrane'Node.js. COCHRANE: Well, it looks like you got it right. BARCLAY: Commander. This is what we're thinking of using to replace the damagedancient warp plasma conduit procedural software at hand. (Geordi uses his ocular implant to examine athe spiral of copper tubingasync traces) LAFORGE: Yeah. Yeah, that's good, but you need to reinforce this copper tubing with a nano-polymer use streams even more at line 50.

chrisdickinson commented 8 years ago

I'm :+1: on a meeting, discussing next steps in terms of WG work would be super useful. However, I'd like us to avoid deciding on a technical direction in-meeting until we have a concrete proposal in some form or another. Maybe figuring out what form that takes can be part of the agenda?

calvinmetcalf commented 8 years ago

I'm down with that

On Sat, Jan 30, 2016 at 2:30 AM Chris Dickinson notifications@github.com wrote:

I'm [image: :+1:] on a meeting, discussing next steps in terms of WG work would be super useful. However, I'd like us to avoid deciding on a technical direction in-meeting until we have a concrete proposal in some form or another. Maybe figuring out what form that takes can be part of the agenda?

— Reply to this email directly or view it on GitHub https://github.com/nodejs/readable-stream/issues/181#issuecomment-177093786 .

Fishrock123 commented 8 years ago

I think it may be useful to not solve everything here. Solve flowing data in a pipeable way (with optional backpressure); try to avoid catering to very many edge-cases.

mcollina commented 8 years ago

Just adding a little point: https://github.com/mcollina/split2/pull/8#issuecomment-180734274

yoshuawuyts commented 8 years ago

@stevemao whatwg streams !== node streams; node streams haven't been specced by a committee - they have an implementation and documentation to support it managed by the streams working group

mafintosh commented 8 years ago

@yoshuawuyts @stevemao just wanna reiterate that the issue linked isn't really streams error handling related. more generic error handling.

stevemao commented 8 years ago

@yoshuawuyts I realised it so I removed my comment.

mcollina commented 8 years ago

The reason I linked that is because it is relevant on how we need to fix streams. We should really kick off a meeting. How about the week between 22/02 and 26/02?

gdennie commented 8 years ago

Fundamentally, since it is user-facing, documentation is specification. For node streams, it seems sorely wanting for clarity.

I am a newbie; however, the way Stream.pipe is used suggest streams are FIFO queues. Yet the documentation has notions of writable, readable, and duplex Streams, as though a stream is either one or the other. Clearly, pipe makes it all of them.

Perhaps the first steps to cleaning this up should be an elaboration of the unit tests to further crystallize known functionality and patterns of use as a basis for user land documentation.

I have become increasingly aware that things are hard only because the documentation is bad. The documentation is bad when the implementation is bad. The implementation is bad when the design is bad. The design is bad when the specification is bad. The specification is usually bad for most of the pieces in versions 0 and.1, reducing onwards, not surprisingly. It would seem it's time for streams.

calvinmetcalf commented 8 years ago

@matteocollina the week of Feb 22 seems good we should make another issue with an agenda and precise timing

@gdennie streams may be readable, writable, or both (duplex) , some streams that are both readable and writable do act as fifo queues and we call those transform streams.

gdennie commented 8 years ago

@calvinmetcalf It seems writable streams are also readable streams since all streams implement pipe.

stevemao commented 8 years ago

Hi @gdennie, I personally have a lot of problems working with stream. So we need to improve it for sure.

I have become increasingly aware that things are hard only because the documentation is bad. The documentation is bad when the implementation is bad. The implementation is bad when the design is bad. The design is bad when the specification is bad. The specification is usually bad for most of the pieces in versions 0 and.1, reducing onwards, not surprisingly. It would seem it's time for streams.

I don't think this tells too much what we should do to improve it but it sounds like a complaint to me. If you could help and give more constructive suggestions that would be great :) Thanks.

calvinmetcalf commented 8 years ago

@gdennie writable streams do NOT have a pipe method and are not readable, if you read documentation that suggests or implies the other case it's wrong and we can fix it if you can point us to it.

mcollina commented 7 years ago

One good thing came out of this https://www.npmjs.com/package/cloneable-readable. I'm closing, but feel free to reopen if it's still a relevant topic.