gulpjs / vinyl-fs

Vinyl adapter for the file system.
MIT License
971 stars 156 forks source link

Consider using different streams as the basis #146

Closed phated closed 4 years ago

phated commented 8 years ago

After all the trouble we have been dealing with using node streams (error propagation, lost events, etc), I think it might be time to consider using different stream module as the underlying streams for this module. Quite some time ago, https://github.com/caolan/highland was big news, maybe we should review it and see how it handles some of our problems.

yocontra commented 8 years ago

the browser Stream spec has error propagation IIRC

phated commented 8 years ago

@contra yeah, I had been discussing the whatwg stream spec with chris - not sure where in the process it is and if anyone has a solid implementation yet.

phated commented 8 years ago

@contra looks like there is a reference implementation polyfill at https://github.com/whatwg/streams/tree/master/reference-implementation

erikkemperman commented 8 years ago

Not sure if it would help with the trouble you've been seeing -- but @piranna pointed to fstream and that looks nice in that it integrates stat.

mikeal commented 8 years ago

AFAIK the Stream spec's error propogation comes from being based on Promises. Without getting into too much depth, Promise error propogation has... issues.

The initial WHATWG Stream spec was based on a lot of feedback from the Node.js Stream implementation and fixing what we got wrong. Since then it's taken a lot of turns and I'm not sure how many of them were to keep spec people and Promise advocates happy and how many were real improvements.

phated commented 8 years ago

@mikeal the node stream error implementation is much worse than promises and the unhandledRejection handler in new nodes would be a really nice thing for us to hook into. We have been dealing with a ton of issues that I'm trying to work through with @chrisdickinson and he said that WHATWG Streams actually solve some of them (sometimes-writeable/sometimes-transform streams).

chrisdickinson commented 8 years ago

The initial WHATWG Stream spec was based on a lot of feedback from the Node.js Stream implementation and fixing what we got wrong. Since then it's taken a lot of turns and I'm not sure how many of them were to keep spec people and Promise advocates happy and how many were real improvements.

I've been keeping track of the WHATWG stream changes — most of them are well-grounded. The biggest concern I have with them right now is that, last I checked, the byte reader stuff was up in the air, though that may have since changed.

AFAIK the Stream spec's error propogation comes from being based on Promises. Without getting into too much depth, Promise error propogation has... issues.

As someone who was formerly very anti Promise, I'd be very interested in having a conversation with you about this in a separate venue :)

mikeal commented 8 years ago

So, unhandledRejection solves the problem of errors going unhandled but the really hard part about propogation is what you do during splits. This happens in promises and in streams (when streams are piped to multiple outputs) and I've never seen a good solution to this. You either pick a direction to propogate, potentially not sending to the side that has a handler, or you potentially hit two handlers on each side of the split. We've been trying to figure this out since the first standardized Stream implementation landed in Node.js and haven't found a more reliable pattern than requiring people to handle errors in the stream that generates it.

mikeal commented 8 years ago

As someone who was formerly very anti Promise, I'd be very interested in having a conversation with you about this in a separate venue :)

Sure :)

For the record, I don't really care about Promises. I don't hate them, and I'm not in love with them, I think they are fine for people that prefer them but I know from experience that many people find them a bigger conceptual barrier than callbacks.

However, I do dislike the proliferation of Promises through the specification process. I can't blame the spec authors though, there isn't a standardized way of doing something in the future or even providing callbacks and event handlers and it probably sucks to open up those parts of their spec to endless bike shedding. But this also means that Promises are showing up in all kinds of places they don't need to be, and I mean this not as a matter of preference but on a purely technical basis. It means that Promises are mostly complicating APIs that would otherwise be much simpler because there just isn't an alternative in the spec world while in the real world we have the Node.js standard callback API, EventEmitters, etc.

chrisdickinson commented 8 years ago

@mikeal: Thanks for your response — this is interesting! I'll respond to the Promise part of the conversation on :arrow_right: nodejs/readable-stream#181 so as not to fill up the vinyl-fs tracker.

domenic commented 8 years ago

I of course disagree with all of @mikeal's characterizations here. Promises are being used appropriately, and have no issues with error propagation. But I'm not really interested in debating that, just wanted to counter some of the FUD.

the byte reader stuff was up in the air, though that may have since changed.

It's getting really really nice. Zero-copy everywhere and much less awkward than a separate type. Watch https://github.com/whatwg/streams/pull/418 for the details.

@contra looks like there is a reference implementation polyfill at https://github.com/whatwg/streams/tree/master/reference-implementation

There's a couple issues I'd warn against.

phated commented 8 years ago

@domenic this wouldn't end up in gulp 4 but is forward-looking towards gulp 5. The reference implementation might be fine to prototype against; however, do you know of any polyfill authors that are actively working on one? I remember seeing a bunch of people jump in like a year ago but haven't heard anything since.

domenic commented 8 years ago

@phated I haven't seen any in active maintenance, no. But, forking from the reference implementation and fixing the worst-performing aspects/de-Traceurizing would probably be a few hours work.

calvinmetcalf commented 8 years ago

what about just using pump (or something similar) instead of .pipe as the default method of combining streams in the next version of gulp? It would have the advantage of keeping compatibility with current node streams in use.

mcollina commented 8 years ago

The reason why I use gulp is because it is based around node streams. It is its greater feature for me, because it makes this compatible with the greater node ecosystem. I also use the vynil-fs in some other ways, and I have created awesome things with it.

I know and have experienced all the problems with Streams, including the difficulty in explaining them to newbies. I think we should fix the problems in node streams, because it will benefit the whole node community.

@phated what are the problems you are experiencing? how can we improve the gulp user experience by fixing streams?

phated commented 8 years ago

@mcollina I believe the latest saga is documented through issues and PRs:

We've also been the biggest proponent of better error management in node streams and were pushing for something like @chrisdickinson's original error management PR on node core. We don't want people to have to use gulp-plumber in their pipelines and with the amount of people using it, I think error propagation should probably be the default behavior. We also need some sort of error recovery so the stream doesn't get torn down on error.

dominictarr commented 8 years ago

If you are considering looking at a different stream api, I strongly recommend you have a look at pull-stream

pull-streams can do all the things that node-streams can do, but are dramatically less code. you don't even need a library to implement a pull-stream (it's just a couple of functions). The source and the sink both have a chance to apply back pressure, and errors propagate. If a source errors, that is pretty much like a normal end condition, but if a sink errors that propagates back up and aborts the source.

When you need to interact with the regular node ecosystem, there is stream-to-pull-stream and pull-stream-to-stream modules for interop.

back pressure in pull-streams is much easier to reason about, so you can use use FRP style things like pull.take(5) which reads 5 things from the source and then aborts it, but since it also have does have fully fledged back pressure you can use it for IO too.

darsain commented 8 years ago

@dominictarr I actually played with the idea of adding pull-streams to gulp recently. They are amazing and so simple, but they are a bit too simple :) The issue is that they are not easily identifiable. Gulp checks what is returned from the task, and behaves accordingly, but pull streams are basically just plain functions. Source is a function with 2 arguments, sink is a function with 1 argument, and through can't be checked at all since it appears as a sink. That dives below the line I considered acceptable when duck typing.

Wouldn't you consider adding a flag to the spec? Something like stream._isPullStream = true? And maybe another one with the type of the stream? Or maybe combine them into one truthy flag stream._isPullStream = 'sink', although that feels weird.

Anyway, my point is, pull streams are a type of async interface that just needs to be reliably identifiable, otherwise tools like gulp can't really use them.

dominictarr commented 8 years ago

thank you! hmm, I actually had that in early version of pull-stream, (back when there was a .pipe() function, but that is removed now).

Since pull-streams are just a function, and you don't have to inherit from anything, or depend on any specific module to create a pull-stream, it would mean that every pull-stream module would need to a flag. That is not a reasonable change.

On the other hand, if gulp adopted a convention for accepting pull-streams which gulp modules could follow, which could be returning a pull-stream with a flag for gulp to see, that would be gulp's business.

If you do that it shouldn't start with _ because that conventionally means "private" but or "internal" but I'd argue it's a public api.

ivan-kleshnin commented 7 years ago

Pull-streams are too good for committee people. No classes, composable API, code is unnecessary readable... boring. Just a historical record after reading this crap.

phated commented 4 years ago

With the awesome work on https://github.com/mafintosh/streamx - we will be switching to that project for our streams implementation. It fixes a lot of our issues and aims to be compatible with node core streams. I'm going to open a new issue for the switch.