Closed sonewman closed 9 years ago
I like it. Gets us closer to through2 streams as well.
We'd discussed this back in the 0.10 days, but never implemented it.
It's also similar to how WHATWG streams work, from what I can see from the examples.
Obviously adding to the prototype would be desirable for optimisation.
Anyway thanks for the feedback, I wanted to get some opinions before implementing anything :smiley:
This is quite elegant.
transform likely needs an optional flush
var transform = new stream.Transform({
transform: function (data, enc, next) { },
flush: function (done) { }
})
duplex and writable need an optional writev
var writable = new stream.Writable({
write: function (data, enc, next) { },
writev: function (chunks, next) { }
})
var duplex = new stream.Duplex({
read: function (n) { },
write: function (data, enc, next) { },
writev: function (chunks, next) { }
})
but yes I like this idea
@calvinmetcalf yes this is very true, thanks for the reminder :smile:
If we go the route of simpler stream instantiation, I'd like to support the revealing-constructor approach from WHATWG streams:
var readable = new Readable({
start: (enqueue, ready) => {},
pull: (enqueue, n, end) => {}
// ...
});
Adding enqueue (separate from end) would also give us a lever with which to coax new features into streams without affecting .push() – this could be a good mechanism to introduce things like in-alphabet null/undefined, objectMode default, or dropping the n byte requirement from ._read.
Oh yeah – almost forgot! /cc @iojs/streams
@chrisdickinson ahh, this was what I was hoping to allude to :smile:
This would give us a way to introduce these things into streams, without breaking backwards compatibility.
I'm not sure exactly on the semantics of start and pull, but could they be triggered inside of the _read method, for instance?
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
I don't really understand this question, but here is a how things typically work:
Given the code at https://streams.spec.whatwg.org/#example-both, plus:
const source = new WebSocketSource(ws);
const readable = new ReadableStream(source);
you will have:
- readable.read === ReadableStream.prototype.read, etc. for all public API methods
- readable._underlyingSource === source
- readable._underlyingSource.start === WebSocketSource.prototype.start, etc.

So in no cases will there be methods that are not on some prototype.
I'm not sure exactly on semantics of start and pull but could they be triggered inside of _read method semtantics for instance?
WRT start, that's mostly for implementers wrapping a flowing underlying resource, so that they can set up listeners on the resource and enqueue any incoming data (as I understand it). pull is analogous to _read, in that it gets called when a stream needs to fetch data from an underlying resource in order to fulfill a .read() request.
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
They do not use the prototype, though I think the overhead from adding a new function per-stream is relatively small, and for folks implementing subclasses I'm fairly certain there are ways to reuse the push and start function definitions separate from attaching them to the .prototype.
+1 from me, I'd like to see a PR for this!
While we're at it, how about we add new-less constructors to get even closer to idiomatic Node code (if (!(this instanceof ReadableStream)) return new ReadableStream(whatever)). Basically making through2 redundant.
@rvagg we already do that
https://github.com/iojs/readable-stream/blob/master/lib/_stream_transform.js#L95-L96
https://github.com/iojs/readable-stream/blob/master/lib/_stream_readable.js#L116-L117
https://github.com/iojs/readable-stream/blob/master/lib/_stream_writable.js#L154-L155
https://github.com/iojs/readable-stream/blob/master/lib/_stream_duplex.js#L37-L38
doh! I did a quick sanity check before posting that but obviously was too quick about it
They do not use the prototype, though I think the overhead from adding a new function per-stream is relatively small, and for folks implementing subclasses I'm fairly certain there are ways to reuse the push and start function definitions separate from attaching them to the .prototype.
Do we have a stream benchmark? We probably wanna make sure that whatever approach is taken is easily optimizable by V8.
@domenic thanks, I think you answered my question, it could have definitely been worded better. I think I really need to start using WHATWG streams from the reference implementation to understand properly how they work.
@chrisdickinson ahh, so pull is almost the equivalent of _read.
@mafintosh I think allowing this functionality would be OK for high-level users (i.e. application developers) without too much of a need for speed. But in terms of core internals that's obviously a different matter.
This is an extremely noddy test http://jsperf.com/proto-vs-constructor-set-methods.
Re: prototype methods for performance: methods like .push() will not be optimized by V8: since all streams share them, every time it's invoked against a different stream subclass (Duplex vs Socket vs userland streams), or invoked with a different signature (push(null) / push() / push(new Buffer()) / push("string")), we reduce the chance of it being optimized by increasing the degree of polymorphism it encounters over the course of the program.
As I understand it, the performance bottleneck of streams is around the state transitions – from reading to waiting, or from writing to paused – hence the introduction of highWaterMark, which has sort of a debouncing effect on both sides of the state machine.
...this was a clever way to trick me into implementing this :wink:
@chrisdickinson you are correct, push would struggle since it takes different values.
Like you said, in actuality constructing streams by the revealing-constructor pattern is unlikely to be the bottleneck.
@sonewman I realized that one thing I didn't really answer was: given an underlying source with start(enqueue, close, error), where do the enqueue, close, and error functions live? In the case of WHATWG streams, they are indeed created once per stream.
Stated another way: each WHATWG stream has stream@[[enqueue]], stream@[[close]], and stream@[[error]] private state (using browser-supplied "actually private" state, not underscores). These contain functions (not methods!) that are specific to each stream instance.
In io.js, the equivalents are stream.push, stream.push(null), and stream.emit("error", ...). Those do live on the prototype.
This difference is necessary because WHATWG streams have a security boundary so that outside code cannot muck with the state machine. (This in turn is important because then implementers can optimize based on this assumption.) If the state-machine-manipulating methods are exposed on a common prototype, then the security can't exist. Whereas if the state-machine-manipulating functions are passed to the underlying source directly, then only the underlying source gets access.
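A toy illustration of that boundary (MiniReadable is a made-up name, not a real API): the state-machine manipulators are per-instance closures handed only to the underlying source, so outside code never gets a handle on them.

```javascript
// Minimal revealing-constructor sketch: enqueue/close live in a closure and
// are passed only to the underlying source's start hook.
function MiniReadable(underlyingSource) {
  const queue = [];
  let closed = false;

  // State-machine manipulators: per-instance functions, not prototype methods.
  function enqueue(chunk) {
    if (closed) throw new Error('cannot enqueue after close');
    queue.push(chunk);
  }
  function close() { closed = true; }

  // Only the underlying source ever sees enqueue/close.
  underlyingSource.start(enqueue, close);

  // The public surface (an instance function here for brevity) gives outside
  // code no way to enqueue directly.
  this.read = function () { return queue.shift(); };
}

const r = new MiniReadable({
  start(enqueue, close) {
    enqueue('a');
    enqueue('b');
    close();
  }
});
console.log(r.read(), r.read()); // a b
```

Consumers can read, but only the underlying source can drive the state machine, which is the property the security boundary depends on.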
@domenic This is very interesting. I always found the pattern of _*state to be interesting in the same way, because it keeps the stream state in another object to keep it self contained (though this is obviously not private in the same way).
I like how it encapsulates the complexity from the user, so they can construct the object how they like and there is no fear they will tamper with the internals.
From my benchmarks, which are limited only to readable streams with data events atm, the use of prototypal inheritance is about 40% faster than the revealing-constructor pattern.
But we are talking 0.03ms vs 0.08ms on average creating a new stream and parsing an 8KB JSON string split into 100 byte chunks. Edit - I am not sure really how lifelike a use-case this would be.
In most average cases it is really not going to make all that much difference to a user's application, I don't think...
Obviously core would not need to change any code and would continue to have the extra performance.
@chrisdickinson I guess i was just wondering if something like this was possible:
disclaimer - this is not a proposed implementation
var inherits = require('util').inherits
var stream = require('stream')

inherits(Readable3_1, stream.Readable)

function Readable3_1(options) {
  stream.Readable.call(this, options)
  this._readableState.started = false
  this._start = options.start
  this._pull = options.pull
  // bind push so it can be handed to start/pull as a free function
  this.push = this.push.bind(this)
}

// run the one-time start hook, pausing the stream until it signals ready
function start_(stream, state) {
  stream.pause()
  stream._start(stream.push, function ready() {
    state.started = true
    stream.resume()
  })
}

// delegate each read to the user-supplied pull
function pull_(stream, state, n) {
  stream._pull(stream.push, n, function end() {
    stream.push(null)
  })
}

Readable3_1.prototype._read = function (n) {
  var state = this._readableState
  if (!state.started) {
    return start_(this, state)
  } else {
    return pull_(this, state, n)
  }
}
@sonewman I think that's headed down the right path. I'd like @domenic to weigh in on the start portion of the state machine a bit to make sure we're conceptualizing it correctly here.
Specifics:
- Drop n from _pull, if possible.
- _pull should accept an optional error(Error) callback that puts the stream into an errored state (in our case, "emits error.")
- end would call onEofChunk directly instead of push(null) so we can begin separating that logic.
- When given pull and start parameters, poison _read and make the stream objectMode by default.

The theme here is to put streams on a good footing to make the rest of the ergonomic changes necessary to put them in line with WHATWG streams as well as solve some of the common streams pain points. That is to say: ideally end users will only have to change their code to this style once, and further improvements will come transparently afterwards.
start in WHATWG streams runs immediately upon construction. It's async though and until it finishes pull() will not be called.
_pull should accept an optional error(Error) callback that puts the stream into an errored state (in our case, "emits error.")
This is being changed to being promise-returning, see https://github.com/whatwg/streams/pull/272. start takes an error parameter since it's used for setting up event listeners and those might run after the asynchronous start has completed.
@iojs/streams I was thinking about this, later on in the meeting.
If everything is compatible with WHATWG and this change were to be made.
When we introduce the new (WHATWG) methods later, how would they interact? I mean, could read be synonymous with pull? Or would pull be something that was called by read?
I guess I am wondering what would happen if someone were to do this:
var readable = new Readable({
  read(n) {
    // get some data
    this.push(/* push some data */)
  },
  pull(enqueue, close, error) {
    enqueue(/* some data */)
  }
})
What would happen first? Should this be able to happen? Could the pull model be the default _read?
IMO, providing both read and pull should throw an exception.
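One way such a guard could look, as a sketch (MakeReadable is a hypothetical name, not a proposed API): reject the ambiguous combination up front at construction time.

```javascript
// Hypothetical factory: accepts either a read or a pull implementation,
// but throws if both are supplied, since their semantics would conflict.
function MakeReadable(options) {
  options = options || {};
  if (typeof options.read === 'function' && typeof options.pull === 'function') {
    throw new TypeError('supply either read or pull, not both');
  }
  // ... wire up whichever implementation was supplied ...
  return { mode: options.pull ? 'pull' : 'read' };
}

console.log(MakeReadable({ read(n) {} }).mode); // read
```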
someone could also potentially set the methods on a constructed stream too:
var readable = new Readable()
readable._read = n => readable.push(/* data */)
readable._pull = enqueue => enqueue(/* data */)
(sorry, I'm just over-analysing this from all angles)
I guess what I'm thinking is that if pull was in some way an implementation on top of _read, that would not be an issue. But if it wasn't, there might be a conflict of behaviour.
@sonewman that problem isn't really related to simple stream creation though.
@mafintosh you are absolutely right, it is not directly.
I guess I am just playing devil's advocate and thought I'd share my thoughts, since these questions hadn't really occurred to me before.
@sonewman So, I suppose what I'd want to do – assuming we can do so without unduly breaking browserify – is: when we receive a pull function, we poison the stream's ._read property such that setting it throws an exception.
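A sketch of how that poisoning could work with a throwing setter (poisonRead and the message text are illustrative, not a real core API):

```javascript
// Install an accessor on ._read: internal machinery still gets the
// pull-backed implementation via the getter, but any attempt to replace
// it from outside throws.
function poisonRead(stream, internalRead) {
  Object.defineProperty(stream, '_read', {
    get() { return internalRead; },
    set() {
      throw new Error('_read may not be replaced on pull-based streams');
    }
  });
}

const fake = {};
poisonRead(fake, function (n) { /* pull-backed implementation */ });

try {
  fake._read = function () {}; // throws
} catch (e) {
  console.log(e.message); // _read may not be replaced on pull-based streams
}
```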
@chrisdickinson that's a great answer! :smiling_imp:
No further questions.
This passed, and was merged into io.js core.
I have mentioned this in a couple of other threads before, I just wanted to get some thoughts and opinions really.
My suggestion is that we allow the _-prefixed methods, which need to be set on a stream, to be passed in as part of the options on instantiation, e.g. (edit - functions in ES6 for readability)
What do you think?