Closed sonewman closed 9 years ago
I like it. Gets us closer to through2 streams as well.
We'd discussed this back in the 0.10 days, but never implemented it.
It's also similar to how WHATWG streams work, from what I can see from the examples.
Obviously adding to the prototype would be desirable for optimisation.
Anyway thanks for the feedback, I wanted to get some opinions before implementing anything :smiley:
This is quite elegant.
transform likely needs an optional flush
var transform = new stream.Transform({
transform: function (data, enc, next) { },
flush: function (done) { }
})
duplex and writable need an optional writev
var writable = new stream.Writable({
write: function (data, enc, next) { },
writev: function (chunks, next) { }
})
var duplex = new stream.Duplex({
read: function (n) { },
write: function (data, enc, next) { },
writev: function (chunks, next) { }
})
but yes I like this idea
@calvinmetcalf yes this is very true, thanks for the reminder :smile:
If we go the route of simpler stream instantiation, I'd like to support the revealing-constructor approach from WHATWG streams:
var readable = new Readable({
start: (enqueue, ready) => {},
pull: (enqueue, n, end) => {}
// ...
});
Adding enqueue (separate from end) would also give us a lever with which to coax new features into streams without affecting .push() – this could be a good mechanism to introduce things like in-alphabet null/undefined, objectMode default, or dropping the n byte requirement from ._read.
Oh yeah – almost forgot! /cc @iojs/streams
@chrisdickinson ahh, this was what I was hoping to allude to :smile:
This would give us a way to introduce these things into streams, without breaking backwards compatibility.
I'm not sure exactly on the semantics of start and pull, but could they be triggered inside of the _read method, for instance?
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
I don't really understand this question, but here is a how things typically work:
Given the code at https://streams.spec.whatwg.org/#example-both, plus:
const source = new WebSocketSource(ws);
const readable = new ReadableStream(source);
you will have:
- readable.read === ReadableStream.prototype.read, etc. for all public API methods
- readable._underlyingSource === source
- readable._underlyingSource.start === WebSocketSource.prototype.start, etc.

So in no cases will there be methods that are not on some prototype.
I'm not sure exactly on semantics of start and pull but could they be triggered inside of _read method semtantics for instance?
WRT start, that's mostly for implementers wrapping a flowing underlying resource, so that they can set up listeners on the resource and enqueue any incoming data (as I understand it). pull is analogous to _read, in that it gets called when a stream needs to fetch data from an underlying resource in order to fulfill a .read() request.
For node / io.js internals it is going to be more advantageous to inherit from prototypes since the memory reuse. Does WHATWG streams utilise the prototype in the same way?
They do not use the prototype, though I think the overhead from adding a new function per-stream is relatively small, and for folks implementing subclasses I'm fairly certain there are ways to reuse the push and start function definitions separate from attaching them to the .prototype.
+1 from me, I'd like to see a PR for this!
While we're at it, how about we add new-less constructors to get even closer to idiomatic Node code (if (!(this instanceof ReadableStream)) return new ReadableStream(whatever)). Basically making through2 redundant.
@rvagg we already do that
https://github.com/iojs/readable-stream/blob/master/lib/_stream_transform.js#L95-L96
https://github.com/iojs/readable-stream/blob/master/lib/_stream_readable.js#L116-L117
https://github.com/iojs/readable-stream/blob/master/lib/_stream_writable.js#L154-L155
https://github.com/iojs/readable-stream/blob/master/lib/_stream_duplex.js#L37-L38
doh! I did a quick sanity check before posting that but obviously was too quick about it
They do not use the prototype, though I think the overhead from adding a new function per-stream is relatively small, and for folks implementing subclasses I'm fairly certain there are ways to reuse the push and start function definitions separate from attaching them to the .prototype.
Do we have a stream benchmark? We probably wanna make sure that whatever approach is taken is easily optimizable by V8.
@domenic thanks, I think you answered my question, it could have definitely been worded better. I think I really need to start using WHATWG streams from the reference implementation to understand properly how they work.
@chrisdickinson ahh, so pull is almost the equivalent of _read.
@mafintosh I think allowing this functionality would be OK for high-level users (i.e. application developers) without too much of a need for speed. But in terms of core internals that's obviously a different matter.
This is an extremely noddy test http://jsperf.com/proto-vs-constructor-set-methods.
Re: prototype methods for performance: methods like .push() will not be optimized by V8: since all streams share them, every time it's invoked against a different stream subclass (Duplex vs Socket vs userland streams), or invoked with a different signature (push(null) / push() / push(new Buffer()) / push("string")), we reduce the chance of it being optimized by increasing the degree of polymorphism it encounters over the course of the program.
As I understand it, the performance bottleneck of streams is around the state transitions – from reading to waiting, or from writing to paused – hence the introduction of highWaterMark, which has sort of a debouncing effect on both sides of the state machine.
...this was a clever way to trick me into implementing this :wink:
@chrisdickinson you are correct, push would struggle since it takes different values.
Like you said, in actuality constructing streams by the revealing-constructor pattern is unlikely to be the bottleneck.
@sonewman I realized that one thing I didn't really answer was: given an underlying source with start(enqueue, close, error), where do the enqueue, close, and error functions live? In the case of WHATWG streams, they are indeed created once per stream.
Stated another way: each WHATWG stream has stream@[[enqueue]], stream@[[close]], and stream@[[error]] private state (using browser-supplied "actually private" state, not underscores). These contain functions (not methods!) that are specific to each stream instance.
In io.js, the equivalents are stream.push, stream.push(null), and stream.emit("error", ...). Those do live on the prototype.
This difference is necessary because WHATWG streams have a security boundary so that outside code cannot muck with the state machine. (This in turn is important because then implementers can optimize based on this assumption.) If the state-machine-manipulating methods are exposed on a common prototype, then the security can't exist. Whereas if the state-machine-manipulating functions are passed to the underlying source directly, then only the underlying source gets access.
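A toy illustration of that boundary (MiniReadable is a made-up name, not a real API): the state-machine manipulators are per-instance closures handed only to the underlying source, so outside code never gets a handle on them.

```javascript
// Minimal revealing-constructor sketch: enqueue/close live in a closure and
// are passed only to the underlying source's start hook.
function MiniReadable(underlyingSource) {
  const queue = [];
  let closed = false;

  // State-machine manipulators: per-instance functions, not prototype methods.
  function enqueue(chunk) {
    if (closed) throw new Error('cannot enqueue after close');
    queue.push(chunk);
  }
  function close() { closed = true; }

  // Only the underlying source ever sees enqueue/close.
  underlyingSource.start(enqueue, close);

  // The public surface (an instance function here for brevity) gives outside
  // code no way to enqueue directly.
  this.read = function () { return queue.shift(); };
}

const r = new MiniReadable({
  start(enqueue, close) {
    enqueue('a');
    enqueue('b');
    close();
  }
});
console.log(r.read(), r.read()); // a b
```

Consumers can read, but only the underlying source can drive the state machine, which is the property the security boundary depends on.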
@domenic This is very interesting. I always found the pattern of _*state to be interesting in the same way, because it keeps the stream state in another object to keep it self contained (though this is obviously not private in the same way).
I like how it encapsulates the complexity from the user, so they can construct the object how they like and there is no fear they will tamper with the internals.
From my benchmarks, which are limited only to readable streams with data events atm, the use of prototypal inheritance is about 40% faster than the revealing-constructor pattern.
But we are talking 0.03ms vs 0.08ms on average creating a new stream and parsing an 8KB JSON string split into 100 byte chunks. Edit - I am not sure really how lifelike a use-case this would be.
In most average cases it is really not going to make all that much difference to a user's application, I don't think...
Obviously core would not need to change any code and would continue to have the extra performance.
@chrisdickinson I guess i was just wondering if something like this was possible:
disclaimer - this is not a proposed implementation
var inherits = require('util').inherits
var stream = require('stream')

inherits(Readable3_1, stream.Readable)

function Readable3_1(options) {
  stream.Readable.call(this, options)
  this._readableState.started = false
  this._start = options.start
  this._pull = options.pull
  // bind push so it can be handed to start/pull as a free function
  this.push = this.push.bind(this)
}

// run the one-time start hook, pausing the stream until it signals ready
function start_(stream, state) {
  stream.pause()
  stream._start(stream.push, function ready() {
    state.started = true
    stream.resume()
  })
}

// delegate each read to the user-supplied pull
function pull_(stream, state, n) {
  stream._pull(stream.push, n, function end() {
    stream.push(null)
  })
}

Readable3_1.prototype._read = function (n) {
  var state = this._readableState
  if (!state.started) {
    return start_(this, state)
  } else {
    return pull_(this, state, n)
  }
}
@sonewman I think that's headed down the right path. I'd like @domenic to weigh in on the start portion of the state machine a bit to make sure we're conceptualizing it correctly here.
Specifics:
- Drop n from _pull, if possible.
- _pull should accept an optional error(Error) callback that puts the stream into an errored state (in our case, "emits error.")
- end would call onEofChunk directly instead of push(null) so we can begin separating that logic.
- When given pull and start parameters, poison _read and make the stream objectMode by default.

The theme here is to put streams on a good footing to make the rest of the ergonomic changes necessary to put them in line with WHATWG streams as well as solve some of the common streams pain points. That is to say: ideally end users will only have to change their code to this style once, and further improvements will come transparently afterwards.
start in WHATWG streams runs immediately upon construction. It's async though and until it finishes pull() will not be called.
_pull should accept an optional error(Error) callback that puts the stream into an errored state (in our case, "emits error.")
This is being changed to being promise-returning, see https://github.com/whatwg/streams/pull/272. start takes an error parameter since it's used for setting up event listeners and those might run after the asynchronous start has completed.
@iojs/streams I was thinking about this, later on in the meeting.
If everything is compatible with WHATWG and this change were to be made.
When we introduce the new (WHATWG) methods later, how would they interact? I mean, could read be synonymous with pull? Or would pull be something that was called by read?
I guess I am wondering what would happen if someone were to do this:
var readable = new Readable({
  read(n) {
    // get some data
    this.push(/* push some data */)
  },
  pull(enqueue, close, error) {
    enqueue(/* some data */)
  }
})
What would happen first? Should this be able to happen? Could the pull model be the default _read?
IMO, providing both read and pull should throw an exception.
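One way such a guard could look, as a sketch (MakeReadable is a hypothetical name, not a proposed API): reject the ambiguous combination up front at construction time.

```javascript
// Hypothetical factory: accepts either a read or a pull implementation,
// but throws if both are supplied, since their semantics would conflict.
function MakeReadable(options) {
  options = options || {};
  if (typeof options.read === 'function' && typeof options.pull === 'function') {
    throw new TypeError('supply either read or pull, not both');
  }
  // ... wire up whichever implementation was supplied ...
  return { mode: options.pull ? 'pull' : 'read' };
}

console.log(MakeReadable({ read(n) {} }).mode); // read
```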
someone could also potentially set the methods on a constructed stream too:
var readable = new Readable()
readable._read = n => readable.push(/* data */)
readable._pull = enqueue => enqueue(/* data */)
(sorry, I'm just over-analysing this from all angles)
I guess what I'm thinking is that if pull was in some way an implementation on top of _read, that would not be an issue. But if it wasn't, there might be a conflict of behaviour.
@sonewman that problem isn't really related to simple stream creation though.
@mafintosh you are absolutely right, it is not directly.
I guess I am just playing devil's advocate and thought I'd share my thoughts, since these questions hadn't really occurred to me before.
@sonewman So, I suppose what I'd want to do – assuming we can do so without unduly breaking browserify – is: when we receive a pull function, we poison the stream's ._read property such that setting it throws an exception.
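A sketch of how that poisoning could work with a throwing setter (poisonRead and the message text are illustrative, not a real core API):

```javascript
// Install an accessor on ._read: internal machinery still gets the
// pull-backed implementation via the getter, but any attempt to replace
// it from outside throws.
function poisonRead(stream, internalRead) {
  Object.defineProperty(stream, '_read', {
    get() { return internalRead; },
    set() {
      throw new Error('_read may not be replaced on pull-based streams');
    }
  });
}

const fake = {};
poisonRead(fake, function (n) { /* pull-backed implementation */ });

try {
  fake._read = function () {}; // throws
} catch (e) {
  console.log(e.message); // _read may not be replaced on pull-based streams
}
```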
@chrisdickinson that's a great answer! :smiling_imp:
No further questions.
This passed, and was merged into io.js core.
I have mentioned this in a couple of other threads before, I just wanted to get some thoughts and opinions really.
My suggestion is that we allow the _-prefixed methods, which need to be set on a stream, to be passed in as part of the options on instantiation, e.g. (edit - functions in ES6 for readability)
What do you think?