Closed: domenic closed this issue 8 years ago
Go's channels were mentioned as a modern incarnation of CSP: http://golang.org/doc/effective_go.html#channels
This is an example I have implemented in JS: https://gist.github.com/Gozala/7242467
Have you tried to use the same interface for writing and reading? I.e. the constructors take only buffering parameters or strategy objects, and pushing / getting-pulled / erroring by the source or sink are done through the same interface as what the BWS and BRS interfaces have now.
Yes. That's what the vision in my head is. And after talking to @Gozala I think that matches his thinking too.
Basically when you create a "Stream" you get a readable side and a writable side. Writing on the writable side would be equivalent to calling the [[push]] method.
We'd still need to have a way to implement different buffering strategies. But that should be doable by creating an interface specifically for buffering handling similar to what WritableStream currently has. @Gozala has some good ideas here.
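A minimal synchronous sketch of that vision (all names here, such as createStream, are hypothetical and not from any spec; a real design would be asynchronous and handle back pressure):

```javascript
// Hypothetical sketch: one internal queue, exposed as a writable side and a
// readable side. Writing on the writable side plays the role of [[push]].
function createStream() {
  const queue = [];
  let closed = false;
  const writable = {
    write(chunk) { if (!closed) queue.push(chunk); }, // ~ [[push]]
    close() { closed = true; }
  };
  const readable = {
    read() { return queue.length ? queue.shift() : undefined; },
    get done() { return closed && queue.length === 0; }
  };
  return { readable, writable };
}
```

The point is only that the producer holds the writable side, the consumer holds the readable side, and neither needs access to the other's internals.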
I did a brief survey of this. It seems to not be a great line of inquiry, although the impetus is in the right place. Let me explain.
Matches

Setting aside ReadableStream's start and pull constructor parameters, its push, close, and error parameters match fairly closely with WritableStream's write, close, and abort respectively.

Mismatches

- Readable streams use the start constructor parameter to ensure that they stay in a waiting state until any promise returned from it is complete. This asynchronous setup phase isn't something you could extract by passing a writable stream to the readable stream constructor, or vice-versa.
- ReadableStream's pull constructor parameter is called in reaction to specific events regarding the state of the internal stream, namely when the buffer is drained or the consumer calls wait(). A writable stream passed in to the readable stream's constructor has no way of receiving these notifications.
- WritableStream's write constructor parameter gets data "pushed" to it, along with the capabilities to indicate what happened with that data, via (data, done, error). I don't see a way to model that by passing in a readable stream without much more awkwardness, essentially forcing every writable stream creator to write a whole drain-then-wait loop inside a function that should (in my mind) just be concentrating on writing data to the underlying sink.
- ReadableStream's cancel constructor parameter, plus WritableStream's close and abort parameters, are defining reactions, and need to be implemented in a source- or sink-specific way; their semantics cannot be subsumed by passing streams to each other.

Resulting thoughts

The closest we could get is passing a WritableStream-like thing to ReadableStream's start and pull parameters. Let's see where that leads. You would write pull({ write, close, abort }) instead of pull(push, close, error); or, given that parameters are freely renamable, you could always call the latter pull(write, close, abort). Receiving a { write, close, abort } object would make me think it operates on some kind of writable stream. But there is no writable stream to be found---we're dealing with the readable stream itself. So the difference between ({ write, close, abort }) and (push, close, error) is mostly superficial.

But what about revealing the write side while keeping the read side? That means that in order to transfer data to the underlying sink, you need to do a read-drain-wait loop. Since everyone now needs to do this, we might as well take care of it for them, and abstract it into an easy utility function built into the constructor. Oh, but now we have the equivalent of WritableStream's write(data, done, error) constructor parameter. Hmm.

This immediately raises a number of issues, e.g. how do you represent a read-only file? The usual way is to vend only the read capabilities from the object. But the best pattern we have for doing that in JS is the revealing constructor pattern. This seems likely to lead us right back to where we are now, in circles.
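A toy sketch can make the superficial equivalence of the two parameter shapes concrete (makeReadable and makeReadableFromSinkShape are invented names, and real streams are asynchronous; this is an illustration, not the spec API):

```javascript
// Toy readable-stream constructor whose pull function has the current
// shape: pull(push, close, error).
function makeReadable(pull) {
  const buffer = [];
  let done = false;
  pull(
    chunk => buffer.push(chunk), // push
    () => { done = true; },      // close
    err => { throw err; }        // error
  );
  return { buffer, done };
}

// The { write, close, abort } shape is just a renaming / repackaging of
// the same three capabilities:
function makeReadableFromSinkShape(pull) {
  return makeReadable((push, close, error) =>
    pull({ write: push, close: close, abort: error }));
}
```

Either shape hands the source author the same three capabilities; only the spelling differs.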
I agree that it seems essential to have read only and write only interfaces for the system edges use case.
As for the channel approach, you could have the reader and writer channels be separate.
function makeSocketStream(host, port) {
  var rawSocket = createRawSocketObject(host, port);
  var [readable, writable] = Channel();
  rawSocket.ondata = chunk => {
    writable.push(chunk);
  };
  rawSocket.onend = writable.close;
  rawSocket.onerror = writable.error;
  return readable;
}
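The Channel() above is left undefined; here is a minimal synchronous sketch of what it might look like (hypothetical; a real channel would return promises and exert back pressure):

```javascript
// Minimal sketch of a one-way channel: the writable side gets
// push/close/error, the readable side gets take and the channel state.
function Channel() {
  const buffer = [];
  let state = "open"; // "open" | "closed" | "errored"
  const writable = {
    push(chunk) { if (state === "open") buffer.push(chunk); },
    close() { if (state === "open") state = "closed"; },
    error(e) { state = "errored"; writable.reason = e; }
  };
  const readable = {
    take() { return buffer.shift(); },
    get state() { return state; }
  };
  return [readable, writable];
}
```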
Here's an example from Ruby's IO pipe.
>> rd, wr = IO.pipe
=> [#<IO:fd 10>, #<IO:fd 11>]
>> wr.write "foo"
=> 3
>> rd.read_nonblock(10)
=> "foo"
Right, which gets us right back to the equivalent of the old promise "deferred" pattern, with no constructors in sight. Not so great, especially combined with the other drawbacks (e.g. the awkwardness of how you have to read from the read-side and then manually buffer until your underlying sink is able to accept data.)
What's wrong with the deferred pattern?
var { input, output } = Channel()
seems reasonable.
I think there are tons of options. My favorite one is suggested by @Raynos above. I don't think the analogy with the deferred pattern is quite relevant. This API is significantly different in both what it does and what it represents.
Alternatively channel could play role of pipe and also expose read / write ports as separate objects if desired:
var channels = new WeakMap()

function Port(channel) {
  channels.set(this, channel)
}
Port.prototype.close = function() {
  return channels.get(this).close()
}

function InputPort(channel) {
  Port.call(this, channel)
}
InputPort.prototype = Object.create(Port.prototype)
InputPort.prototype.constructor = InputPort
InputPort.prototype.take = function() {
  return channels.get(this).take()
}

function OutputPort(channel) {
  Port.call(this, channel)
}
OutputPort.prototype = Object.create(Port.prototype)
OutputPort.prototype.constructor = OutputPort
OutputPort.prototype.put = function(value) {
  return channels.get(this).put(value)
}

var inputs = new WeakMap()
var outputs = new WeakMap()

function Channel() {
  // ....
}
Channel.prototype = {
  constructor: Channel,
  put: put,   // channel-internal put, implementation elided
  take: take, // channel-internal take, implementation elided
  get input() {
    if (!inputs.has(this))
      inputs.set(this, new InputPort(this))
    return inputs.get(this)
  },
  get output() {
    if (!outputs.has(this))
      outputs.set(this, new OutputPort(this))
    return outputs.get(this)
  }
}
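To make the port pattern concrete, here is a self-contained toy version that fills in the elided channel body with a plain synchronous buffer, using plain properties instead of WeakMap bookkeeping for brevity (an illustration only, not the reference implementation):

```javascript
// Toy channel: a synchronous buffer plus lazily created, memoized
// input/output ports that each expose only half of the channel's API.
function Channel() {
  this.buffer = [];
}
Channel.prototype.put = function(value) { this.buffer.push(value); };
Channel.prototype.take = function() { return this.buffer.shift(); };
Object.defineProperties(Channel.prototype, {
  input: {
    get() {
      if (!this._input) this._input = { take: () => this.take() };
      return this._input;
    }
  },
  output: {
    get() {
      if (!this._output) this._output = { put: v => this.put(v) };
      return this._output;
    }
  }
});
```

Note that `ch.input` returns the same memoized port on every access, so port identity is stable, as in the WeakMap version.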
I think @josh shows a good pattern. I don't care much if we use [readable, writable] = Channel() or { readable, writable } = Channel().
I definitely think that we need a one-way Channel primitive. We might also want something which allows two-way communication, but let's do that on top of the one-way Channel.
Essentially we want the Channel to work as a queue. By default it's likely a queue that can only hold 1 value before it signals back pressure. I.e. as soon as it gets its first value it'll ask the writer to hold off on providing more data (though it'll still accept the data if written to of course).
But then we should allow passing in other buffering strategies to the Channel constructor. These strategies should have the ability to simply count the number of values held by the buffer, or the total number of bytes, or the total .length, or some such.
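One possible shape for such a strategy object, as a sketch (countStrategy, byteStrategy, shouldApplyBackPressure, and makeQueue are invented names, not from any spec):

```javascript
// Count-based strategy: signal back pressure after `max` queued values.
function countStrategy(max) {
  return {
    size(chunk) { return 1; },
    shouldApplyBackPressure(total) { return total >= max; }
  };
}

// Byte-based strategy: signal back pressure after `max` total bytes
// (here approximated by string/array length).
function byteStrategy(max) {
  return {
    size(chunk) { return chunk.length; },
    shouldApplyBackPressure(total) { return total >= max; }
  };
}

// A queue that consults its strategy after each put; a false return value
// asks the writer to hold off, though the data is still accepted.
function makeQueue(strategy) {
  let total = 0;
  const items = [];
  return {
    put(chunk) {
      items.push(chunk);
      total += strategy.size(chunk);
      return !strategy.shouldApplyBackPressure(total);
    },
    take() {
      const chunk = items.shift();
      if (chunk !== undefined) total -= strategy.size(chunk);
      return chunk;
    }
  };
}
```

With `countStrategy(1)` this reproduces the default described above: the queue signals back pressure as soon as it holds one value.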
But then we should allow passing in other buffering strategies to the Channel constructor.
To be clear, approaches such as these would not use constructors.
I still haven't seen anyone address how awkward it would be to write code that puts data in the underlying sink. It largely defeats the purpose of a streaming abstraction if you have to do that yourself. It would be helpful for someone to illustrate how they imagine this example working.
To be clear, approaches such as these would not use constructors.
Let me expand on this. It reveals the fundamental problem with the deferred-esque pattern.
In the code
var { input, output } = Channel(); // probably more properly `channel()`
Channel is not a constructor, but instead a factory function.
What are input and output? Well, they have methods, and we probably don't want copies of those methods on every instance of them, so they must be instances of some prototype, e.g. WritableStream.prototype and ReadableStream.prototype.
But how did they get created in the first place? The natural answer, given the prototypes in play, is via the constructors, var input = new WritableStream() and var output = new ReadableStream(). Furthermore, whoever constructed them must have access to their internals, since the person constructing them hooks up their relationship together. How did they get access to those internals? The two possible answers are: (a) "C++ browser magic," which is an answer we try to avoid these days (e.g. it makes our JS-hosted reference implementation impossible); and (b) via the revealing constructor pattern, or some variant of it.
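The revealing constructor pattern in (b) can be shown in miniature (ToyReadable is an invented stand-in; the real ReadableStream constructor is far more involved):

```javascript
// Revealing constructor: the internals (push/close capabilities) exist
// only as arguments handed to the function the creator passes in; they
// are never exposed as public methods on the resulting object.
function ToyReadable(start) {
  const queue = [];
  let closed = false;
  start(
    chunk => queue.push(chunk), // push capability, revealed only here
    () => { closed = true; }    // close capability, revealed only here
  );
  this.read = () => queue.shift();
  this.isClosed = () => closed && queue.length === 0;
}
```

Consumers of the resulting object get only the read capabilities; only the code inside start can ever push, which is exactly how a read-only vend works.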
So again, we come right back to our current design. After this circumlocution, we see that Channel is actually a higher-level object than the ReadableStream + WritableStream combination: it abstracts away the manner in which you connect those two constructors to each other in a particular case. In fact, the particular case Channel embodies is a no-op transform stream---making Channel just a subset of #20, which we've had planned for a while!
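A no-op transform in this sense can be sketched as follows (transform and ChannelAsTransform are invented names; real transform streams are asynchronous and exert back pressure):

```javascript
// A transform pairs a writable side with a readable side through a
// transform function applied to every chunk.
function transform(transformFn) {
  const queue = [];
  const writable = { write(chunk) { queue.push(transformFn(chunk)); } };
  const readable = { read() { return queue.shift(); } };
  return { writable, readable };
}

// A channel is then just the identity-transform special case.
function ChannelAsTransform() {
  return transform(chunk => chunk);
}
```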
I still haven't seen anyone address how awkward it would be to write code that puts data in the underlying sink. It largely defeats the purpose of a streaming abstraction if you have to do that yourself. It would be helpful for someone to illustrate how they imagine this example working.
Have you looked at my fork of example.md? I believe it illustrates the same example. I do plan on changing a few things though, to better support the sync read use case.
To be clear, approaches such as these would not use constructors.
Let me expand on this. It reveals the fundamental problem with the deferred-esque pattern.
In the code
var { input, output } = Channel(); // probably more properly `channel()`
Channel is not a constructor, but instead a factory function.
I don't agree with this statement, if you take a look either at my reference implementation or my example in previous comment it's clearly not a factory.
What are input and output? Well, they have methods, and we probably don't want copies of those methods on every instance of them, so they must be instances of some prototype, e.g. WritableStream.prototype and ReadableStream.prototype.
My comment above used InputPort and OutputPort as prototypes for the relevant ports; the same is true for the reference implementation.
But how did they get created in the first place? The natural answer, given the prototypes in play, is via the constructors, var input = new WritableStream() and var output = new ReadableStream(). Furthermore, whoever constructed them must have access to their internals, since the person constructing them hooks up their relationship together. How did they get access to those internals? The two possible answers are: (a) "C++ browser magic," which is an answer we try to avoid these days (e.g. it makes our JS-hosted reference implementation impossible); and (b) via the revealing constructor pattern, or some variant of it.
I think you make it sound very complicated while it's not: all the input / output ports need is access to the take / put queues and the buffer. So anyone could create input / output ports.
The Channel constructor just creates input / output ports that share the same buffer and that enqueue / dequeue operations into the same queue.
There are many ways this can be expressed in JS; you can take a look at the reference implementation for one example of this.
So again, we come right back to our current design. After this circumlocution, we see that Channel is actually a higher-level object than the ReadableStream + WritableStream combination: it abstracts away the manner in which you connect those two constructors to each other in a particular case. In fact, the particular case Channel embodies is a no-op transform stream---making Channel just a subset of #20, which we've had planned for a while!
The difference is that Channel takes care of the state machine that Readable / Writable streams currently force users to deal with. I do believe that put / take on the pipe is a lot simpler and easier to understand than the multitude of private / public APIs that streams currently impose.
I would also argue that research from back in the 70s being adopted by new languages like Go, Rust, and Clojure is good proof that this idea has something to it.
I do believe that put / take on the pipe is a lot simpler and easier to understand than the multitude of private / public APIs that streams currently impose.
This seems to me to indicate that it would be a useful API to wrap true ReadableStream and WritableStream instances, to provide something simpler for those that don't need the fine-grained control we have shown to be necessary for I/O in Node, and would prefer a strategy based on research papers.
and would prefer a strategy based on research papers.
This is based on the CSP research paper, which proves that this minimal API is enough to express all of that. And it's not based only on a paper: many modern languages have adopted this channel interface.
Again, @Raynos and I are working on providing examples of every single concern that may arise with such an API, but in order to keep this constructive it would be useful to illustrate actual issues; saying that this is a factory pattern and therefore bad does not really help.
As far as I understand, Channel replaces ReadableStream and WritableStream completely.
I will work on porting the Stream examples to channels, especially the Writable ones.
The current design has Readable and Writable streams, but their constructors take objects which provide two more interfaces---respectively, for putting data into the readable, and for using the data in the writable. Is it possible this could be reduced to only two?
I am told CSP channels (see #88) are basically this idea.
/cc @gozala @sicking