whatwg / streams

Streams Standard
https://streams.spec.whatwg.org/

Can we not have "four interfaces" #102

Closed domenic closed 8 years ago

domenic commented 10 years ago

The current design has Readable and Writable streams, but their constructors take objects that implement two more interfaces: one for putting data into the readable stream and one for consuming the data in the writable stream. Is it possible to reduce this to only two?

I am told CSP channels (see #88) are basically this idea.

/cc @gozala @sicking

justmoon commented 10 years ago

Go's channels were mentioned as a modern incarnation of CSP: http://golang.org/doc/effective_go.html#channels

Gozala commented 10 years ago

This is an example I have implemented in JS: https://gist.github.com/Gozala/7242467

tyoshino commented 10 years ago

Have you tried using the same interface for writing and reading? I.e. the constructors would take only buffering parameters or strategy objects, and pushing, getting pulled, and erroring by the source/sink would be done through the same interface that BWS and BRS have now.

sicking commented 10 years ago

Yes. That's what the vision in my head is. And after talking to @Gozala I think that matches his thinking too.

Basically, when you create a "Stream" you get a readable side and a writable side. Writing to the writable side would be equivalent to calling the [[push]] method.
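
A minimal sketch of the "one object, two sides" idea, assuming a synchronous, unbounded in-memory queue. `createStream` and its `write`/`read`/`done` members are hypothetical names, and the sketch shows only the wiring, not backpressure:

```javascript
// Hypothetical sketch: one "Stream" yields a readable side and a writable side
// that share a single queue. Writing to the writable side plays the role of [[push]].
function createStream() {
  const queue = []; // chunks written but not yet read
  let closed = false;

  const writable = {
    write(chunk) {
      if (closed) throw new TypeError("cannot write to a closed stream");
      queue.push(chunk); // equivalent to the readable's internal [[push]]
    },
    close() {
      closed = true;
    },
  };

  const readable = {
    // Returns the next chunk, or undefined when nothing is buffered.
    read() {
      return queue.shift();
    },
    // True once the writer has closed and every chunk has been read.
    get done() {
      return closed && queue.length === 0;
    },
  };

  return { readable, writable };
}

const { readable, writable } = createStream();
writable.write("hello");
writable.write("world");
writable.close();
console.log(readable.read(), readable.read(), readable.done); // hello world true
```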

We'd still need a way to implement different buffering strategies. But that should be doable by creating an interface specifically for handling buffering, similar to what WritableStream currently has. @Gozala has some good ideas here.

domenic commented 10 years ago

I did a brief survey of this. It seems to not be a great line of inquiry, although the impetus is in the right place. Let me explain.

Matches

Mismatches

Resulting Thoughts

josh commented 10 years ago

This immediately raises a number of issues, e.g. how do you represent a read-only file? The usual way is to vend only the read capabilities from the object. But the best pattern we have for doing that in JS is the revealing constructor pattern. This seems likely to lead us right back to where we are now, in circles.

I agree that it seems essential to have read only and write only interfaces for the system edges use case.
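
A sketch of one way to vend only the read capability: a facet object that forwards reads to the full-access object but has no write method at all. `createFile` and `readOnlyFacet` are hypothetical, in-memory illustrations:

```javascript
// Hypothetical full-access object: both read and write capabilities.
function createFile(initial = "") {
  let contents = initial;
  return {
    read() { return contents; },
    write(text) { contents += text; },
  };
}

// Facet: a new frozen object exposing only the read capability. The underlying
// object (and its write method) is reachable only through the closure.
function readOnlyFacet(file) {
  return Object.freeze({
    read: () => file.read(),
  });
}

const file = createFile("abc");
const view = readOnlyFacet(file);
file.write("def");
console.log(view.read());       // abcdef
console.log(typeof view.write); // undefined
```

The facet still observes later writes made through the full-access object; it simply cannot make any itself.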

As for the channel approach, you could have the reader and writer channels be separate.

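function makeSocketStream(host, port) {
  const rawSocket = createRawSocketObject(host, port);

  const [readable, writable] = Channel();

  // Forward socket events to the writable side of the channel
  rawSocket.ondata = chunk => {
    writable.push(chunk);
  };
  // Arrow functions keep `writable` as `this` for close/error
  rawSocket.onend = () => writable.close();
  rawSocket.onerror = err => writable.error(err);

  // Only the readable side is handed out
  return readable;
}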

Here's an example using Ruby's IO.pipe:

>> rd, wr = IO.pipe
=> [#<IO:fd 10>, #<IO:fd 11>]
>> wr.write "foo"
=> 3
>> rd.read_nonblock(10)
=> "foo"
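
For comparison, a rough JavaScript analogue of the Ruby session above, assuming a hypothetical in-memory `pipe()` helper that returns a `[reader, writer]` pair over a shared string buffer. There are no file descriptors here, and lengths count UTF-16 code units rather than bytes:

```javascript
// Hypothetical in-memory analogue of Ruby's IO.pipe: returns [reader, writer]
// sharing one string buffer. Not a real OS pipe.
function pipe() {
  let buffer = "";
  const reader = {
    // Like read_nonblock(maxlen): return up to maxlen already-buffered
    // characters; throw instead of blocking when the buffer is empty.
    readNonblock(maxlen) {
      if (buffer.length === 0) throw new Error("would block: buffer empty");
      const out = buffer.slice(0, maxlen);
      buffer = buffer.slice(maxlen);
      return out;
    },
  };
  const writer = {
    // Like IO#write: append and return the number of characters written.
    write(str) {
      buffer += str;
      return str.length;
    },
  };
  return [reader, writer];
}

const [rd, wr] = pipe();
console.log(wr.write("foo"));     // 3
console.log(rd.readNonblock(10)); // foo
```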

domenic commented 10 years ago

Right, which gets us back to the equivalent of the old promise "deferred" pattern, with no constructors in sight. That's not so great, especially combined with the other drawbacks (e.g. the awkwardness of having to read from the read side and then manually buffer until your underlying sink is able to accept data).
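
For readers unfamiliar with the two patterns being contrasted, a sketch using the standard `Promise` API. `defer` is a hypothetical helper illustrating the deferred style (capabilities returned as separate properties from a factory); `new Promise(executor)` is the revealing constructor, where the settling capabilities are revealed only to the executor:

```javascript
// The "deferred" pattern: a factory hands back the promise together with
// the functions that settle it, as separate properties.
function defer() {
  let resolve, reject;
  const promise = new Promise((res, rej) => {
    resolve = res;
    reject = rej;
  });
  return { promise, resolve, reject };
}

// Deferred style: whoever holds the deferred can settle it from outside.
const d = defer();
d.resolve(42);

// Revealing-constructor style: resolve is visible only inside the executor.
const p = new Promise(resolve => resolve(42));

Promise.all([d.promise, p]).then(values => console.log(values)); // [ 42, 42 ]
```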

Raynos commented 10 years ago

What's wrong with the deferred pattern?

var { input, output } = Channel()

seems reasonable.

Gozala commented 10 years ago

I think there are tons of options. My favorite is the one @Raynos suggested above. I don't think the analogy with the deferred pattern is quite relevant: this API is significantly different in both what it does and what it represents.

Alternatively, a channel could play the role of a pipe and also expose read/write ports as separate objects if desired:

var channels = new WeakMap()

function Port(channel) {
  channels.set(this, channel)
}
Port.prototype.close = function() {
  return channels.get(this).close()
}

function InputPort(channel) {
  Port.call(this, channel)
}
InputPort.prototype = Object.create(Port.prototype)
InputPort.prototype.constructor = InputPort
InputPort.prototype.take = function() {
  return channels.get(this).take()
}

function OutputPort(channel) {
  Port.call(this, channel)
}
OutputPort.prototype = Object.create(Port.prototype)
OutputPort.prototype.constructor = OutputPort
OutputPort.prototype.put = function(value) {
  return channels.get(this).put(value)
}

var inputs = new WeakMap()
var outputs = new WeakMap()
function Channel() {
  // ....
}
Channel.prototype = {
  constructor: Channel,
  // channel-level put/take (and close) are defined elsewhere (elided)
  put: put,
  take: take,
  // Lazily create one InputPort per channel and cache it
  get input() {
    if (!inputs.has(this))
      inputs.set(this, new InputPort(this))

    return inputs.get(this)
  },
  // Lazily create one OutputPort per channel and cache it
  get output() {
    if (!outputs.has(this))
      outputs.set(this, new OutputPort(this))

    return outputs.get(this)
  }
}

sicking commented 10 years ago

I think @josh shows a good pattern. I don't care much if we use [readable, writable] = Channel() or { readable, writable } = Channel().

I definitely think that we need a one-way Channel primitive. We might also want something which allows two-way communication, but let's do that on top of the one-way Channel.

Essentially we want the Channel to work as a queue. By default it's likely a queue that can only hold one value before it signals backpressure, i.e. as soon as it gets its first value it asks the writer to hold off on providing more data (though it still accepts data if written to, of course).

But then we should allow passing other buffering strategies to the Channel constructor. These strategies should be able to simply count the number of values held by the buffer, or count the total number of bytes, or the total .length, or some such.
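
A sketch of such a queue with pluggable strategies, loosely modeled on the `highWaterMark`/`size()` pair that the Streams Standard's queuing strategies use. `StrategyQueue` and the three strategy objects are hypothetical; the default counts values with a high-water mark of 1, so the first value already signals "hold off" while still being accepted:

```javascript
// Hypothetical buffering strategies: each maps a chunk to a "size" and sets
// the total size at which the queue starts signaling backpressure.
const countStrategy = { highWaterMark: 1, size: () => 1 };                     // number of values
const byteStrategy = { highWaterMark: 1024, size: chunk => chunk.byteLength }; // total bytes
const lengthStrategy = { highWaterMark: 16, size: chunk => chunk.length };     // total .length

class StrategyQueue {
  constructor(strategy = countStrategy) {
    this.strategy = strategy;
    this.items = []; // { chunk, size } entries in arrival order
    this.total = 0;  // sum of sizes currently buffered
  }
  // Always accepts the chunk. Returns true if the writer may keep writing,
  // false if it should hold off (backpressure).
  put(chunk) {
    const size = this.strategy.size(chunk);
    this.items.push({ chunk, size });
    this.total += size;
    return this.total < this.strategy.highWaterMark;
  }
  // Removes and returns the oldest chunk, or undefined if the queue is empty.
  take() {
    const entry = this.items.shift();
    if (entry === undefined) return undefined;
    this.total -= entry.size;
    return entry.chunk;
  }
}

const q = new StrategyQueue(); // default: count values, high-water mark 1
console.log(q.put("a"));       // false: the first value already asks the writer to hold off
console.log(q.take());         // a

const text = new StrategyQueue(lengthStrategy);
console.log(text.put("hello"));        // true: total length 5 < 16
console.log(text.put("streams spec")); // false: total length 17 >= 16
```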

domenic commented 10 years ago

But then we should allow passing in other buffering strategies to the Channel constructor.

To be clear, approaches such as these would not use constructors.

I still haven't seen anyone address how awkward it would be to write code that puts data in the underlying sink. It largely defeats the purpose of a streaming abstraction if you have to do that yourself. It would be helpful if someone could illustrate how they imagine this example working.
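
To make the concern concrete, a sketch of the consumer side, assuming a hypothetical promise-based input whose `take()` resolves to `{ value, done }` and a hypothetical sink whose `write()` resolves once it can accept the next chunk. `arrayInput`, `slowSink`, and `pump` are all illustrative; the `pump` loop is the code each consumer would write by hand, and it handles neither errors nor early close:

```javascript
// Hypothetical promise-based input side: take() resolves to { value, done }.
function arrayInput(chunks) {
  const queue = [...chunks];
  return {
    take: async () =>
      queue.length > 0 ? { value: queue.shift(), done: false } : { value: undefined, done: true },
  };
}

// Hypothetical underlying sink that accepts only one write at a time;
// write() resolves when the sink is ready for the next chunk.
function slowSink() {
  const written = [];
  let busy = false;
  return {
    written,
    async write(chunk) {
      if (busy) throw new Error("sink not ready");
      busy = true;
      await new Promise(resolve => setTimeout(resolve, 5));
      written.push(chunk);
      busy = false;
    },
  };
}

// Hand-written pump: take a chunk, wait until the sink accepts it, and only
// then take the next one. Without the await, chunks would have to be
// buffered manually while the sink is busy.
async function pump(input, sink) {
  for (;;) {
    const { value, done } = await input.take();
    if (done) return;
    await sink.write(value);
  }
}

const sink = slowSink();
pump(arrayInput(["a", "b", "c"]), sink).then(() => console.log(sink.written)); // [ 'a', 'b', 'c' ]
```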

domenic commented 10 years ago

To be clear, approaches such as these would not use constructors.

Let me expand on this. It reveals the fundamental problem with the deferred-esque pattern.

In the code

var { input, output } = Channel(); // probably more properly `channel()`

Channel is not a constructor, but instead a factory function.

What are input and output? Well, they have methods, and we probably don't want copies of those methods on every instance of them, so they must be instances of some prototype, e.g. WritableStream.prototype and ReadableStream.prototype.

But how did they get created in the first place? The natural answer, given the prototypes in play, is via the constructors, var input = new WritableStream() and var output = new ReadableStream(). Furthermore, whoever constructed them must have access to their internals, since the person constructing them hooks up their relationship together. How did they get access to those internals? The two possible answers are: (a) "C++ browser magic," which is an answer we try to avoid these days (e.g. it makes our JS-hosted reference implementation impossible); and (b) via the revealing constructor pattern, or some variant of it.
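
A sketch of option (b) in plain JS, with no engine magic: the push/close capabilities are revealed only to the `start` function passed to the constructor, methods live once on the prototype, and per-instance internals are kept in a `WeakMap`. `SimpleReadable` is a hypothetical class, not the spec's `ReadableStream`:

```javascript
// Per-instance internal state, keyed by instance. Methods live once on
// SimpleReadable.prototype instead of being copied onto every instance.
const internals = new WeakMap();

class SimpleReadable {
  // `start` receives the push/close capabilities; nothing else does.
  constructor(start) {
    const state = { queue: [], closed: false };
    internals.set(this, state);
    start({
      push: chunk => { state.queue.push(chunk); },
      close: () => { state.closed = true; },
    });
  }
  // Next buffered chunk, or undefined if none is available.
  read() {
    return internals.get(this).queue.shift();
  }
  // True once the creator has closed and every chunk has been read.
  get done() {
    const { queue, closed } = internals.get(this);
    return closed && queue.length === 0;
  }
}

// The creator hooks up the relationship inside the executor.
const stream = new SimpleReadable(({ push, close }) => {
  push("x");
  push("y");
  close();
});
console.log(stream.read(), stream.read(), stream.done); // x y true
console.log(typeof stream.push); // undefined
```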

So again, we come right back to our current design. After this circumlocution, we see that Channel is actually a higher-level object than the ReadableStream + WritableStream combination: it abstracts away the manner in which you connect those two constructors to each other in a particular case. In fact, the particular case Channel embodies is a no-op transform stream, making Channel just a subset of #20, which we've had planned for a while!
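
For reference, a no-op transform stream of this kind is what the Streams Standard's `TransformStream` provides when constructed without a transformer: a `{ readable, writable }` pair where chunks written to one side come out the other unchanged. A sketch assuming an environment with the global `TransformStream` (modern browsers, Node.js 18+); `roundTrip` is a hypothetical helper:

```javascript
// Identity ("no-op") transform: with no transformer argument, chunks written
// to `writable` are enqueued unchanged on `readable`.
async function roundTrip(chunks) {
  const { readable, writable } = new TransformStream();
  const writer = writable.getWriter();
  const reader = readable.getReader();

  // Don't await the writes here: with the default high-water marks, a write
  // promise may not settle until the chunk is read from the other side.
  for (const chunk of chunks) writer.write(chunk);
  writer.close();

  const out = [];
  for (;;) {
    const { value, done } = await reader.read();
    if (done) break;
    out.push(value);
  }
  return out;
}

roundTrip(["a", "b", "c"]).then(out => console.log(out)); // [ 'a', 'b', 'c' ]
```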

Gozala commented 10 years ago

I still haven't seen anyone address how awkward it would be to write code that puts data in the underlying sink. It largely defeats the purpose of a streaming abstraction if you have to do that yourself. It would be helpful for someone to illustrate how they imagine this example working?

Have you looked at my fork of example.md? I believe it illustrates the same example. I do plan on changing a few things, though, to better support the sync read use case.

Gozala commented 10 years ago

To be clear, approaches such as these would not use constructors.

Let me expand on this. It reveals the fundamental problem with the deferred-esque pattern.

In the code

var { input, output } = Channel(); // probably more properly channel()

Channel is not a constructor, but instead a factory function.

I don't agree with this statement. If you look at either my reference implementation or the example in my previous comment, it's clearly not a factory.

What are input and output? Well, they have methods, and we probably don't want copies of those methods on every instance of them, so they must be instances of some prototype, e.g. WritableStream.prototype and ReadableStream.prototype.

My comment above used InputPort and OutputPort as prototypes for the relevant ports; the same is true for the reference implementation.

But how did they get created in the first place? The natural answer, given the prototypes in play, is via the constructors, var input = new WritableStream() and var output = new ReadableStream(). Furthermore, whoever constructed them must have access to their internals, since the person constructing them hooks up their relationship together. How did they get access to those internals? The two possible answers are: (a) "C++ browser magic," which is an answer we try to avoid these days (e.g. it makes our JS-hosted reference implementation impossible); and (b) via the revealing constructor pattern, or some variant of it.

I think you make it sound very complicated when it's not: all the input/output ports need is access to the take/put queues and the buffer, so anyone could create input/output ports.

The Channel constructor just creates input/output ports that read from and write to the same buffer and queue pending take operations in the same queue.

There are many ways this can be expressed in JS; the reference implementation is one example.
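
A sketch of that shape, assuming an unbounded buffer and no close or error handling: the `Channel` constructor creates two port objects that share one buffer of values and one queue of pending `take()` calls. Following the earlier port snippet, the reading port is `input` (with `take`) and the writing port is `output` (with `put`); the implementation itself is hypothetical:

```javascript
// Hypothetical channel: one shared buffer of values plus one queue of pending
// takers. The ports only need access to those two queues.
class Channel {
  constructor() {
    const buffer = []; // values put but not yet taken
    const takers = []; // resolve functions of take() calls waiting for a value

    const put = value => {
      const taker = takers.shift();
      if (taker) taker(value); // hand the value straight to a waiting taker
      else buffer.push(value); // otherwise buffer it
    };
    const take = () =>
      buffer.length > 0
        ? Promise.resolve(buffer.shift())
        : new Promise(resolve => takers.push(resolve));

    // Separate port objects, each exposing only one capability.
    this.input = Object.freeze({ take });
    this.output = Object.freeze({ put });
  }
}

const channel = new Channel();
channel.input.take().then(v => console.log("took", v)); // took 1 (pending take, satisfied by put)
channel.output.put(1);
channel.output.put(2);
channel.input.take().then(v => console.log("took", v)); // took 2 (served from the buffer)
```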

So again, we come right back to our current design. After this circumlocution, we see that Channel is actually a higher-level object than the ReadableStream + WritableStream combination: it abstracts away the manner in which you connect those two constructors to each other in a particular case. In fact, the particular case Channel embodies is a no-op transform stream, making Channel just a subset of #20, which we've had planned for a while!

The difference is that Channel takes care of the state machine that Readable/Writable streams currently force users to deal with. I do believe that put/take on a pipe is a lot simpler and easier to understand than the multitude of private and public APIs that streams currently impose.

I would also argue that the fact that research papers written back in the 70s are being adopted by new languages like Go, Rust, and Clojure is good proof that this idea has something to it.

domenic commented 10 years ago

I do believe that put/take on the pipe is a lot simpler and easier to understand than the multitude of private/public APIs that streams currently impose.

This seems to me to indicate that it would be a useful API to wrap true ReadableStream and WritableStream instances, to provide something simpler for those that don't need the fine-grained control we have shown to be necessary for I/O in Node, and would prefer a strategy based on research papers.
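
A rough sketch of such a wrapper, assuming an environment with the standard stream classes (modern browsers, Node.js 18+). `getReader()`, `getWriter()`, `writer.ready`, `write()`, `read()`, and `close()` are real stream methods; `wrapAsChannel` and its `put`/`take` names are hypothetical, chosen to match the channel vocabulary in this thread:

```javascript
// Hypothetical put/take facade over real stream instances. The streams keep
// their fine-grained control; the facade just hides it.
function wrapAsChannel(readable, writable) {
  const reader = readable.getReader();
  const writer = writable.getWriter();
  return {
    // Waits for backpressure to clear, then hands the value to the writer.
    put: async value => {
      await writer.ready;
      // Not awaited: the write may only settle once the value is read.
      // A failed write also surfaces through writer.ready / writer.closed.
      writer.write(value).catch(() => {});
    },
    // Resolves to the next value, or undefined once the stream is done.
    take: async () => {
      const { value, done } = await reader.read();
      return done ? undefined : value;
    },
    close: () => writer.close(),
  };
}

async function demo() {
  const { readable, writable } = new TransformStream(); // identity pair
  const ch = wrapAsChannel(readable, writable);
  await ch.put("a");
  const first = await ch.take();
  await ch.put("b");
  const second = await ch.take();
  ch.close();
  return [first, second];
}

demo().then(values => console.log(values)); // [ 'a', 'b' ]
```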

Gozala commented 10 years ago

and would prefer a strategy based on research papers.

This is based on the CSP research paper, which actually proves that this minimal API is enough to express all of that. Also, it's not based only on a paper: many modern languages have adopted this channel interface.

Again, @Raynos and I are working on providing examples for every concern that may arise with such an API, but to keep this constructive it would be useful to illustrate actual issues; saying that this is a factory pattern and is bad doesn't really help.

Raynos commented 10 years ago

As far as I understand, Channel replaces ReadableStream and WritableStream completely.

I will work on porting Stream examples to channels, especially the Writable ones.