whatwg / streams

Streams Standard
https://streams.spec.whatwg.org/

Sync / Async API #107

Closed Gozala closed 10 years ago

Gozala commented 10 years ago

This is more of a question to @domenic & @othiym23 than an actual issue. I have noticed that, per its description, the write method that is passed to a BaseWritableStream constructor can signal back either synchronously or asynchronously. I am wondering how that is any different from x being either Promise.resolve(true) or true, given var x = output.put(data). Which also raises the question of whether output.put(data, callback), where callback is invoked either sync or async, would be considered OK.

My personal take on this matter has always been that, given the following lines of code:

doSyncOrAsync(doThat);
doThis();

it is impossible to reason about whether doThis is called first and then doThat, or the other way round, if doSyncOrAsync isn't committed to calling doThat either sync or async. I believe that has also been a primary reason why promises call callbacks passed to then on the next tick.
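The ordering problem described above can be reproduced in a few lines. This is a minimal sketch, assuming hypothetical helpers (`readMaybeSync`, `readAlwaysAsync`, `trace`) that are not part of any streams proposal: one helper calls back synchronously on a cache hit and asynchronously on a miss, the other always defers the way promise handlers do.

```javascript
// Hypothetical helpers for illustration; none of these names come from the streams proposal.
const cache = new Map();

// Calls back synchronously on a cache hit and asynchronously on a miss.
function readMaybeSync(key, callback) {
  if (cache.has(key)) {
    callback(cache.get(key));
    return;
  }
  setTimeout(() => {
    cache.set(key, key.toUpperCase());
    callback(cache.get(key));
  }, 0);
}

// Always defers the callback, the way promise handlers are deferred.
function readAlwaysAsync(key, callback) {
  Promise.resolve(key.toUpperCase()).then(callback);
}

// Records which of doThat / doThis ran first.
function trace(read, key) {
  const order = [];
  read(key, () => order.push("doThat"));
  order.push("doThis");
  return order;
}

cache.set("hit", "HIT");
const hitOrder = trace(readMaybeSync, "hit");        // doThat ran first
const missOrder = trace(readMaybeSync, "miss");      // doThis ran first
const deferredOrder = trace(readAlwaysAsync, "hit"); // doThis ran first, regardless of the cache
console.log(hitOrder[0], missOrder[0], deferredOrder[0]);
```

With `readMaybeSync` the first entry depends on the state of the cache, which the caller cannot see; with `readAlwaysAsync` it is always `doThis`.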

That being said, this is quite different from this other code snippet:

var x = ch.put(data)
if (x instanceof Promise) {
  x.then(doThat);
  doThis();
} else {
  doThis();
  doThat(x);
}

In the above example it's quite clear that doThis always runs before doThat.

I'm even more puzzled as to why:

function print(input) {
  function next(chunk) {
    if (chunk instanceof Promise) {
      // Not available yet: resume once the chunk arrives.
      chunk.then(next)
    } else {
      console.log(chunk)
      next(input.take())
    }
  }
  next(input.take())
}

is considered more error prone than:

function print(readable) {
  function pump() {
    while (readable.state === "readable") {
      console.log(readable.read());
    }

    if (readable.state === "closed") {
      console.log("--- all done!");
    } else {
      // If we're in an error state, the returned promise will be rejected with that error,
      // so no need to handle "waiting" vs. "errored" separately.
      readable.wait().then(pump, e => console.error(e));
    }
  }
  pump()
}

From what I can see, the only difference is that in one case the decision to pass next / pump as a continuation is based on the readable.state value, while in the other it's based on an instanceof check.
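To make the state-based variant concrete, here is a rough, runnable sketch of the pump loop against a mock. `MockReadable` and `collect` are hypothetical stand-ins written only for this example; the mock imitates the `state` / `read()` / `wait()` shape used in the snippet above and leaves out the errored state entirely.

```javascript
// Hypothetical stand-in for the draft readable stream; not the spec's implementation.
class MockReadable {
  constructor() {
    this.queue = [];
    this.waiters = [];
    this.closing = false;
    this.state = "waiting";
  }
  // Producer side: enqueue a chunk and wake anyone waiting for data.
  push(chunk) {
    this.queue.push(chunk);
    this.state = "readable";
    this.wake();
  }
  // Producer side: no more chunks will arrive.
  close() {
    this.closing = true;
    if (this.queue.length === 0) this.state = "closed";
    this.wake();
  }
  wake() {
    for (const resolve of this.waiters.splice(0)) resolve();
  }
  // Consumer side: synchronous access to whatever is already buffered.
  read() {
    if (this.state !== "readable") throw new TypeError("nothing to read");
    const chunk = this.queue.shift();
    if (this.queue.length === 0) this.state = this.closing ? "closed" : "waiting";
    return chunk;
  }
  // Consumer side: asynchronous notification that the state has changed.
  wait() {
    if (this.state !== "waiting") return Promise.resolve();
    return new Promise(resolve => this.waiters.push(resolve));
  }
}

// Same shape as the pump loop above, collecting into an array instead of logging.
function collect(readable, out) {
  function pump() {
    while (readable.state === "readable") out.push(readable.read());
    if (readable.state === "closed") out.push("--- all done!");
    else readable.wait().then(pump);
  }
  pump();
}

const readable = new MockReadable();
readable.push("a");
readable.push("b");
const out = [];
collect(readable, out); // drains "a" and "b" synchronously, then waits
readable.push("c");     // picked up on a later tick
readable.close();
```

Chunks that are already buffered are consumed in the same tick; only the wait for new data goes through a promise.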

Gozala commented 10 years ago

Assuming that there is some difference that I fail to see, would this kind of API make a difference:

function print(input) {
  function next(chunk) {
    if (!chunk.isReady())
      chunk.promise.then(() => next(chunk))
    else if (chunk.valueOf() === EOF)
      console.log("--- all done!")
    else {
      console.log(chunk.valueOf())
      next(input.take())
    }
  }
  next(input.take())
}

Presuming that chunk.valueOf() throws an exception if chunk.isReady() is false.

Raynos commented 10 years ago

:+1:

I think a .put() that returns a promise or a boolean is fine.

Returning an async value OR a value does not expose Zalgo as far as I can see.
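A small sketch of what such a `.put()` could look like, assuming a bounded queue. `makeOutput`, `send` and the log messages are hypothetical names for this example only. The caller branches on the return value, so the order in which its own code runs is fixed in both cases.

```javascript
// Hypothetical bounded output; `makeOutput`, `put` and `take` are illustrative names.
function makeOutput(capacity) {
  const queue = [];
  const blocked = []; // puts waiting for room in the queue
  return {
    // Returns true if the data was accepted right away,
    // otherwise a promise that resolves once it has been accepted.
    put(data) {
      if (queue.length < capacity) {
        queue.push(data);
        return true;
      }
      return new Promise(resolve => {
        blocked.push(() => {
          queue.push(data);
          resolve(true);
        });
      });
    },
    // Removes one item and admits the oldest blocked put, if any.
    take() {
      const data = queue.shift();
      const admit = blocked.shift();
      if (admit) admit();
      return data;
    },
  };
}

const log = [];
function send(output, data) {
  const x = output.put(data);
  if (x instanceof Promise) {
    x.then(() => log.push("accepted later: " + data));
    log.push("waiting for room: " + data);
  } else {
    log.push("accepted now: " + data);
  }
  return x;
}

const output = makeOutput(1);
const first = send(output, "a");  // true
const second = send(output, "b"); // Promise, because the queue is full
output.take();                    // frees room, so "b" is admitted
```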

Gozala commented 10 years ago

What I have realized after thinking on this subject for some time is that the current streams API is trying to overcome limitations introduced by the Promise API. Let me elaborate on this a little further.

I believe the reason why promises always invoke handlers on the next tick is to avoid race conditions caused by the fact that those handlers might otherwise be invoked sometimes synchronously and sometimes asynchronously, & that does make sense.

Now if you look at the streams API, it seems to value performance over possible race conditions, although to be completely fair, given the different interface, the user has to explicitly acknowledge the fact that sometimes they wait and sometimes they read synchronously.

The weird thing about this though is that promises don't expose anything similar to what streams do to opt in to a faster path. Which I believe is unfortunate, as that would solve the performance problem at the promise level and streams could just inherit that by default.

What I'm trying to suggest, I guess, is: what if we extended the Promise API in a similar (less racy) way to what streams do? What if we could rewrite streamToConsole as follows:

function streamToConsole(readable) {
    pump(readable.read());

    function next(data) {
        var stop = data === EOF
        if (stop) {
          console.log("--- all done")
        } else {
          console.log(data)
        }
        return stop
    }

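    function pump(chunk) {
        while (chunk) {
          // Promise.isPending and Promise.valueOf are the proposed
          // extensions; they are not part of the existing Promise API.
          if (Promise.isPending(chunk)) {
            // If chunk is not available yet we need to wait
            // for it, then resume pumping from that same chunk.
            chunk.then(_ => pump(chunk));
            return
          }
          if (next(Promise.valueOf(chunk))) {
            break
          }
          chunk = readable.read();
        }
    }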
}

This may not be the most beautiful API, but it would solve the issue of dealing with sync vs async at the very base level, where an async read is a performance hit.
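Promises have no such inspection API today, so the idea can only be approximated in userland. The following is a sketch under that assumption: `track` and `drain` are hypothetical helpers, and `track` records the settled state through a `then` handler. One consequence is that an already-resolved promise still counts as pending until that handler runs, which is the gap a built-in `Promise.isPending` would close. Rejections are not handled here.

```javascript
// Hypothetical stand-in for the proposed Promise.isPending / Promise.valueOf.
function track(valueOrPromise) {
  const tracked = { pending: false, value: valueOrPromise, promise: null };
  if (valueOrPromise instanceof Promise) {
    tracked.pending = true;
    tracked.value = undefined;
    tracked.promise = valueOrPromise.then(value => {
      tracked.pending = false;
      tracked.value = value;
      return value;
    });
  }
  return tracked;
}

// Consumes chunks from `read` (a hypothetical function returning a value or a promise),
// staying synchronous for plain values and waiting only for pending promises.
function drain(read, EOF, out) {
  for (;;) {
    const chunk = track(read());
    if (chunk.pending) {
      chunk.promise.then(value => {
        if (value === EOF) return;
        out.push(value);
        drain(read, EOF, out);
      });
      return;
    }
    if (chunk.value === EOF) return;
    out.push(chunk.value);
  }
}

const EOF = Symbol("EOF");
const source = ["a", "b", Promise.resolve("c"), "d", EOF];
const out = [];
drain(() => source.shift(), EOF, out); // "a" and "b" are consumed in this tick
```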

josh commented 10 years ago

@Gozala I had a similar thought thinking about ES7 await.

https://gist.github.com/josh/10885979

domenic commented 10 years ago

I can see how you might think of the problem this way, if you were already preconceiving that streams must use promises, and then trying to tack on a way to satisfy the desired semantics with promises.

The way I see the problem is very different: a buffer is a synchronously-accessible thing---data is either there, or not there. And if it's empty, you would like to be asynchronously notified when it becomes nonempty.

Trying to make access to the buffer asynchronous seems unnatural to me.

Gozala commented 10 years ago

> The way I see the problem is very different: a buffer is a synchronously-accessible thing---data is either there, or not there. And if it's empty, you would like to be asynchronously notified when it becomes nonempty.

Let's forget about streams for the moment. I think you could use mostly the same words to describe promises. As a matter of fact, streams have a lot more in common with promises than with buffers. Here is a description of a stream: "In computer science, a stream is a sequence of data elements made available over time."

Both promises and streams are concurrency primitives, and the core difference is that one represents a single unit over time while the other represents a sequence of units over time. It is unclear to me why, in the case of a sequence, all of the internal state (of what has already been made available) is exposed and can be accessed synchronously, while in the other case (a promise) it is strictly unavailable. I would argue that the same performance implications apply to both.

> Trying to make access to the buffer asynchronous seems unnatural to me.

I think our biggest disagreement comes from the fact that you mix two different concerns, data transfer (streaming) and data aggregation (buffering), into one. If you take a look at my attempt to bring CSP into the picture, you will notice that it emphasizes the separation of these two concerns. Namely, channels are primitives for data transfer between different components of the application, & they typically use buffers for data aggregation, which allows less coordination between the schedules that different components operate on. You may also notice that all the buffer APIs are strictly synchronous, as buffers live within one concurrency unit. Channels, on the other hand, typically do not, as they usually transport data between concurrent tasks, which implies synchronicity (like an IO thread streaming data to the main thread). Given this insight, I don't think your conclusion about async buffer access is accurate.
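The separation being argued for can be sketched roughly as follows. This is not the API from the fork mentioned above; `FixedBuffer`, `SlidingBuffer` and `Channel` are hypothetical names loosely modeled on CSP-style libraries. The buffers are plain synchronous containers, and the channel is the only part that deals in promises.

```javascript
// Synchronous aggregation policy: holds at most `capacity` items.
class FixedBuffer {
  constructor(capacity) {
    this.capacity = capacity;
    this.items = [];
  }
  get size() { return this.items.length; }
  isFull() { return this.items.length >= this.capacity; }
  add(item) { this.items.push(item); }
  remove() { return this.items.shift(); }
}

// Alternative policy: never blocks writers, drops the oldest item instead.
class SlidingBuffer extends FixedBuffer {
  isFull() { return false; }
  add(item) {
    if (this.items.length >= this.capacity) this.items.shift();
    this.items.push(item);
  }
}

// Asynchronous transfer: hands values from putters to takers via a buffer.
class Channel {
  constructor(buffer) {
    this.buffer = buffer;
    this.takers = [];  // resolvers of pending take() calls
    this.putters = []; // pending put() calls waiting for room
  }
  put(value) {
    const taker = this.takers.shift();
    if (taker) {
      taker(value);
      return Promise.resolve(true);
    }
    if (!this.buffer.isFull()) {
      this.buffer.add(value);
      return Promise.resolve(true);
    }
    return new Promise(resolve => this.putters.push({ value, resolve }));
  }
  take() {
    if (this.buffer.size > 0) {
      const value = this.buffer.remove();
      const putter = this.putters.shift();
      if (putter) {
        this.buffer.add(putter.value);
        putter.resolve(true);
      }
      return Promise.resolve(value);
    }
    const putter = this.putters.shift(); // reachable only with a zero-capacity buffer
    if (putter) {
      putter.resolve(true);
      return Promise.resolve(putter.value);
    }
    return new Promise(resolve => this.takers.push(resolve));
  }
}

const fixed = new Channel(new FixedBuffer(2));
const sliding = new Channel(new SlidingBuffer(2));
for (const n of [1, 2, 3]) {
  fixed.put(n);   // the third put waits for a take
  sliding.put(n); // the third put evicts the oldest item
}
```

Swapping the buffer changes the aggregation policy without touching the channel, which is the composition argument in a nutshell.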

domenic commented 10 years ago

Ah! It seems we have come to understand the fundamental reason why your channels are so different from these streams.

When the people involved in this repo, including implementers looking to implement it, use the word "stream" it is not a concurrency primitive. It does not mean a sequence of units over time. They are not primitives for data transfer between different components of the application. And they are definitely not about synchronizing between different concurrent tasks (i.e. different threads).

I realize this may be confusing, as a lot of the reactive programming community, as well as the CSP community, uses the word "streams" to describe such things. But that's not what this spec is focused on. We are focused on the meaning of the word "streams" as in Node.js: an I/O primitive, meant to map very closely to underlying parts of the kernel interfaces, and designed simply to smooth over platform-specific differences and native/JavaScript boundary incompatibilities.

I apologize for this confusion having led you down a path of thinking that what we were building had some relationship to CSP channels. But we are really dealing with extremely different things here.

Gozala commented 10 years ago

I don't think your conclusions are right. IO has everything to do with concurrency, especially when talking about environments like JS where all the IO happens concurrently with the main thread.

Also please note that all of the modern reincarnations of CSP, like Go & Rust, use channels for IO & networking.

I get the impression that you're just trying to shut down the conversation with every comment you have made about channels. I would very much appreciate it if you could try to be a little bit more receptive to ideas that have a long history of research, formal mathematical proofs & also successful deployments in cutting-edge languages like Go & Rust.

Channels undeniably have a simpler API, which I put effort into translating to JS. I am happy to get constructive criticism in order to address specific issues, but so far I have ported all the examples and gotten only unproductive comments about the general idea of using anything that isn't coming from Node.

P.S.: The definition of streams I quoted isn't specific to FRP, CSP or any other research; it is the general stream definition you can find on Wikipedia: http://en.m.wikipedia.org/wiki/Stream_(computing)

On Sunday, April 20, 2014, Domenic Denicola notifications@github.com wrote:


Regards

Irakli Gozalishvili Web: http://www.jeditoolkit.com/

domenic commented 10 years ago

I am simply trying to explain to you why this repo is not the appropriate place to be pushing channels. I've done my best, but you seem to consistently be hiding behind the idea that anyone who doesn't accept your ideas is doing so not out of e.g. a genuine mismatch, but instead prejudice.

Gozala commented 10 years ago

> I am simply trying to explain to you why this repo is not the appropriate place to be pushing channels. I've done my best, but you seem to consistently be hiding behind the idea that anyone who doesn't accept your ideas is doing so not out of e.g. a genuine mismatch, but instead prejudice.

I believe when you presented streams at the summit you said you were open to discussion & mentioned that you would welcome someone taking the initiative and bringing CSP into the picture. That is what I have been trying to do, following the steps you suggested, which is providing an API proposal along with examples of how it would address specific use cases.

In #104 you pointed out some issues & I addressed each one in the examples file that lives in my fork. If there is some other constructive way I could contribute, please let me know and I am going to try that too. If you have changed your mind since the summit and have no interest in discussing any alternatives, that's unfortunate, but clear.

But just saying that channels aren't fit for solving the issues that the current streams are proposed for is just incorrect and I cannot accept that, as there are clear proofs of the opposite.

As for criticism of the current stream API proposal:

  1. The interface surface is massive.
  2. It couples data buffering concerns with transport.

A side effect of that is that simple combinators are anything but simple.

domenic commented 10 years ago

Perhaps you misunderstood what I was saying at the summit. We have a proposal here that has some buy-in from the many involved parties and stakeholders, including implementers---both of browsers, and of server-side JavaScript runtimes. What I was hoping would be that someone would be able to explore how to use the ideas of CSP to evolve it and help the existing proposal which has buy-in. I was not inviting you to come along, say "everything here is wrong," and start from scratch. Starting from scratch is a fine thing to do, on your own time. But it means you need to invest in getting buy-in from all the relevant stakeholders (again, implementers being the most important) yourself. That's an entirely different undertaking, and should not bleed over into the evolutionary work that I was hoping would take place in this repo.

Now, if there are constructive criticisms and solutions you can give for evolving the current proposal, that is welcome in this repo. And to be clear, abstract subjective things like "it has more functionality than I would like" or "it couples things that I think should be separate" or "it isn't based on enough academic papers" or "it doesn't match what Rust and Go are doing" are not constructive criticism. Constructive criticism is finding a use case that implementers are trying to solve, and saying how the current proposal does not solve them. For example, many of the open issues are around things like that---e.g., being able to exploit writev syscalls, or giving a simple way to do transform streams so that UTF8-decoding is straightforward, or allowing off-main-thread piping of streams together. Does CSP have anything to say about writev? If not, maybe it's not operating at the level of abstraction that implementers interested in a stream spec are hoping to work on.
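For readers unfamiliar with the writev use case: the point is to hand several queued chunks to the underlying sink in one call instead of one call per chunk. As a rough illustration only, this is how Node.js's existing `stream.Writable` exposes it through the optional `writev` implementation; it is not the API of the draft spec discussed here, and the `batches` array exists purely to show what the sink received.

```javascript
const { Writable } = require("stream");

// Each entry records the chunks delivered to the sink in a single call.
const batches = [];

const sink = new Writable({
  // Called when only one chunk is queued.
  write(chunk, encoding, callback) {
    batches.push([chunk.toString()]);
    callback();
  },
  // Called with every queued chunk at once, as { chunk, encoding } objects.
  writev(chunks, callback) {
    batches.push(chunks.map(({ chunk }) => chunk.toString()));
    callback();
  },
});

sink.cork();   // queue writes instead of passing them through one by one
sink.write("a");
sink.write("b");
sink.write("c");
sink.uncork(); // flushes the queue, delivering all three chunks in one writev call
sink.end();
```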

Issues of that sort, and solutions to them that don't start with "let's blow everything up and do my thing instead", are very very welcome. But attempting to start over is an entirely different undertaking, and unfortunately it involves more politics and diplomacy than it does technical arguments. This repo is not the best place for such work.

josh commented 10 years ago

I think @Gozala's original concern ties into #103: that it's a lot of boilerplate to correctly implement stream piping. Hoping better Transform APIs (#20) will actually be part of the platform.

domenic commented 10 years ago

Indeed, note that #20 is tagged with "acknowledged missing feature" and "fix these next" ;)

Gozala commented 10 years ago

My issue is that the streams API is massive and that it mixes two different concerns, transferring data and buffering it, which causes API overflow and less composition flexibility. Not to mention that at some point it will get us to a state where we would need something like Node domains. I just don't see why one would choose this over something a lot simpler, but I'm no good with politics, so I think we can just close this issue.