domenic commented 8 years ago

I'd like to use this meta-issue to track any ideas for a version of writable streams that has some sort of buffer-reuse semantics, like our BYOB readable streams.

465 has some previous discussion.

One thing I noticed while writing an example is that it's possible for an underlying sink to always transfer away a buffer. This is probably what most high-performance binary data underlying sinks will do, actually. (It's either transfer, copy, or expose raciness to the web.) But this has two drawbacks:

The underlying sink doesn't see the buffer until the other buffers in the queue ahead of it have been consumed. So it doesn't get detached immediately. So there is a bit of an async race (not a multithreaded one) between any code that manipulates the buffer, and the underlying sink.
The underlying sink has no opportunity to give the buffer back to the producer if they want to reuse it.

Here is a real example of this. Assume writableStream is a stream whose underlying sink transfers any chunks given to it (and errors if not given Uint8Arrays). Then consider this code:

  async function writeRandomBytesForever(writableStream) {
    const writer = writableStream.getWriter();

    while (true) {
      if (Math.random() > 0.5) {
        await writer.ready;
      }

      const bytes = new Uint8Array(1024);
      window.crypto.getRandomValues(bytes);

      const promise = writer.write(bytes);
      doStuffWith(bytes);
      await promise;
    }
  }

Here the two problems are illustrated:

doStuffWith(bytes) can maybe modify bytes, or maybe not. (If we did await writer.ready, the queue will possibly be empty, in which case you cannot modify, since the underlying sink already transferred it. But if we did not, then you can modify.)
We have to reallocate new typed arrays every time through the loop; we cannot reuse the same one.

ricea commented 8 years ago

How it works for non-blocking TCP is that userspace asks to write N bytes, and the kernel responds synchronously that it managed to write M bytes, where M <= N. Because it's synchronous, there's no risk that the buffer will be modified while the write happens (unless threads are involved, and then all bets are off). And once the write call returns, userspace is free to reuse the buffer any way it likes.

I'm hesitant to introduce synchronous APIs to Javascript, but I think it can work if a synchronous copy is done when the sink is unable to do synchronous writes.

When specifying standards involving writable streams, it would have to be unspecified whether the underlying sink was sync-capable or not, because different browser architectures would have different constraints. For example, I can imagine that Chrome might have streams that were sync-capable in the main thread but not in workers.

This could be a problem for predictability. If the underlying sink wasn't sync-capable then it might be more performant not to use keep-your-own-buffer semantics.

tyoshino commented 8 years ago

The underlying sink doesn't see the buffer until the other buffers in the queue ahead of it have been consumed. So it doesn't get detached immediately. So there is a bit of an async race (not a multithreaded one) between any code that manipulates the buffer, and the underlying sink.

For the buffer returning variant of WritableStream, write() should detach the given buffer immediately I think.

The underlying sink has no opportunity to give the buffer back to the producer if they want to reuse it.

Did you mean the time until resolution of the returned promise? I understood so as the example you provided is explaining such a scenario.

We have to reallocate new typed arrays every time through the loop; we cannot reuse the same one.

Good point. tzik also pointed out that if we're using a big ArrayBuffer e.g. for Web Assembly, always transferring a given ArrayBuffer would be very problematic, before. Sorry for not creating an issue for discussing that earlier.

The BYOB readable stream also has the problem. People may want to issue a new async BYOB read on region X and then do some stuff on the other region containing unprocessed contents.

@ricea's solution is simple and good.

I think this kind of usage needs some evolution on the ArrayBuffer API side, e.g. partial detaching. For example, we could introduce a new API like the following:

UniqueViewArrayBuffer is never transferred and only UniqueView (normal TypedArray is not available for it) can be created for UniqueViewArrayBuffer [EDITED].
a new type of TypedArray named UniqueView lets the code to manipulate contents of the associated UniqueViewArrayBuffer.
UniqueViewArrayBuffer can also be backed by SharedArrayBuffer.
UniqueView can be transferred
There can be only one UniqueView for each byte in a UniqueViewArrayBuffer which covers the byte [EDITED].
uniqueView.revoke() revokes the view [EDITED].

domenic commented 8 years ago

Did you mean the time until resolution of the returned promise? I understood so as the example you provided is explaining such a scenario.

What I meant is that you could imagine a design where write() fulfills with (a transferred version of) the buffer it was given, instead of with undefined. That would allow the example to keep reusing the same buffer instead of repeatedly allocating it.

There might be other designs, but yeah, in general I think that after the promise is fulfilled, the buffer should be given back, at least in some cases.

I think this ties in to what @ricea points out about sync vs. async by just normalizing everything to async... but I'm not sure.

Good point. tzik also pointed out that if we're using a big ArrayBuffer e.g. for Web Assembly, always transferring a given ArrayBuffer would be very problematic, before. Sorry for not creating an issue for discussing that earlier.

Yeah. This might be a separate issue, but in general I agree that all this design would have been nicer if we could have partial detaching in the ArrayBuffer API. I proposed that in https://esdiscuss.org/topic/improving-detachment-for-array-buffers but didn't get any interest. I could try harder to make that work, maybe. Although I think that people are probably more excited about SharedArrayBuffer for now.

Certainly if we allow streams to work with SharedArrayBuffer, we can get nice semantics where there are no transfers (but also things are potentially racy if you don't wait until the right signal before reusing the passed-in sections of the buffer). Maybe that is the way forward for this problem.

tyoshino commented 8 years ago

Oh, I should have just learn SharedArrayBuffer design more in depth. Thanks for the pointer.

isonmad commented 8 years ago

What about a Reverse BYOB Writer? One that makes a zero-copy identity TransformStream trivial to implement, once we add type: "bytes" option to the TransformStream constructor (and to WritableStream)?

Something like

writable.getBufferWriter({
    writeInto(buffer) {
      return promiseFulfilledWithNumberOfBytesWrittenAfterAsynchronouslyWritingBytesInto(buffer);
    }
  });

ricea commented 8 years ago

@isonmad We need a four-letter acronym for your suggestion, to go with BYOB and KYOB (Keep Your Own Buffer). I propose PUTB (Please Use This Buffer). I guess RBYOB also works.

It's appealing, but it doesn't resemble any low-level IO primitive I remember seeing.

Are there use-cases for zero-copy transform streams? Every useful stream I can think of has to at least read and write every byte, at which point zero-copy doesn't gain you much.

tyoshino commented 8 years ago

Yeah, we discussed that as beginWrite() before in https://github.com/whatwg/streams/issues/329#issuecomment-106711552 which returns a promise that gets fulfilled by a buffer to write into. But passing a callback like isonmad's suggestion is also an option and simpler as we don't have to provide an additional method to take the written buffer.

One issue with returning only the number of bytes written is that it doesn't work if the buffer is transferred. That's why we have respondWithNewView() on the ReadableStreamBYOBRequest.

tyoshino commented 8 years ago

Are there use-cases for zero-copy transform streams? Every useful stream I can think of has to at least read and write every byte, at which point zero-copy doesn't gain you much.

One example I could come up with is some kind of crypto which XORs given byte stream without changing the length. The consumer want to read encrypted data. It passes buffer and the producer writes plain text into the zero-copy passed buffer. Then, the encrypting transform stream would apply the encryption and return it back to the consumer.

isonmad commented 8 years ago

XORing would still be reading and writing every byte, wouldn't it?

But there could also be a filter transform which simply discards certain ranges of the stream (by passing the buffer back to the writer to overwrite that same buffer again). Or the reverse, inserting its own bytes at certain places without modifying the rest.

tyoshino commented 8 years ago

XORing would still be reading and writing every byte, wouldn't it?

Ah, yeah. I was thinking about benefit of:

(1) no-copy between different memory regions (read from address X and write the result to the same address X)
(2) no extra allocation of memory (only the buffer provided to read() is passed around)

But yeah, what ricea was saying was about use case without any read/write operation. My bad.

So, the XOR example is about usefulness of being able to pass through a given buffer. We can realize this even with the normal non-byob read()/write() (the buffer given to write() is read from write()). The beginWrite / getBufferWriter enables this also for BYOB reader.

Maybe only (2) is the gain. Even when we use SIMD instruction, (1) wouldn't be so big given the encrypt example would be sequential operation where operation on the same region or two regions wouldn't make so much difference in memory locality.

tyoshino commented 8 years ago

discarding ... inserting

Yeah. Sounds no-operation on some memory region is the key.

isonmad commented 8 years ago

We need a four-letter acronym for your suggestion, to go with BYOB and KYOB (Keep Your Own Buffer). I propose PUTB (Please Use This Buffer). I guess RBYOB also works.

That still leaves Reverse KYOB, aka the reader side. Please Return This Buffer (PRTB)?

readable.getReader({mode:"PRTB", readThis(buffer) { return promiseWhenDoneReading(buffer); }});

Something in the with-ack role from https://github.com/whatwg/streams/issues/329#issuecomment-106713272 anyway.

tyoshino commented 8 years ago

Regarding (1), I also wondered if there's performance difference between XORPS and VXORPS. Maybe not though I'm not so familiar with SSE/AXV details.

tyoshino commented 8 years ago

Please Return This Buffer (PRTB)

Good, but I'd omit please as BYOB doesn't have it than adding it to make it four-letter.

I also had some ideas. https://github.com/whatwg/streams/issues/329#issuecomment-106780286. Buffer Providing and maybe Buffer Vending, Buffer Lending. I think these are descriptive but aren't well aligned with the naming style of BYOB. That said, as these new ideas are not well known idiom compared to BYOB in English, right? If so, it might be ok not to take alignment with BYOB into account.

Other ideas... "Return My Buffer" might correspond well with "Bring your own buffer".

"Return the Buffer When Done", "Write Then Return", "Write To My Buffer" etc. etc.

readable.getReader({mode:"PRTB", readThis(buffer) { return promiseWhenDoneReading(buffer); }});

Nice. buffer can be a transferred one and is a view with (possibly) modified byteLength, right?

isonmad commented 8 years ago

Yeah. Probably want to do the same with getWriter({mode:"RBYOB", writeInto(buffer) {...} too actually, as you pointed out about transferred/detached buffers.

isonmad commented 8 years ago

Thinking about it, there's a potential conflict between allowing detaching in the consumer and enabling zero-copy in certain cases.

What if something like a GrepTransformStream receives an ArrayBuffer containing, say, 3.5 lines of text. It then wants to give the consumer the 1st line and the 3rd, but not the 2nd, or the remainder which hasn't hit a 4th newline yet. They'd just be two views into the same underlying buffer. Detaching either would wreck the other right? What about a way to provide a reader with an array of views, all into the same buffer?

It's a use case BYOB didn't have to worry about, in the other direction.

Actually if the stream needs the 'remainder' at the end of the buffer to concatenate with the next chunk of bytes it receives, I guess it would have to copy at least that small amount.

tyoshino commented 8 years ago

What if something like a GrepTransformStream receives an ArrayBuffer containing, say, 3.5 lines of text. It then wants to give the consumer the 1st line and the 3rd, but not the 2nd, or the remainder which hasn't hit a 4th newline yet. They'd just be two views into the same underlying buffer. Detaching either would wreck the other right? What about a way to provide a reader with an array of views, all into the same buffer?

So, the GrepTransformStream has a writable byte stream (or non-byte stream with ArrayBuffer(View) chunks) for input and non-byte readable stream with ArrayBufferView chunks (each of them would contain a single line which hit the query in the form of a view with the ArrayBuffer written to the writable byte stream as underlying buffer) for output? That's good point, but for this use case, the spec doesn't require the output part to perform detaching for each output chunk separately since it's not a readable byte stream.

const writer = transformStream.writable.getWriter();  // or byte stream
writer.write(arrayBufferContainingMultipleLines);

const reader = transformStream.readable.getReader();
// The promise gets fulfilled with ArrayBufferViews pointing at matched lines
// backed by arrayBufferContainingMultipleLines.
const promise = reader.read();

So, the GrepTransformStream may delay fulfillment of read()s until all the bytes in the current input ArrayBuffer are processed and we're ready to detach the ArrayBuffer and then create ArrayBufferView instances backed with the ArrayBuffer, representing the grep result, and fulfill read()s.

This does delay fulfillment, but is equivalent to your proposal of providing a reader with an array of views, if I'm understanding your proposal correctly?

If not, could you please elaborate the situation?

isonmad commented 7 years ago

Sorry, I was unclear and it took me a while to figure out how to clarify.

Designing a buffer-reusing version of writable streams, KYOB, implies you need a way to signal when the underlyingSink, and any further downstream consumer in the pipe chain, is done with the buffer; and in the grep case, or any similar one, it's desirable to be able to have the same buffer back multiple views for the downstream consumer. A "non-byte readable stream" would have no way to signal when the consumer is done with that array of views, so it's not really the same idea at all. I should've more explicitly mentioned this being a RKYOB/PRTB idea, since that was the point of providing all the views together in an array atomically, so when the promiseWhenDoneReading() fulfills (or whatever way we pick to signal 'done') the backing buffer for all of them can be reused.

I was thinking, given the case of transforms that simply insert/remove ranges of bytes, what API would let them a) return buffers to the original source, b) avoid pointless copying? Current detaching byte streams wouldn't work, and default streams don't have a way to return buffers.

ricea commented 7 years ago

Another option is to have no special optimised byte Writer at all. If you wanted optimal behaviour you would create a byte ReadableStream and pipe that to the byte WritableStream. pipeTo() would do some magic (possibly unspecified?) to minimise allocations and copies.

Okay, "unspecified magic" would violate predictability. Make it "specified magic".

To put it another way, pipeTo() has to be as efficient as we can make it. So the minimum criteria for having a specialised byte Writer should be either that it is easier to use than new ReadableStream({pull() { ... }, type: 'bytes'}).pipeTo(byteWritable), or that it is more efficient.

ricea commented 7 years ago

@isonmad Pertinent to your comment, at the Streams session at BlinkOn 7 we briefly discussed the interaction between Streams and WebAssembly. There was talk of "partially" detaching ArrayBuffers, so that a WebAssembly program could pass off a chunk of its backing storage to a Stream while keeping the rest. If this was implemented it would also be useful for the grep case you are talking about.

If I understand correctly, there are several other use-cases for partially-detaching ArrayBuffers, so maybe it's not as crazy as it sounds.

domenic commented 7 years ago

There was talk of "partially" detaching ArrayBuffers, so that a WebAssembly program could pass off a chunk of its backing storage to a Stream while keeping the rest. If this was implemented it would also be useful for the grep case you are talking about.

If I understand correctly, there are several other use-cases for partially-detaching ArrayBuffers, so maybe it's not as crazy as it sounds.

In general I am a big fan of this concept, but the web standards community has kind of coalesced around SharedArrayBuffers instead, with the idea being that you can build abstractions around them to emulate semantics like partially-detached ABs. An old example is https://gist.github.com/dherman/5463054 by @dherman although I guess that is more a usage sketch without an implementation. (Source blog post: https://blog.mozilla.org/javascript/2015/02/26/the-path-to-parallel-javascript/)

I think an interesting step will be a modification of readable BYOB streams to use SharedArrayBuffers and in particular to avoid the detach-dance, since with SABs we're allowing data races anyway so all the transferring work we do to avoid them is not necessary.

ricea commented 7 years ago

The byte-level access controls proposed by https://gist.github.com/dherman/5463054 are more elegant than our current detaching semantics I think. It would also provide a neat solution for tee(). Yay!

However, without those semantics SharedArrayBuffer just gives us data races. This makes me sad. Data races are like Pandora's Box. Once you have them, you'll never be free of them.

whatwg / streams

Brainstorming a buffer-reusing version of writable streams #495

465 has some previous discussion.