problem with .text() drain behavior and .bodyUsed

wanderview commented 9 years ago

Over here I raised a concern that the .bodyUsed flag was not getting set properly for .text().

https://github.com/yutakahirano/fetch-with-streams/issues/25#issuecomment-89707193

Can someone explain how .text() can not set the .bodyUsed without requiring the browser to cache the entire contents of the stream in case someone calls it again?

And if the answer is "the code will hit the locked reader", it seems we still can't just break the meaning of bodyUsed on a shipped spec.

yutakahirano commented 9 years ago

At there I replied that it was discussed at https://github.com/slightlyoff/ServiceWorker/issues/452. Do you want to discuss it again? From which point?

wanderview commented 9 years ago

I'm sorry, but that thread is so long I'm having trouble determining what the outcome was.

My immediate concern is from this pull request:

https://github.com/w3c/web-platform-tests/pull/1782

The web-platform-test was previously doing:

    return response.text().then(function() {
      assert_false(
        response.bodyUsed,
        '[https://fetch.spec.whatwg.org/#concept-body-consume-body] ' +
          'The text() method should not set "body passed" flag.');
      return cache.put(new Request(test_url), response);
    });

When I went to fix this I was told the fetch-with-streams spec was changing to this behavior:

https://critic.hoppipolla.co.uk/showcomment?chain=11816

I strongly object to this because its a complete reversal of the behavior currently shipping in fetch in both chrome and firefox. Also, it just seems completely broken from a webdev point of view. I just drained the Response but its body isn't used?

I think whatever fetch-with-streams does, there should be no observable difference to code using the existing .text()/.json()/etc in current fetch.

yutakahirano commented 9 years ago

Cc-ing discussion participants: @annevk @domenic @tyoshino @horo-t

It is true that fetch-with-streams draft changes the behavior.

I strongly object to this because its a complete reversal of the behavior currently shipping in fetch in both chrome and firefox. Also, it just seems completely broken from a webdev point of view. I just drained the Response but its body isn't used?

So your opinion is that .bodyUsed should return if (part of) data is consumed by someone, right? Actually the current .bodyUsed in the fetch spec turns on even when text() reads nothing, i.e. text() called on an empty response. I don't like the behavior, at least.

annevk commented 9 years ago

I don't like the behavior, at least.

It makes the API consistent. Also, how can we know upfront that a stream is non-empty?

yutakahirano commented 9 years ago

I don't like the behavior, at least.

It makes the API consistent. Also, how can we know upfront that a stream is non-empty?

It is possible to state that .bodyState is set when locked (temporarily) or non-empty data is read from the stream (permanently).

annevk commented 9 years ago

I'm not sure what that means.

yutakahirano commented 9 years ago

A Response object has an associated used predicate that returns true if one of the following holds:

Any non-empty data has been read from the associated readable stream.
The associated readable stream is locked to a reader.

(Example)
var res1 = new Response();
assert_false(res1.bodyUsed); // body is not used.
res1.text().then((text) =>
  assert_equals(text, '');
  assert_false(res1.bodyUsed);
);

var pipe2 = ...;
// Assume we can create a response from a stream.
var res2 = new Response(pipe2.readable);
assert_false(res2.bodyUsed); // body is not used.
res2.text().then((text) =>
  assert_equals(text, '');
  assert_false(res2.bodyUsed); // We didn't modify the body.
);
assert_true(res2.bodyUsed); // body is locked.
pipe2.writable.close();

var pipe3 = ...;
var res3 = new Response(pipe3.readable);
assert_false(res3.bodyUsed); // body is not used.
res3.text().then((text) =>
  assert_equals(text, 'hello');
  assert_true(res3.bodyUsed); // We read data from the body.
);
assert_true(res3.bodyUsed); // body is locked.
pipe3.write('hello');
pipe3.close();

yutakahirano commented 9 years ago

It makes the API consistent.

Additionally, it is inconsistent with Request construction.

var req = new Request('http://www.example.com/');
assert_false(req.bodyUsed);
// This doesn't set |req.bodyUsed| because req's body is null.
var req2 = new Request(req);
assert_false(req.bodyUsed);
req.text().then(function() {
  // It's strange that calling text() sets bodyUsed.
  assert_true(req.bodyUsed);
});

annevk commented 9 years ago

Fair, in the case of a null body (rather than empty body) we should maybe make text() et al always succeed.

wanderview commented 9 years ago

(Sorry for my slow response. I'm on PTO.)

Happy to see .text() on null body always succeed and not set .bodyUsed. That's a special case that we already exempt in the constructor (as you noted).

So your opinion is that .bodyUsed should return if (part of) data is consumed by someone, right?

How can .text() be considered only reading part of the data? By definition .text() reads the entire body from the Request/Response. Changing it to mean anything else is not backward compatible and will break existing code.

I personally prefer setting .bodyUsed as soon as the first byte is read via .body.getReader() as well, but I'm willing to bend on that one since you guys don't agree. I think its creating a footgun that will result in partial values stored in Cache API, etc, but I won't argue it further.

It is true that fetch-with-streams draft changes the behavior.

In general I can't see any reason why adding a stream API requires breaking backward compat on the existing API.

And I have to say I'm incredibly nervous now that you are shipping this API on a release channel in a number of weeks. This breaking change was not mentioned in your "compatibility risks" in your intent to ship:

https://groups.google.com/a/chromium.org/forum/#!searchin/blink-dev/fetch$20stream/blink-dev/35_QSL1ABTY/LMKZ9nTOaf8J

Are there other non-backward compatible changes to the Fetch API in the next release? Based on my review in #25 I think the answer is "no", but it would be nice to have confirmation.

Anyway, sorry to be all worked up over this.

wanderview commented 9 years ago

Also, I just want to say I'm optimistic we can sort this out. I'm sorry if my last comment came across a bit negative. I know its hard to do this kind of spec work and I really appreciate you guys taking it on.

yutakahirano commented 9 years ago

So your opinion is that .bodyUsed should return if (part of) data is consumed by someone, right?

How can .text() be considered only reading part of the data? By definition .text() reads the entire body from the Request/Response. Changing it to mean anything else is not backward compatible and will break existing code.

May be, or may not be. Without the streaming API, there is no way to read data partially, so calling text() consumes all data. With the streaming API, I think either are possible - consuming all data, or consuming remaining data. Note that I'm not opposing to the former option.

I personally prefer setting .bodyUsed as soon as the first byte is read via .body.getReader() as well, but I'm willing to bend on that one since you guys don't agree. I think its creating a footgun that will result in partial values stored in Cache API, etc, but I won't argue it further.

I think you are fine with https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-100126305, right?

yutakahirano commented 9 years ago

I would like to set .bodyUsed when locked, respecting Anne's idea.

When we adopt the idea that .bodyUsed is set when non-empty data is read (or locked), It looks reasonable to me that reading from an empty response doesn't set .bodyUsed.

tyoshino commented 9 years ago

Let's avoid confusion around "set .bodyUsed". E.g. include "permanently" if it means making .bodyUsed return true forever, and include "while" if it means making it return true only while some condition is met (for example, force .bodyUsed return true while the stream is locked).

yutakahirano commented 9 years ago

@tyoshino Thanks, here I rephrase https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-100759289.

I would like to set .bodyUsed at least temporarily when locked, respecting Anne's idea.

When we adopt the idea that .bodyUsed is set permanently when non-empty data is read and set temporarily when locked, It looks reasonable to me that reading from an empty response doesn't set .bodyUsed.

tyoshino commented 9 years ago

@wanderview Regarding

I personally prefer setting .bodyUsed as soon as the first byte is read via .body.getReader() as well,

@domenic said that a stream that has been partially (not fully) consumed using the Streams API should still be able to be text()-ed in https://github.com/slightlyoff/ServiceWorker/issues/452#issuecomment-70718250. You should convince Domenic.

tyoshino commented 9 years ago

IIRC, we were trying to re-define text(), json(), etc. using the ReadableStream when we were introducing the ReadableStream to Request/Response, while we treated the other operations such as cache.put() as special and didn't try to re-define them using the ReadableStream. We paid attention to only the state of the ReadableStream to define reasonable precondition to text(), json(), etc., and reasoned that a Request/Response with an empty stream (was originally non-empty but made to be empty by text(), json(), etc.) should be able to accept text(), json(), etc. as well as we want to allow text(), json(), etc. on a Request/Response constructed and initialized to be empty. See https://github.com/slightlyoff/ServiceWorker/issues/452#issuecomment-71785894

wanderview commented 9 years ago

May be, or may not be. Without the streaming API, there is no way to read data partially, so calling text() consumes all data. With the streaming API, I think either are possible - consuming all data, or consuming remaining data. Note that I'm not opposing to the former option.

"Remaining data" is fine with me. It still implies EOF and bodyUsed should be true after .text() resolves, though.

I think you are fine with #37 (comment), right?

Its unclear to me if pipe2 is correct. The body is not technically null there. Its an empty stream that should hit EOF immediately. In that case it seems to me that bodyUsed should be true.

Let's avoid confusion around "set .bodyUsed". E.g. include "permanently" if it means making .bodyUsed return true forever, and include "while" if it means making it return true only while some condition is met (for example, force .bodyUsed return true while the stream is locked).

I think bodyUsed should never revert from true back to false. IMO it should mean EOF.

@domenic said that a stream that has been partially (not fully) consumed using the Streams API should still be able to be text()-ed in slightlyoff/ServiceWorker#452 (comment). You should convince Domenic.

Yea. I gave up on that one.

yutakahirano commented 9 years ago

I will reply to other points later, but I would point a couple of things:

I think bodyUsed should never revert from true back to false. IMO it should mean EOF.

This contradicts with the idea I mentioned at https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-100764492.

var res = new Response('hello');
assert_false(res.bodyUsed); // body is not used.
var reader = res.body.getReader();
// (A) Should res.bodyUsed be true here?
// We didn't read anything!
reader.releaseLock();
// (B) Should res.bodyUsed be true here?

@wanderview , Let me confirm again: You prefer false for both (A) and (B)? If so, a developer should check if the body is locked in addition to check bodyUsed by adding a try-catch and that bodyUsed looks useless by itself.

A possible fix is to define bodyLocked separately.

Regarding differentiating empty and null body, it's troublesome because streams doesn't have such distinction.

// This is an empty response!
var res = new Response('');
// Here res.body is already CLOSED.
assert_false(res.bodyUsed); // body is not used.
// Note that acquiring reader actually does nothing in the current stream spec.
var reader = res.body.getReader();
// (C) Should res.bodyUsed be true here?
reader.releaseLock();
// (D) Should res.bodyUsed be true here?

reader = res.body.getReader();
// (E) Should res.bodyUsed be true here?
// |reader| is already released (because |res.body| is closed) and one cannot notify
// anything with a released reader to an underlying source.
reader.read().then(r => {
  // Nothing is actually read
  assert_true(r.done);
  // (F) Should res.bodyUsed be true here?
});

wanderview commented 9 years ago

var res = new Response('hello');
assert_false(res.bodyUsed); // body is not used.
var reader = res.body.getReader();
// (A) Should res.bodyUsed be true here?
// We didn't read anything!
reader.releaseLock();
// (B) Should res.bodyUsed be true here?
@wanderview , Let me confirm again: You prefer false for both (A) and (B)? If so, a developer should check if the body is locked in addition to check bodyUsed by adding a try-catch and that bodyUsed looks useless by itself.

Ok, I see what you mean here.

I think this shows the awkwardness of trying to equate the stream lock to the bodyUsed attribute, though. I find it terribly confusing to use an attribute that says "the body has been drained and is used" to mean "the body is temporarily unavailable".

Can we make bodyUsed mean stream EOF and just make .text()/etc reject if the body is locked?

It would be nice if the ReadableStream had a .locked attribute to expose this state without having to use a try/catch. Then code could check response.body.locked to detect this temporary unavailable condition.

wanderview commented 9 years ago

A possible fix is to define bodyLocked separately.

Oh, I missed this sentence. Yes, I think that would make sense. I think it would be nicer to make it body.locked as an attribute on ReadableStream, though.

Regarding differentiating empty and null body, it's troublesome because streams doesn't have such distinction.

I guess we can have an internal flag for null body. If there is no body at all provided in the Constructor then its set to true. If a stream is provided, but hits EOF without reading any actual data then its set to true. Then bodyUsed becomes !nullBody && streamEOF.

Is there a clean way for Request/Response to detect if any data is actually consumed from the stream?

domenic commented 9 years ago

It would be nice if the ReadableStream had a .locked attribute to expose this state without having to use a try/catch.

This seems fine to me; not a problem to add.

I guess we can have an internal flag for null body.

I think it would be better to get rid of this distinction between null and empty body ... I do not understand what it adds.

wanderview commented 9 years ago

I think it would be better to get rid of this distinction between null and empty body ... I do not understand what it adds.

I think we need some kind of internal concept for it to prevent things like Cache trying to get the stream when its already EOF'd with no-data-read. Cache and other APIS need something to check to short-circuit on.

yutakahirano commented 9 years ago

OK, let's provide bodyLocked on Response or Readable[Byte]Stream. I'm not sure which is better.

I think we need some kind of internal concept for it to prevent things like Cache trying to get the stream when its already EOF'd with no-data-read. Cache and other APIS need something to check to short-circuit on.

From Streams point of view, one can do nothing with a CLOSED stream. It cannot be locked, non-empty data cannot be read from it and it cannot be errored. Hence I cannot find any point at which we can set bodyUsed. In the latter example in https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-102290842, bodyUsed was not set initially. In order to see true at (F), we should choose when to set it from (C), (D), (E) and (F). None of them looks reasonable to me.

As a fetch-streams user, I still don't understand why null vs. empty distinction is important in this use-case (I'm not saying that the distinction is generally useless).

wanderview commented 9 years ago

OK, let's provide bodyLocked on Response or Readable[Byte]Stream. I'm not sure which is better.

I would really prefer Response.body.locked. I don't see why detecting the lock state is a fetch-specific thing. Every stream instance would benefit from this.

From Streams point of view, one can do nothing with a CLOSED stream. It cannot be locked, non-empty data cannot be read from it and it cannot be errored. Hence I cannot find any point at which we can set bodyUsed. In the latter example in #37 (comment), bodyUsed was not set initially. In order to see true at (F), we should choose when to set it from (C), (D), (E) and (F). None of them looks reasonable to me.

I'm not asking to set bodyUsed to true at (F).

I'm saying the Cache API needs some way to avoid an error when doing cache.put() with a closed stream, but where no data was read and bodyUsed is false. In this case Cache API cannot just try to read from the stream because it will error on either the lock or closed state. Cache API needs some flag to detect that we're doing the don't-error-on-empty-stream optimization.

domenic commented 9 years ago

I'm saying the Cache API needs some way to avoid an error when doing cache.put() with a closed stream, but where no data was read and bodyUsed is false.

Why?

In this case Cache API cannot just try to read from the stream because it will error on either the lock or closed state.

If it is locked it will error (and that seems like a good thing). If it is closed it will just get { done: true, value: undefined }.

wanderview commented 9 years ago

If it is locked it will error (and that seems like a good thing). If it is closed it will just get { done: true, value: undefined }.

I was thinking multiple reads on closed stream would reject.

I guess it will just work if:

1) Response/Request don't set bodyUsed when the stream hits EOF without any data being read. 2) .text(), cache API, etc unlock the stream after they get the done:true signal.

So you're proposing just having the Request and Response constructors create an empty body stream in the case of no body being provided?

domenic commented 9 years ago

2) .text(), cache API, etc unlock the stream after they get the done:true signal.

This happens automatically. As @yutakahirano says once a stream has been read-to-end (aka closed) it is automatically unlocked.

So you're proposing just having the Request and Response constructors create an empty body stream in the case of no body being provided?

I think so. I'm still a little confused at what the goal of bodyUsed is these days but mainly just trying to chime in to make sure the model stays coherent. This null vs. empty body things feels incoherent.

wanderview commented 9 years ago

I'm still a little confused at what the goal of bodyUsed is these days but mainly just trying to chime in to make sure the model stays coherent.

Well, go look at how "body used" is referenced in these specs:

https://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.html https://fetch.spec.whatwg.org/#concept-body-used-flag

Its essentially a safety-belt to prevent script from storing a drained Response in Cache or uploading a drained Request to a fetch PUT, etc.

The previously proposed changes of unsetting bodyUsed when the stream lock was dropped broke all that. It would have also broke client code checking bodyUsed.

Maintaining the definition of bodyUsed as "body EOF and body was not an empty stream" keeps all that working.

If the Request/Response can detect the "body was an empty stream" condition and set bodyUsed appropriately, then I'm ok with always setting a body stream. Using null was just a convenient way of doing that previously.

yutakahirano commented 9 years ago

I'm not asking to set bodyUsed to true at (F).

I'm saying the Cache API needs some way to avoid an error when doing cache.put() with a closed stream, but where no data was read and bodyUsed is false. In this case Cache API cannot just try to read from the stream because it will error on either the lock or closed state. Cache API needs some flag to detect that we're doing the don't-error-on-empty-stream optimization.

Its essentially a safety-belt to prevent script from storing a drained Response in Cache or uploading a drained Request to a fetch PUT, etc.

Thanks for the clarification. But "a drained Response" can still be interpreted in multiple ways, I think.

Currently I have two ideas which look reasonable to me.

1: bodyUsed is set when the body stream becomes closed if the stream was not closed when the Request / Response was constructed.

In this option, bodyUsed will never be set for a response whose stream has been closed from the beginning (of the response).

In the following example, res.bodyUsed will never be set.

var stream = ...;
// |stream| is readable here.
consume(stream).then(() => {
  // |stream| is closed here.
  var res = new Response(stream);
  res.text().then(() => {
    // Here |res.bodyUsed| is false because the body stream has been closed
    // from the beginning.
  });
});

On the other hand, if you read data after constructing a Response, res.bodyUsed will be set. Note that res.bodyUsed will be set even if the stream is empty as a result.

var stream = ...;
// |stream| is readable here.
var res = new Response(stream);
consume(stream).then(() => {
  // |stream| is closed here.
  // Here |res.bodyUsed| is true because the body stream is closed but was readable
  // when |res| was constructed.
});

A fetch response's body stream is not closed when the response is constructed.

fetch(req).then(res => {
  return res.text().then(text => {
    // Here |res.bodyUsed| is true even if |text| is empty, because the body stream
    // is closed but was readable.
  })
})

2: bodyUsed is set if any non-empty data is read from the body stream.

This is bodyConsumed discussed in https://github.com/slightlyoff/ServiceWorker/issues/452. This makes calling functions such as text() and cache.put() invalid after reading data partially, as @tyoshino mentioned.

tyoshino commented 9 years ago

@wanderview

Well, go look at how "body used" is referenced in these specs:

I think you meant used flag.

I guess used flag has been there to prohibit concurrent reading on body (a byte stream as currently defined in the Fetch spec). Once we replace it with the ReadableStream defined in the Streams spec, ReadableStream itself can indicate whether the stream is available for consumption or not, but current body doesn't have any meta information about its state (e.g. someone is consuming it now or not).

So, I guess the main motivation of used flag was actually like what the lock of the Streams API provides.

And, in addition to prohibiting concurrent reading, I guess the community chose to prohibit reading also after the consumption of body is finished because either just (1) or both (1) and (2) below:

rejecting operations on a Request whose body has been modified (non-empty to empty, or non-empty to non-empty but less bytes) is not harmful and is good to educate developers that they shouldn't expect to obtain the body bytes anymore.
storing Request/Response with modified body data could lead to some security issue

Ben, am I understanding your point?

tyoshino commented 9 years ago

(cont'd)

I guess the community was not interested in the state of body after consuming bytes from it. I understand that it's reasonable that they skipped discussing this point much.

But the Streams API clearly defined a state of a stream indicating "after full consumption", and we're noticing that the state is also used for representing an initially empty body.

Those who involved in development of the Streams API including me should step back a bit and investigate carefully if the points (1) and (2) in my last post are meaningful or not.

wanderview commented 9 years ago

I think you meant used flag.

Yes, sorry.

I guess used flag has been there to prohibit concurrent reading on body (a byte stream as currently defined in the Fetch spec).

So, I guess the main motivation of used flag was actually like what the lock of the Streams API provides.

Yes, but I don't think that's all it does. Its also there to enforce the concept that "this Response object fully represents the network Response". If the body has been drained, and can no longer be read, then the Response no longer represents that concept.

So checking the used flag is there to prevent the footgun of operating on these objects when the body matters (storing, posting to fetch, etc).

I think this matches to what you said in (1) below:

And, in addition to prohibiting concurrent reading, I guess the community chose to prohibit reading also after the consumption of body is finished because either just (1) or both (1) and (2) below:

rejecting operations on a Request whose body has been modified (non-empty to empty, or non-empty to non-empty but less bytes) is not harmful and is good to educate developers that they shouldn't expect to obtain the body bytes anymore.

storing Request/Response with modified body data could lead to some security issue

I'm not aware of security issues. I think it was more the reason I gave above.

But the Streams API clearly defined a state of a stream indicating "after full consumption", and we're noticing that the state is also used for representing an initially empty body.

Is this actually the same? I guess I was assuming there was a case where an empty stream could be open and it only went to the closed state after the first attempted read.

Anyway, I think we were pretty close to a solution.

moving the locked concept out of bodyUsed into a separate attribute helps keep things backward compatible.
Specs like Cache and fetch probably need to check "used flag or locked" where they currently check just used flag. This could be wrapped up in a "unavailable" spec concept that could be linked to.
We just need some way to trigger setting bodyUsed in fetch spec. It seems we're halfway there with the body.closed promise. We just need some way of telling that data was ever copied. Some inspectable chunk count or something would work.

Thoughts?

tyoshino commented 9 years ago

I'm not aware of security issues. I think it was more the reason I gave above.

OK. It was just my guess. Never mind.

Is this actually the same? I guess I was assuming there was a case where an empty stream could be open

Right. ReadableStream's "readable" state means that there can be more data read from the stream. So, yes, the Streams API has introduced a few more states.

and it only went to the closed state after the first attempted read.

ReadableStream can transition to "closed" state without any attempt to read. "readable" state just means it's unknown yet if there are or will be more data to read or not. As soon as the underlying source API tells the ReadableStream that it's not going to generate any more bytes, rs.closed fulfills. So, it's not a good place to represent "drained" semantics.

tyoshino commented 9 years ago

Anyway, I think we were pretty close to a solution.

Added option (X) to the doc I used in https://github.com/slightlyoff/ServiceWorker/issues/452. https://github.com/tyoshino/streams_integration/blob/master/FetchBodyPrecondition.md

Not it looks bodyUsed is kinda legacy attribute that is designed in "Before Stream" era though I'm ok with it.

tyoshino commented 9 years ago

Yes, but I don't think that's all it does. Its also there to enforce the concept that "this Response object fully represents the network Response". If the body has been drained, and can no longer be read, then the Response no longer represents that concept.

So checking the used flag is there to prevent the footgun of operating on these objects when the body matters (storing, posting to fetch, etc).

In the option (X), your concerns are addressed for the users of .text(), json(), etc.

I don't think we need to take care of those who are using .body through ReadableStream interface directly, (not .text(), etc.). They should be aware of what they're doing, and I think that they're using .text(), etc. after manipulating the stream through ReadableStream implies that they want to store the modified body, post the modified body to fetch, etc.

I hope the option (X) suffices.

wanderview commented 9 years ago

In the option (X), your concerns are addressed for the users of .text(), json(), etc.

I don't think we need to take care of those who are using .body through ReadableStream interface directly, (not .text(), etc.). They should be aware of what they're doing, and I think that they're using .text(), etc. after manipulating the stream through ReadableStream implies that they want to store the modified body, post the modified body to fetch, etc.

I hope the option (X) suffices.

I'm sorry, but I think consumers like Cache API should behave the same whether the Response was drained via .text() or via .body.getReader().read(). Regardless of how the body was drained, cache.put() should reject.

For example, it would be nice to be able to change the Cache API spec to be written in terms of the stream API. So it would be written as calling body.getReader(), etc. Right now the Cache API spec is a bit handwavy about how the data is actually consumed. It ignores the async aspects completely.

This only really works if the "used flag" concept is triggered from normal stream reading.

Of course all of this would be easier without the "don't treat empty streams as used" optimization.

Actually, would it make sense to expose the empty stream concept on streams API? It seems others might want to use it as well. It could be .body.empty which evaluates to true if state == closed and a chunk was never read.

domenic commented 9 years ago

For example, it would be nice to be able to change the Cache API spec to be written in terms of the stream API. So it would be written as calling body.getReader(), etc. Right now the Cache API spec is a bit handwavy about how the data is actually consumed. It ignores the async aspects completely.

Strong agreement. Been digging through the Blink fetch code for the last few days and the lack of layering is painful.

Actually, would it make sense to expose the empty stream concept on streams API? It seems others might want to use it as well. It could be .body.empty which evaluates to true if state == closed and a chunk was never read.

I don't really think this makes much sense. If I give you a stream I've read all the chunks from (and thus happens to be closed), that should be the same as some random stream that was never read from but is also closed (because nothing was ever enqueued into it). They behave the same in every respect so there's no point in introducing something new to differentiate them.

I'll try to step back and comprehend what's going on here a bit better ... it seems like it's gotten messy.

tyoshino commented 9 years ago

For example, it would be nice to be able to change the Cache API spec to be written in terms of the stream API. So it would be written as calling body.getReader(), etc. Right now the Cache API spec is a bit handwavy about how the data is actually consumed. It ignores the async aspects completely.

cache.put() may use body.getReader() even after it sets used flag. The option (X) allows us to choose from either "set used flag" or "just lock the stream" appropriately for each use case. cache.put() can choose to set used flag.

tyoshino commented 9 years ago

I think consumers like Cache API should behave the same whether the Response was drained via .text() or via .body.getReader().read(). Regardless of how the body was drained, cache.put() should reject.

I want to understand more about the background behind this idea. https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103560055 is my opinion about (1) in https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103378404.

I reread https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103525502

Its also there to enforce the concept that "this Response object fully represents the network Response". If the body has been drained, and can no longer be read, then the Response no longer represents that concept.

So checking the used flag is there to prevent the footgun of operating on these objects when the body matters (storing, posting to fetch, etc).

I can't quite get what you want to mean in these comments beyond (1) in https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103378404. If not security issue (https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103378404), then ... what's under question?

wanderview commented 9 years ago

cache.put() may use body.getReader() even after it sets used flag. The option (X) allows us to choose from either "set used flag" or "just lock the stream" appropriately for each use case.

I don't understand what you mean here. It was my understanding that consumers normally unlocked the reader when they hit the end. I don't think we can have consumers like cache.put() depend on other consumers leaving the reader locked.

cache.put() can choose to set used flag.

But what about any other consumer of the body stream before the cache.put(). Everyone who ever touches the stream would have to manually set the used flag for this to work. That seems impractical.

tyoshino commented 9 years ago

I don't understand what you mean here. It was my understanding that consumers normally unlocked the reader when they hit the end. I don't think we can have consumers like cache.put() depend on other consumers leaving the reader locked.

Sorry for being unclear. "it" in my comment meant "cache.put() algorithm".

cache.put() should do:

set used flag on req and res
then, get a reader to read from the stream

wanderview commented 9 years ago

I can't quite get what you want to mean in these comments beyond (1) in #37 (comment). If not security issue (#37 (comment)), then ... what's under question?

On the face of it this looks like reasonable code:

var response = new Response(url, { body: 'hello world' });
cache.put(fakeURL1, response);
cache.put(fakeURL2, response);

If cache.put() does not reject on drained responses, then the response for fakeURL2 will be missing the 'hello world' data and its not obvious at all.

I'm using cache.put() for the first consumer, but it could be anything else that might drain the body.

wanderview commented 9 years ago

cache.put() should do:

set used flag on req and res

then, get reader to read from the stream

This is reasonable, but cache.put() can't really on all other possible consumers of the Response body to do the same. So the reading code can't rely on this approach.

tyoshino commented 9 years ago

But what about any other consumer of the body stream before the cache.put(). Everyone who ever touches the stream would have to manually set the used flag for this to work. That seems impractical.

Right. But I'm currently thinking the streams is not the right place to place the boolean information. So, operations that are considered to be making the Request used are setting used flag manually in the plan (X).

In past on some bug, I suggested changing the semantics of the Streams API to include "drained" semantics. But IIRC domenic said he's feeling it's not good. I'm open to reopening it if there's strong reason.

domenic commented 9 years ago

OK, in IRC wanderview explained things to me a bit more. In particular he gave the example of being able to re-use a request multiple times for GET:

var req = new Request(url);
fetch(req);
fetch(req);
cache.add(req);

Assuming that this is desirable here is my suggestion:

If Request/Response is initialized without a body, .body returns null. (Users should know not to use it, just like users know it is safe to do multi-fetch.)
If Request/Response is initialized with an empty body, .body returns a closed readable stream.
If Request/Response is initialized with a non-empty body, .body returns a readable readable stream.
cache.put and fetch will:
- Ignore the body if it is null. You can use these to consume such a null-body Request/Response, like new Request(url), multiple times.
- Try to acquire a reader for the body. (If it is already locked, they fail of course; you cannot use them multiple times in parallel on any non-null body request/response.)
- If reading from the reader gives back { closed: true, value: ... } as the first result, also fail. This means you cannot use these multiple times in sequence any non-null body request/response.

With this I do not see any use for a separate used flag or body used flag. To support the legacy .bodyUsed attribute on top of this, I am not sure what we should do, but personally I would remove it.

tyoshino commented 9 years ago

Maybe I've answered to https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103581458 by https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103581066

yutakahirano commented 9 years ago

Just FYI: In the current fetch spec we can reuse a request.

var req = new Request(url, {body: 'hello'});
fetch(req);
fetch(req);

The second fetch sends a request for url with null body because in the first fetch req.body will be set to null and hence bodyUsed is not checked in the second fetch. I don't know if it is intentional.

tyoshino commented 9 years ago

Re: https://github.com/yutakahirano/fetch-with-streams/issues/37#issuecomment-103582422

Hmm, so, you're worried about any new operation that's not under our control may touch .body, get a reader and make something that should be treated as used and prevent cache.put()?

tyoshino commented 9 years ago

Re: Yutaka, wow, right. Request(input, init) constructor substeps of the step 2 won't be run if body is null. Seems a bug.

yutakahirano / fetch-with-streams

problem with .text() drain behavior and .bodyUsed #37