reactive-streams / reactive-streams-jvm

Reactive Streams Specification for the JVM
http://www.reactive-streams.org/
MIT No Attribution

onError(Throwable) vs cancel() #271

Closed gregw closed 8 years ago

gregw commented 9 years ago

@viktorklang Bringing this discussion back here from http://cs.oswego.edu/pipermail/concurrency-interest/2015-June/014275.html

I understand your comments about the expense of bidirectional communication and general acknowledgements. However, I also think that leaving the establishment of a back flow for such things as recycling outside the standard is leaving a bit of a hole for interoperability. But let me raise that separately.

For now, I want to understand more about the error handling, specifically in a chain of Processors. For any particular processor, it is possible that something can go wrong and when it does both ends of the RS need to be informed. Currently the API allows for errors to be propagated by calling onError(Throwable) downstream and cancel() upstream.

This is sufficient for shutting down the stream, but my question is why is the exception passed down and not up? Essentially the stream can be shut down without passing the exception in either direction, and the throwable is not needed:

Subscriber{
  ...
  void cancel();
}

Subscription{
  ...
  void onError();
}

So given that as a conceptual starting point, can the usability of the API be improved by telling one end or the other the details of a problem?

I'm struggling to think of an exception that the ultimate subscriber can use to improve its implementation. All it really needs to know is that the stream is not completing normally.

On the other hand, I can think of several examples where passing the exception upstream to the origin publisher will help the usability of the API. If a publisher is delivering an unsuitable object (it is incomplete, in an illegal state, can't be serialized, contains types that have no JSON convertor, doesn't match the DB schema, etc.), then telling that publisher what the problem is will be valuable in helping them correct their Publisher. Sure, the exception can be logged by either the Processor or the ultimate Subscriber, but that might be on a system remote from the Publisher, and they may not see the log.

So if a Publisher is sending an unsuitable item, all they will see is a cancel call in return and no explanation why.

So if the exception is to be propagated, sending it upstream seems to me more appropriate than sending it downstream. But I'm not opposed to sending it both ways:

Subscriber{
  ...
  void onError(Throwable);
}

Subscription{
  ...
  void onError(Throwable);
}

With the spec having suitable protections against looping errors.

gregw commented 9 years ago

As an example of this issue, a very simple error is that a Publisher can break the spec and call onNext when it has not got a request. Now my instinct for that is to throw an IllegalStateException to tell the caller that it is acting badly (or to pass that back in an onError call). But the spec only allows me to throw NPE, thus all I can do is to cancel the publisher and perhaps tell my subscriber onError(new IllegalStateException("unrequested item"));

viktorklang commented 9 years ago

@viktorklang Bringing this discussion back here from http://cs.oswego.edu/pipermail/concurrency-interest/2015-June/014275.html

Thanks! I really appreciate it.

I understand your comments about the expense of bidirectional communication and general acknowledgements. However, I also think that leaving the establishment of a back flow for such things as recycling outside the standard is leaving a bit of a hole for interoperability. But let me raise that separately.

Do note that RS does not assume that the Publisher-logic and Subscriber-logic live inside the same VM, and as such recycling goes out of scope. If you have more specific knowledge you can most definitely leverage that as long as RS contracts are followed.

For now, I want to understand more about the error handling, specifically in a chain of Processors. For any particular processor, it is possible that something can go wrong and when it does both ends of the RS need to be informed. Currently the API allows for errors to be propagated by calling onError(Throwable) downstream and cancel() upstream.

This is absolutely correct.

This is sufficient for shutting down the stream, but my question is why is the exception passed down and not up? Essentially the stream can be shut down without passing the exception in either direction, and the throwable is not needed:

Why would it travel upstream / What would the value/use-case be?

So given that as a conceptual starting point, can the usability of the API be improved by telling one end or the other the details of a problem?

Also keep in mind that RS is not an API in the end-user sense, it is an interoperation protocol which lives more on an SPI level.

I'm struggling to think of an exception that the ultimate subscriber can use to improve its implementation. All it really needs to know is that the stream is not completing normally.

Imagine a Processor that can, upon failure of its source, still provide cached, stale or other data downstream. See Processor spec rule 2: https://github.com/reactive-streams/reactive-streams-jvm#4processor-code
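Viktor's scenario of a Processor that survives an upstream failure can be sketched as follows. This is a minimal illustration, not the real org.reactivestreams API: the interfaces below are pared-down stand-ins (they omit Subscription/request), and all class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Pared-down stand-ins for the org.reactivestreams interfaces
// (illustrative only; the real API also carries Subscription/request).
interface SimpleSubscriber<T> {
    void onNext(T element);
    void onError(Throwable t);
    void onComplete();
}

// A Processor-like stage that recovers from an upstream failure by
// emitting previously cached (possibly stale) data and completing
// normally, instead of propagating onError downstream.
final class FallbackStage<T> implements SimpleSubscriber<T> {
    private final SimpleSubscriber<T> downstream;
    private final List<T> cachedFallback;

    FallbackStage(SimpleSubscriber<T> downstream, List<T> cachedFallback) {
        this.downstream = downstream;
        this.cachedFallback = cachedFallback;
    }

    @Override public void onNext(T element) { downstream.onNext(element); }

    @Override public void onError(Throwable t) {
        for (T stale : cachedFallback) downstream.onNext(stale); // serve stale data
        downstream.onComplete();                                 // hide the failure
    }

    @Override public void onComplete() { downstream.onComplete(); }
}

// Records every signal it receives, for demonstration.
final class Recorder<T> implements SimpleSubscriber<T> {
    final List<String> events = new ArrayList<>();
    @Override public void onNext(T element) { events.add("next:" + element); }
    @Override public void onError(Throwable t) { events.add("error:" + t.getMessage()); }
    @Override public void onComplete() { events.add("complete"); }
}

public class FallbackDemo {
    public static void main(String[] args) {
        Recorder<String> out = new Recorder<>();
        FallbackStage<String> stage =
            new FallbackStage<>(out, List.of("stale-1", "stale-2"));
        stage.onNext("live-1");
        stage.onError(new RuntimeException("source broke"));
        System.out.println(out.events);
        // [next:live-1, next:stale-1, next:stale-2, complete]
    }
}
```

The point is that the onError signal is useful to the downstream stage precisely because it can choose a recovery strategy, which is why the spec sends the Throwable down rather than up.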

On the other hand, I can think of several examples where passing the exception upstream to the origin publisher will help the usability of the API. If a publisher is delivering an unsuitable object (it is incomplete, in an illegal state, can't be serialized, contains types that have no JSON convertor, doesn't match the DB schema, etc.), then telling that publisher what the problem is will be valuable in helping them correct their Publisher.

How would this help the user of a prefabricated Publisher? The exception would travel to the Publisher but the user of the Publisher could not fix it, since it is not their code. I find it helpful to keep thinking of RS as a protocol.

Sure, the exception can be logged by either the Processor or the ultimate Subscriber, but that might be on a system remote from the Publisher, and they may not see the log.

Since it is the Subscriber that gets the onError, it is up to it how it wants to deal with the situation; logging it definitely makes sense for a lot of implementations.

So if a Publisher is sending an unsuitable item, all they will see is a cancel call in return and no explanation why.

I'd recommend encoding as much of that validation into the element type as possible; the Publisher shouldn't be able to construct illegal instances in the first place.

So if the exception is to be propagated, sending it upstream seems to me more appropriate than sending it downstream.

From an RS perspective, what should the Publisher do in that case?

With the spec having suitable protections against looping errors.

Having failures only propagate downstream prevents looping. Personally I am not 100% sold on Exceptions in general, but error codes are a bit of a blunt instrument and sending strings only creates proliferation of stringly-typed code.

As an example of this issue, a very simple error is that a Publisher can break the spec and call onNext when it has not got a request.

Breaking the spec is something that should be caught during tests, which is why rigorous testing of RS implementations is possible via things like the TCK.

Now my instinct for that is to throw an IllegalStateException to tell the caller that it is acting badly (or to pass that back in an onError call). But the spec only allows me to throw NPE, thus all I can do is to cancel the publisher and perhaps tell my subscriber onError(new IllegalStateException("unrequested item"));

The TCK suite for Publisher verifies that it doesn't send things when it isn't allowed. :)

gregw commented 9 years ago

@viktorklang

Well yes, the origin publisher should not publish inconsistent items or ones that cannot be processed... but in reality they do. If publishing bad items could be avoided at source then an exception/error mechanism would not be needed by RS at all!

Ditto that a Processor should be able to assume that its peers will not break the spec when calling the API, but that too will happen. Should an implementation not check for compliance (and then fail in unknown ways); check for compliance but be itself constrained by the spec when reporting errors; or check for compliance and not worry about compliant error reporting when it discovers an illegal API call?

So given that problems can and will occur, it is just a question of who you tell and what you tell them.

RS as is does allow both ends of the stream to be told of a problem, with cancel() propagating up and onError(Throwable) propagating down. So with regards of who do you tell, there is a tick as both ends can be told that there is a problem.

However, only the subscriber can be told any details of the problem (the exception). I believe that a large class of problems will be functions of the items passed in via onNext(item), so I believe that it is valid to tell the publisher the details of what was wrong with what they published. Otherwise a publisher will just see a cancel every time they publish a bad item. Any logging by the processor may well be on a remote system, so may not be seen.

Note that I can also imagine situations where the subscriber is the one that needs to be told, as there is probably a set of errors that are a function of how the publisher and/or stream was created/configured in the first place and that can be done by the ultimate subscriber.

So I think it makes more sense to allow the exception to be propagated in both directions.

cheers

viktorklang commented 9 years ago

@gregw

@viktorklang

Well yes, the origin publisher should not publish inconsistent items or ones that cannot be processed... but in reality they do. If publishing bad items could be avoided at source then an exception/error mechanism would not be needed by RS at all!

If only that was the case! The RS spec does not deal with -errors- but with -failures-, such that a failure means that there is no path forward (think: the Publisher is broken).

The recommendation is that if the Publisher wants to signal to the Subscriber that processing of a specific element failed (think Processor), it should represent that in its element type (think Either<T, Throwable>) such that only cases that need it will have to pay for it.
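The Either<T, Throwable>-style element Viktor describes could look something like the sketch below: per-element failures travel as ordinary onNext data, reserving onError for genuine stream failure. The Result class and its method names are hypothetical, not part of any RS library.

```java
import java.util.function.Function;

// Hypothetical Either<T, Throwable>-style element type: a per-element
// failure is carried as data, so the stream itself stays healthy.
final class Result<T> {
    private final T value;          // present on success
    private final Throwable error;  // present on failure

    private Result(T value, Throwable error) { this.value = value; this.error = error; }

    static <T> Result<T> ok(T value) { return new Result<>(value, null); }
    static <T> Result<T> failed(Throwable error) { return new Result<>(null, error); }

    boolean isOk() { return error == null; }

    // Fold into a single value; downstream stages choose how to react.
    <R> R fold(Function<T, R> onOk, Function<Throwable, R> onFailure) {
        return isOk() ? onOk.apply(value) : onFailure.apply(error);
    }
}

public class ResultDemo {
    public static void main(String[] args) {
        Result<Integer> good = Result.ok(42);
        Result<Integer> bad = Result.failed(new IllegalArgumentException("unparsable"));
        System.out.println(good.fold(v -> "value " + v, e -> "failed: " + e.getMessage()));
        System.out.println(bad.fold(v -> "value " + v, e -> "failed: " + e.getMessage()));
    }
}
```

A Publisher<Result<T>> can then report "this element failed" without terminating the stream, and only pipelines that need per-element error reporting pay for the wrapper.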

Ditto that a Processor should be able to assume that its peers will not break the spec when calling the API, but that too will happen. Should an implementation not check for compliance (and then fail in unknown ways); check for compliance but be itself constrained by the spec when reporting errors; or check for compliance and not worry about compliant error reporting when it discovers an illegal API call?

If an implementation does not check for compliance, then by definition it is not an implementation :)

So given that problems can and will occur, it is just a question of who you tell and what you tell them.

For any and all situations where such feedback is mandated, or where you want to build more sophisticated (bidirectional) protocols, the solution is to have (Publisher, Subscriber) and (Subscriber, Publisher) on the respective ends (two unidirectional flows form a bidirectional flow).

RS is kept extremely minimal to avoid feature bloat and make sure that more complex patterns can be layered on top.

RS as is does allow both ends of the stream to be told of a problem, with cancel() propagating up and onError(Throwable) propagating down. So with regards of who do you tell, there is a tick as both ends can be told that there is a problem.

While it is probably infrequent, cancel() is not indicative of a problem. It just says that the Subscriber does not wish to receive more elements.

However, only the subscriber can be told any details of the problem (the exception). I believe that a large class of problems will be functions of the items passed in via onNext(item), so I believe that it is valid to tell the publisher the details of what was wrong with what they published. Otherwise a publisher will just see a cancel every time they publish a bad item. Any logging by the processor may well be on a remote system, so may not be seen.

There's a big assumption here: that the truths of the Subscriber are the truths of the Publisher. If a Subscriber for some reason does not like the data it gets, it is of course free to pursue that data from somewhere else. If the Publisher could know that its data is faulty, it should not have sent it in the first place (lack of tests), and if it couldn't, then it wouldn't have been able to do anything differently anyway.

For these scenarios it is absolutely possible to have a bidirectional flow where the Subscriber can Ack/Nack elements. This is possible to construct using two unidirectional Publisher/Subscriber pairs.
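The "two unidirectional flows form a bidirectional flow" idea can be sketched very roughly as below. Plain lists stand in for the two Reactive Streams (a real implementation would use two Publisher/Subscriber pairs with demand and signals); the element formats and names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Two unidirectional channels standing in for two Publisher/Subscriber
// pairs: data flows A -> B on one stream, acknowledgements flow B -> A
// on the other. Lists are illustrative stand-ins for real RS streams.
public class BidirectionalSketch {

    // B's side of the reverse flow: publish an ack or nack for each
    // element received on the forward flow.
    static List<String> acknowledge(List<String> forwardElements) {
        List<String> reverseFlow = new ArrayList<>();
        for (String element : forwardElements) {
            reverseFlow.add(element.startsWith("bad")
                    ? "nack:" + element   // element rejected; reason travels upstream
                    : "ack:" + element);  // element accepted
        }
        return reverseFlow;
    }

    public static void main(String[] args) {
        List<String> acks = acknowledge(List.of("good-element", "bad-element"));
        System.out.println(acks); // [ack:good-element, nack:bad-element]
    }
}
```

With this layering the producing side learns exactly which elements were rejected and why, without adding anything to the core RS protocol.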

Note that I can also imagine situations where the subscriber is the one that needs to be told, as there is probably a set of errors that are a function of how the publisher and/or stream was created/configured in the first place and that can be done by the ultimate subscriber. So I think it makes more sense to allow the exception to be propagated in both directions.

I hope that I have shown that if a bidirectional exchange is desired, it can easily be supported in the model by using Publisher-Subscriber pairs.

Thanks for the great feedback and discussion, looking forward to your reply!

gregw commented 9 years ago

@viktorklang,

So I'm definitely talking about failures here and not errors. I take it you're saying that errors should be handled in some form of ack/nack back flow... so let's talk about that another time once I've got failure handling nailed down.

The first failure type I'm concerned with is spec violations. You say that an implementation must check for compliance and I totally agree. My question is what I should do once non-compliance is detected. I would very much like to have code like this:

@Override
public void onNext(Object item)
{
    if (requests.decrementAndGet() < 0)
        throw new IllegalStateException("unrequested call to onNext");
    ...
}

But that is against the spec, as I can't throw. So how do I inform the possibly remote publisher that it has screwed up? The only spec-compliant mechanism I have is to cancel the Subscription, which is hardly a good way of signalling such a fundamental failure! I think it would be fine for the spec to allow an implementation to throw an ISE if it detects non-spec-compliant behaviour.

On the other type of failure, I acknowledge your observation about the truths of the subscriber vs the truths of the publisher. It is true that they are not the same and that which view point you take will definitely influence how failures should be handled. So let's consider another example....

Imagine an asynchronous web application that is using RS-based database and IO. For a given request from the client, it needs to perform a database query that may return many elements, each of which will have some business logic applied, perhaps another layer of filtering, then be converted to JSON, then encoded as bytes, then compressed and finally sent as an HTTP2 data frame. Let's implement each of these steps as an asynchronous RS Processor, so from origin publisher to ultimate subscriber we would set up something like:

DBQueryPublisher->MyBusinessLogic->BusinessFilter->JSONConvertor->StringEncoder->Compressor->HTTP2Stream

Now let's say the objects passing through this RS are polymorphic up until the JSONConvertor stage, when it uses some extensible JSONConverter to turn the object into string(s).

If a particular object comes through the RS that does not have a JSON Conversion available for that type, then I see that as a Failure. It's not a normal error, as it is something the developer forgot to provide or didn't imagine could happen. The app is broken.

So who can we tell about it? Currently the API would allow an onError(new NoSuchJSONConvertor()) to be sent downstream. But what has that got to do with the StringEncoder, Compressor or even the HTTP2Stream? They are all just going to treat it as an arbitrary failure and ignore the passed exception.

But I also see that there is little that the DBQueryPublisher could do with a NoSuchJSONConvertor exception. As you say, its truth is that the item it published was a valid response to the query it was given. All it cares about is whether the RS is cancelled or not.

Yet I don't see that as generally the case: in this example the MyBusinessLogic and/or BusinessFilter processors have assumed that all the objects they receive from their query subscription are fit to be published to their downstream subscribers. This is a wrong assumption and constitutes a failure... but a failure they will not be informed about, as all they see is a cancel() coming upstream, which is indistinguishable from the very common case of the client browser just suspending or going out of wifi range.

I think it is very important for the BusinessLogic/BusinessFilter to be able to distinguish between the case of the network connection being closed (a common occurrence, not a failure, and well captured by cancel semantics) and a failure caused by an incorrect assumption that the objects they are publishing are suitable for the RS they are publishing to.

Thus in this case, I see that the NoSuchJSONConvertor failure should be able to propagate upstream to at least the BusinessLogic/BusinessFilter. It is then a business decision whether that failure is fatal (and they cancel the RS) or whether they log and ignore it and continue publishing the other elements they receive.

I do not think this is a case where setting up a backchannel makes sense. The failure occurs because the developer didn't think there would be any objects that could not be converted, so they wouldn't think to provide a back channel for failures they don't think could occur.

So I now think that there is onError(Throwable) and cancel() semantics needed both upstream and downstream.

cheers

viktorklang commented 9 years ago

If onError and cancel are needed both upstream and downstream, then that sounds like a perfect use case for having 2 Publisher/Subscriber pairs (bidirectional flows), don't you agree?

gregw commented 9 years ago

@viktorklang

I don't think this is a use-case for a reverse flow. A reverse flow might be justified to carry messages that are normal for a functioning application, but I don't think it is justified just to carry failures.

Consider the first example above: if implementing a check for spec compliance needs a pair of RSs, then every RS will have to be paired if it is to be spec compliant!

Likewise, if you think my second example requires a paired channel, then I'd say that every RS that calls cancel is semantically equivalent and there should not be a cancel method, only paired channels. I don't think that is workable.

I think failure termination notification needs to flow both upstream and downstream. An exception flowing with that failure termination notification may or may not be useful depending on the exact circumstance. So it can either always be passed and ignored when not needed, or you can have different APIs for passing it when you think it is appropriate for the other party to know the reason for the failure.

viktorklang commented 9 years ago

@gregw

I don't think this is a use-case for a reverse flow. A reverse flow might be justified to carry messages that are normal for a functioning application, but I don't think it is justified just to carry failures.

I'd say: Why not?—having a Publisher that only transports failures, and a Subscriber that always cancels, are fundamental constructs.

Consider the first example above: if implementing a check for spec compliance needs a pair of RSs, then every RS will have to be paired if it is to be spec compliant!

I'm not sure I follow, could you elaborate?

Likewise, if you think my second example requires a paired channel, then I'd say that every RS that calls cancel is semantically equivalent and there should not be a cancel method, only paired channels. I don't think that is workable.

Could you explain how they are all semantically equivalent?

I think failure termination notification needs to flow both upstream and downstream. An exception flowing with that failure termination notification may or may not be useful depending on the exact circumstance. So it can either always be passed and ignored when not needed, or you can have different APIs for passing it when you think it is appropriate for the other party to know the reason for the failure.

I'm sorry, I just don't seem to "get it". We have 6+ RS implementations that are known to work and interoperate, and what you ask for isn't a requirement for any of them AFAICT. Is everyone else wrong/oblivious?

gregw commented 9 years ago

@viktorklang,

Let me respond out of order:

On 27 June 2015 at 06:09, Viktor Klang (√) notifications@github.com wrote:

I'm sorry, I just don't seem to "get it". We have 6+ RS implementations that are known to work and interoperate, and what you ask for isn't a requirement for any of them AFAICT. Is everyone else wrong/oblivious?

In all probability it is me that doesn't get it, and you've already corrected many misconceptions I've had when coming to this API. I appreciate you taking the time to try to understand my concerns and explain. But it may also be that my use-cases/concerns are slightly different to what you've already encountered. My own project (jetty) has been going 20 years and we still get surprised by new and different usages!

Also, this is error handling we are talking about. It is possible that you have projects that are working well together, but that don't fail nicely.

Consider the first example above, if implementing a check for spec compliance needs a pair of RSs then every RS will have to be paired if it is to be spec compliant!

I'm not sure I follow, could you elaborate?

This is the use-case where a subscriber/processor receives a call to onNext when it has not requested one. Obviously it is the publisher that is at fault and is not acting correctly. My question is what should the subscriber/processor do:

  • throw an exception? Not allowed by the spec
  • cancel the subscription? Well that ends the conversation but does not tell the possibly remote publisher that it is acting incorrectly.
  • use a reverse RS created specifically for this purpose? If so then this is needed for every RS!

Likewise, if you think my second example requires a paired channel, then I'd say that every RS that calls cancel is semantically equivalent and there should not be a cancel method, only paired channels. I don't think that is workable.

Could you explain how they are all semantically equivalent?

Hmmm I don't think I can understand my own point there :)

Basically I think the flow of RSs I've described in my example is entirely reasonable:

DBQueryPublisher->MyBusinessLogic->BusinessFilter->JSONConvertor->StringEncoder->Compressor->HTTP2Stream

But I don't understand how JSONConvertor should behave if it is given a non-convertible object. Again, the problem is almost certainly in either the business filter or business logic, but there is no standard way to flow the exception upstream. There is a way to flow the exception downstream, but the StringEncoder, Compressor and HTTP2Stream don't care about it.

A JSON conversion error is not an expected error. It is probably a programming bug or misconfiguration in the BusinessLogic. So why would a developer set up a reverse RS stream to carry unexpected errors? They think they are perfect, so they won't do that.

I'd say: Why not?—having a Publisher that only transports failures, and Subscriber that always cancels are fundamental constructs.

In an environment that expects errors, yes, a transport for those failures makes sense. But the two examples I've given are not expected failures. Subscribers don't expect non-compliant publishers and programmers never expect that they might send the wrong object. But stuff like that does happen, and there appears to be no standard way to propagate such exceptions towards the likely source of them. Log and cancel does not feel satisfactory to me.

cheers


gregw commented 9 years ago

To progress this issue, can I ask for some clarification on the first use-case I listed, namely the spec violation: if onNext(Object item) is called on a Processor when no requests have been signalled, what should the implementation do?

viktorklang commented 9 years ago

@gregw Apologies for the late reply here, I'm going to have time to answer you in 24h+ (preparing for a conference presentation tomorrow)

gregw commented 9 years ago

@viktorklang no great hurry, but would be good to progress!

viktorklang commented 9 years ago

Hi @gregw,

thanks for your patience!

Also, this is error handling we are talking about. It is possible that you have projects that are working well together, but that don't fail nicely.

I assume the operative word is "nicely", so the question is most probably one of expectation—what does one expect, and how (and by what) was that expectation formed.

Consider the first example above, if implementing a check for spec compliance needs a pair of RSs then every RS will have to be paired if it is to be spec compliant!

I don't understand this implication, would you mind elaborating? My rationale is that not all channels are bidirectional, and RS is unidirectional in its data flow, to make it bidirectional, you use a pair.

This is the use-case where a subscriber/processor receives a call to onNext when it has not requested one. Obviously it is the publisher that is at fault and is not acting correctly. My question is what should the subscriber/processor do:

  • throw an exception? Not allowed by the spec

This is not meaningful as there is a huge difference between sending a signal and receiving a signal, and the only place where this situation could be meaningfully detected is on the reception of the signal, in which case there is nothing to throw the exception to.

  • cancel the subscription? Well that ends the conversation but does not tell the possibly remote publisher that it is acting incorrectly.

Absolutely right. It is not the job of the Subscriber to tell the Publisher that either, Publishers must pass the TCK and follow the spec. There are 2 distinct "levels" of operation at play here: The protocol level (RS) and the usage level (when RS compliant components interact).

What I'd recommend is to log and cancel. And to use unit tests and the TCK to test RS implementations.
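The recommended "log and cancel" reaction can be sketched as below, building on gregw's demand-counting snippet. The Cancellable interface is an illustrative stand-in for org.reactivestreams.Subscription, and all names are hypothetical; the key point is that no exception is thrown when the violation is detected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Illustrative stand-in for org.reactivestreams.Subscription.
interface Cancellable { void cancel(); }

// Detects an onNext arriving without outstanding demand (a Publisher
// spec violation) and reacts by logging and cancelling, never throwing.
final class DefensiveSubscriber {
    private final AtomicLong outstanding = new AtomicLong();
    private final Cancellable subscription;
    private final List<String> log = new ArrayList<>();

    DefensiveSubscriber(Cancellable subscription) { this.subscription = subscription; }

    void request(long n) { outstanding.addAndGet(n); }

    // Returns true if the item was accepted, false if it was a violation.
    boolean onNext(Object item) {
        if (outstanding.decrementAndGet() < 0) {
            log.add("spec violation: unrequested onNext(" + item + ")");
            subscription.cancel();   // fail fast, staying within the spec
            return false;
        }
        return true;                 // process the item normally
    }

    List<String> logged() { return log; }
}

public class LogAndCancelDemo {
    public static void main(String[] args) {
        final boolean[] cancelled = {false};
        DefensiveSubscriber sub = new DefensiveSubscriber(() -> cancelled[0] = true);
        sub.request(1);
        System.out.println(sub.onNext("wanted"));    // true
        System.out.println(sub.onNext("unwanted"));  // false
        System.out.println(cancelled[0]);            // true
    }
}
```

The logging destination is the implementation's own choice, which is exactly the "log-wars" concern gregw raises later in the thread.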

  • use a reverse RS created specifically for this purpose? If so then this is needed for every RS!

No, that would not be practical nor feasible.

DBQueryPublisher->MyBusinessLogic->BusinessFilter->JSONConvertor->StringEncoder->Compressor->HTTP2Stream

But I don't understand how JSONConvertor should behave if it is given a non-convertible object. Again, the problem is almost certainly in either the business filter or business logic, but there is no standard way to flow the exception upstream. There is a way to flow the exception downstream, but the StringEncoder, Compressor and HTTP2Stream don't care about it.

Sounds like you ought to create more precise types that cannot be created if they are invalid, such that the error is thrown where the error starts (instead of having to backtrack).
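The "more precise types" advice amounts to a smart constructor: validate where the data is created so invalid instances cannot exist, and a downstream converter never meets data it cannot handle. The class and fields below are hypothetical, purely to illustrate the pattern.

```java
import java.util.Optional;

// A type that cannot be constructed in an invalid state: the private
// constructor plus a validating factory means downstream stages (e.g.
// a JSON converter) never see data they cannot handle.
final class JsonSafeRecord {
    final String name;
    final int quantity;

    private JsonSafeRecord(String name, int quantity) {
        this.name = name;
        this.quantity = quantity;
    }

    // The only way to obtain an instance: validation happens where the
    // data originates, not several pipeline stages downstream.
    static Optional<JsonSafeRecord> of(String name, int quantity) {
        if (name == null || name.isEmpty() || quantity < 0) {
            return Optional.empty();  // rejected at the source
        }
        return Optional.of(new JsonSafeRecord(name, quantity));
    }
}

public class PreciseTypesDemo {
    public static void main(String[] args) {
        System.out.println(JsonSafeRecord.of("widget", 3).isPresent()); // true
        System.out.println(JsonSafeRecord.of("", -1).isPresent());      // false
    }
}
```

The error then surfaces where it starts, in the stage that created the bad element, instead of having to backtrack from the JSONConvertor.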

A JSON conversion error is not an expected error. It is probably a programming bug or misconfiguration in the BusinessLogic. So why would a developer set up a reverse RS stream to carry unexpected errors? They think they are perfect, so they won't do that.

That reinforces my thinking that you need more robust types in your flow, or have a fallback strategy if not possible.

In an environment that expects errors, yes, a transport for those failures makes sense. But the two examples I've given are not expected failures. Subscribers don't expect non-compliant publishers and programmers never expect that they might send the wrong object. But stuff like that does happen, and there appears to be no standard way to propagate such exceptions towards the likely source of them. Log and cancel does not feel satisfactory to me.

Both of these kinds of errors should be caught as early as possible; deferring that error handling until runtime is not constructive, as it requires a redeploy to fix. I'd encourage the following ways of dealing with those situations, which have worked really well for me in the past:

1) Use types to enforce constraints on data
2) Use unit/generative/TCK tests
3) Anything that isn't caught by that: fail fast (cancel) and log.

Does that make sense?

gregw commented 9 years ago

@viktorklang

Further comments inline:

On 17 July 2015 at 00:31, Viktor Klang (√) notifications@github.com wrote:

I don't understand this implication, would you mind elaborating?

Don't worry about that bit, it was discussing an argument that your reply below renders irrelevant :

Absolutely right. It is not the job of the Subscriber to tell the Publisher that either, Publishers must pass the TCK and follow the spec. There are 2 distinct "levels" of operation at play here: The protocol level (RS) and the usage level (when RS compliant components interact).

What I'd recommend is to log and cancel. And to use unit tests and the TCK to test RS implementations.

OK, so log & cancel is the recommended (and only spec-compliant) behaviour for passing unexpected error conditions upstream. I have a few concerns with this:

Sounds like you ought to create more precise types that cannot be created if they are invalid, such that the error is thrown where the error starts (instead of having to backtrack).

Indeed! I'd also like world peace and an end to poverty :) My experience in implementing a container for other people's applications is that there is no end to the ways that infrastructure can be misused and that boundaries will always be pushed. Moreover, the reason I'm flippantly discounting the "more robust types" argument is that it is beyond my control.

I am the provider of an application container, in which I'd like to enable RS processing of content. I will be providing utility Processors for things like JSON, Gzip and linking back pressure to the HTTP2 flow-control mechanism. But the development of the application and the design of the types passed is beyond my scope to control.

No matter how good an implementation I provide, my users can and will surprise me with the most incredible ways they attempt to misuse the infrastructure, plus a good deal of accidental misconfiguration and misconception.

What I'm looking for is how, as a 3rd-party provider, I can best deal with these situations that I know will happen. If log & cancel is the standard mechanism that most users know and accept, then that is perhaps workable... but I can see log-wars developing with regard to which log mechanism should be used by 3rd-party components that may be assembled into a single application.

Both of these kinds of errors should be caught as early as possible; deferring that error handling until runtime is not constructive, as it requires a redeploy to fix. I'd encourage the following ways of dealing with those situations, which have worked really well for me in the past:

1) Use types to enforce constraints on data
2) Use unit/generative/TCK tests
3) Anything that isn't caught by that: fail fast (cancel) and log.

Does that make sense?

1 & 2 do make sense, but they are matters for application developers and beyond the control of an application container developer.

I find 3) makes no sense for an application container or 3rd party reusable component.

So I think I'm understanding your intentions better, but find them unsatisfactory for my use case. I'd really like to advocate that RSs be supported in the Servlet 4.0 standard, but I don't see how I can do that without reasonable error handling that is portable across different containers and applications that are container agnostic.

cheers

gregw commented 9 years ago

There has been some discussion of this on http://cs.oswego.edu/pipermail/concurrency-interest/2015-July/014302.html

Note the comment by Doug Lea about the SubmissionPublisher implementation not correctly handling exceptions thrown from onNext(). I think this results from a contradiction in the spec (you are not allowed to throw from onNext... but if somebody does throw, then this is how to handle it), which, together with there being no other outlet for exceptions in an onNext implementation, means that onNext will frequently throw unchecked exceptions and callers will need to deal with them - this is unfortunate and not what is intended for RS.

Surely it would be much better to actually provide an outlet for exceptions so that specification 2.13 should read something like:

for all other situations the only legal way for a Subscriber to signal failure is calling Subscription.cancel(Throwable reason)

Without that, it is unlikely that implementations of onNext will catch unchecked exceptions, as there is nothing they can do with them, and spec rule 2.13 does say what a publisher should do if they are allowed to be thrown from onNext.
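In interface terms, the proposal amounts to something like the following sketch. To be clear, this is hypothetical and NOT the published org.reactivestreams interfaces, and RecordingSubscription is invented purely to make the flow concrete:

```java
// Hypothetical sketch of the proposed change (NOT the published
// org.reactivestreams API): cancel(Throwable) carries a failure reason
// upstream, mirroring how onError(Throwable) carries one downstream.
class CancelWithReasonSketch {

    interface Subscription {
        void request(long n);
        void cancel();                 // existing signal, no reason
        void cancel(Throwable reason); // proposed signal carrying a reason
    }

    // A toy Subscription that records how it was cancelled, standing in for
    // whatever a real Publisher would do with the reason (log, report, etc.).
    static class RecordingSubscription implements Subscription {
        Throwable reason;
        boolean cancelled;

        public void request(long n) { /* irrelevant to the sketch */ }

        public void cancel() {
            cancelled = true;
        }

        public void cancel(Throwable reason) {
            this.cancelled = true;
            this.reason = reason;
        }
    }

    public static void main(String[] args) {
        RecordingSubscription s = new RecordingSubscription();
        // A Subscriber that detects an unrequested onNext could signal blame:
        s.cancel(new IllegalStateException("onNext without outstanding demand"));
        System.out.println(s.reason.getMessage());
    }
}
```

With this shape, the rule change quoted above would make cancel(Throwable) the one legal outlet for failures detected inside a Subscriber, instead of throwing from onNext.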

DougLea commented 9 years ago

@gregw I wasn't saying that SubmissionPublisher is incorrect; just that it might be more helpful to developers creating publishers if it allowed them to install a handler to do something with onNext (etc) exceptions before/while cancelling the subscription. It also occurs to me that any such handler could probably do any of the things that otherwise motivate your onCancel(Throwable) proposal?

gregw commented 9 years ago

@DougLea Ah, on re-reading the code I can see that onError doesn't always send the exception downstream. However, given the spec's description of what to do, calling cancel() still looks to perhaps be the better option for a caught exception (but I need to get my head further into the state machine before commenting more).

Either way, I think there is a need for an onSubscriberError, as this case shows that a publisher will be able to receive, and will have to handle, failure notifications from subscribers - be they "illegally" thrown from a call to onNext or, preferably, passed back by the method I'm advocating: cancel(Throwable th).

I think semantically, the two paths are equivalent - an exceptional failure has occurred within the Subscriber. This may be because of an error within the Subscriber, or it may be that the Publisher has itself published something in error - so both parties need to be informed of the exception and both can log/ignore/handle as appropriate. Either way it is a terminal situation for the subscription.

gregw commented 9 years ago

@viktorklang In the concurrency-interest thread you said: "Note that since it is a spec for async, the invocation of onNext (called a signal in the spec) is divorced from the processing of said signal."

Isn't that an argument for having a proper callback channel for subscription failures? As we can see from the implementation of SubmissionPublisher in java.util.concurrent, they have to explicitly handle the case of exceptions thrown from calls to onNext - even though that is "illegal" in the specification.

However, as you say, some invocations of onNext will not be direct and thus thrown exceptions will not be caught/handled in the publisher. This is just going to result in indeterminate and unpredictable failure modes.

Surely it is better to have a cancel(Throwable reason) method to allow any Subscriber to pass back a failure reason. The spec can then say that any exception illegally thrown from calls to onNext must be handled as if cancel(Throwable) had been called.

davidmoten commented 9 years ago

@gregw I'm not active in this space but have been following with half an eye...

Given that by the time an error is thrown in the Subscriber the Publisher may have completed its emissions and might even normally be ready to release its resources (for GC, say), do you expect the Publisher to still be around to act on the reason?

What about correlating the error in the Subscriber with the onNext published by the Publisher? Would you formalize that?

gregw commented 9 years ago

@davidmoten

I don't propose that the publisher act any differently than it currently does. When a publisher terminates a stream either normally with onComplete() or abnormally with onError(Throwable), there is already a race with an incoming cancel() call, which the publisher has to handle elegantly. Ie it is already possible for a cancel() signal to arrive just after a publisher sends onComplete() signal.

So this is no different with a cancel(Throwable reason).

I don't propose any correlating of cancel(Throwable reason) with any item delivered in a call to onNext(). It may be that there is no associated item, as a subscriber might give up on a slow publisher with subscription.cancel(new TimeoutException()), which is another good example of how a slow publisher may be at fault for a cancellation, so having a reason is a good way to communicate that blame.

davidmoten commented 9 years ago

@gregw the reason I ask about giving context to a cancel(Throwable) in the Publisher after termination is that stream processing for me nearly always looks like this

Publisher -> Processor -> ... -> Processor -> Subscriber

and having the cancel(Throwable) potentially not propagate upstream further than the last Processor seems to weaken the case for having it.

gregw commented 9 years ago

@davidmoten

So in a situation like:

    Publisher -> ProcessorA -> ProcessorB -> Subscriber

You are concerned with cases like ProcessorA receiving an onCompleted() signal at the same time as ProcessorB has a failure and wants to call cancel(Throwable reason) on ProcessorA, which you want to flow up to the Publisher. However, because ProcessorA has already completed, it does not forward the cancel(Throwable reason).

Well, I can see the problem, but I don't think it is any different with or without a reason argument to cancel. Either way, the Publisher is able to complete unaware of problems downstream. It currently would be ignorant of a cancel(), just as it would be ignorant of a cancel(Throwable).

So I see your concern, but I think it is somewhat orthogonal to my proposal.

However, I would note that the spec does allow cancel to be called after completion, where it is considered a noop, so perhaps ProcessorA should always pass up the cancel signal, even if it thinks it has completed. The upstream processors and publishers must treat it as a noop, but they can log it. Thus the reason can be passed back.

DougLea commented 9 years ago

To try to summarize: If Subscribers throw exceptions in onNext, they are non-compliant, and the only thing they can expect is to be cancelled (not even that is guaranteed). However, the spec also allows Publishers to process such exceptions any way they like (which means that we should/will add a hook for this in j.u.c.SubmissionPublisher), in case they happen to know anything useful to do with them. Transitively, a set of Processors might know what to do with them, using the same mechanics.

In other words, the spec currently allows that a series of stages might be coupled enough to coordinate error handling, but doesn't mandate it. So in general if a Subscriber throws in onNext, practically nothing is guaranteed. But particular frameworks might make and document alternatives.

If everyone buys this, I think the spec issue can be closed?

abersnaze commented 9 years ago

Maybe I'm misunderstanding what the problem is, but I get the sense that the Subscriber isn't owned by the server. This is an overview of the way it was implemented in RxNetty, where the server is the producer and the consumer, and the handler just provides Processors in the middle.

(diagram: the RxNetty pipeline, with the server as producer and consumer and handler-provided Processors in the middle)

The server subscriber is only interested in the onComplete or onError to know when all the writes (if any) are done.

gregw commented 9 years ago

@DougLea I think the aspect of this issue regarding exceptions thrown from onNext is a bit of a red herring. So while I buy what you say about throwing, I do not think that closes this issue.

The substance of this issue is that failures can happen inside a subscriber that are the fault of the publisher, for example when the publisher delivers wrong, corrupt or unrequested data.

So the question is, how can the Subscriber communicate such failures to the Publisher? Throwing is not a valid option, as it is not legal and even if thrown, there is no guarantee that the publisher is in the calling stack.

The current response to this question is that the subscriber should just log and cancel(). However, the problem I'm raising is that logging is not a sufficient way to communicate failure, as onNext signals may cross physical/jurisdictional boundaries. There is no guarantee of a common logging mechanism, nor access to the logs if needed. A cancel() without a reason will be very hard to diagnose.

A further response has been to say that subscriber failures are of no interest to a publisher. However, I believe the examples shown in this thread do indicate that a publisher may at least be suspected of being at fault of a failure in a subscriber.

Thus I have proposed that the cancel() method be changed to cancel(Throwable) so that a reason for failure can be passed upstream, just as it is passed downstream with onError(Throwable).

Publishers will then be notified of the reason for a cancel(). They can then log, handle and/or ignore it as they wish. The benefit of this is that the failure signal crosses back over any physical/jurisdictional boundaries to the context in which the signal to onNext() was initiated. Thus even if the reason is only logged and ignored, it will at least be more readily available for diagnosis of problems.

viktorklang commented 9 years ago

@gregw

I do apologise for the delayed response, it's been one of those weeks. :/ Also, I wanted to make sure I had really thought hard about your proposal (because I take it seriously) before I had reached a personal conclusion on the matter.

cancel(Throwable) may have a couple of use-cases, however as a general mechanism it just doesn't hold water, to me, for the following reasons:

A) It is never the fault of the Publisher if the Subscriber fails. The contract is in the types and in the RS Spec: weak types mean that the Subscriber needs to be ever more tolerant.
B) Propagating an Exception upwards in the flow may make sense for some situations, but far from most (how would you forward it in a fan-in situation?)
C) It is always racy and will get lost along the path upwards (as discussed previously).
D) Exception types are also dependencies in reverse: anything thrown by an unknown consumer must also be present at the very source.
E) A Publisher cannot fix a faulty Subscriber, but a Subscriber can choose to use another Publisher.
F) It doesn't work like that for anything(?) else: do you send exceptions that happen on the client back to your database? (Such errors are almost universally, in my experience, sent back through another channel, designed to deal with crash reports.)

TL;DR: In my opinion, after long consideration, the proposed change does not solve a functional problem, nor generally any diagnostics problem. In order to transport failures to arbitrary stages, a bidirectional pair can be used when needed.

Alternatively, if one happens to have an isolated (as in Pub-Sub pair control) use-case where this passing of exceptions is highly desirable, one can make one's Publisher use an "ExceptionCancellableSubscription" with a cancel(Throwable) method, and your Subscribers can check whether their subscription is of that type: if so, use cancel(Throwable), and if not, use plain cancel().
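A minimal sketch of that opt-in pattern might look like this. All of the names, including the shape of ExceptionCancellableSubscription, are illustrative only; the real spec interfaces carry only request/cancel:

```java
// Sketch of the suggested opt-in pattern (names are illustrative, not spec
// types): a cooperating Publisher hands out a Subscription subtype that
// accepts a reason, and Subscribers downcast only when the extension exists.
class ExceptionCancellableSketch {

    interface Subscription {
        void request(long n);
        void cancel();
    }

    // The opt-in extension interface a cooperating Publisher may implement.
    interface ExceptionCancellableSubscription extends Subscription {
        void cancel(Throwable reason);
    }

    static Throwable lastReason; // stands in for the Publisher's handling

    static class ReasonAwareSubscription implements ExceptionCancellableSubscription {
        public void request(long n) { }
        public void cancel() { }
        public void cancel(Throwable reason) { lastReason = reason; }
    }

    // Subscriber-side helper: pass the reason if supported, else plain cancel.
    static void cancelWithReason(Subscription s, Throwable reason) {
        if (s instanceof ExceptionCancellableSubscription) {
            ((ExceptionCancellableSubscription) s).cancel(reason);
        } else {
            s.cancel();
        }
    }

    public static void main(String[] args) {
        cancelWithReason(new ReasonAwareSubscription(),
                new java.io.IOException("Illegal UTF-8 code point"));
        System.out.println("reason delivered: " + lastReason.getMessage());
    }
}
```

The design point is that it stays within the existing spec: against a plain Subscription the helper degrades to the standard cancel().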

I'm afraid I don't have much else to add to the discussion right now, please don't hesitate to ask for clarification if anything needs it (I hope I was able to articulate what I had in mind).

Have a great weekend!

gregw commented 9 years ago

@viktorklang No need to worry about the delay in answering. I appreciate the repeated consideration.

I think we are close to stalemate on this one, as I don't have much new to say and I don't think I've convinced you.... but unfortunately I'm also not convinced the error handling is sufficient. So I'll give it one more spin before concluding we just look at the world through different abstractions!

On 1 August 2015 at 04:51, Viktor Klang (√) notifications@github.com wrote:

A) It is never the fault of the Publisher if the Subscriber fails.

I think this is the nub of our different points of view. I think that a publisher can definitely be at fault, and believe I've given several examples. Consider a Producer from an async DB driver; it can be at fault for a failure if:

  • It never responds by calling onNext(), onCompleted() or onError(Throwable), perhaps due to internal failure, network delays, or just a diabolical query. A Subscriber may eventually suffer a timeout for its own data sink, and the fault is with the Producer if it is not providing the quality of service it is meant to.
  • It responds with a call to onNext() when there was no request for data.
  • It responds with a wrong response. The query might have been for "all red widgets" and a badly coded publisher might send a blue widget.
  • It responds with corrupt/invalid objects.

The assumption that Producers are without fault in what they produce is just not realistic. Especially in applications that may be built of a Flow of many Processors, each coded by different groups with varying levels of competence and understanding of the application.

I don't believe that it is sufficient to say that the datatypes must be entirely self validating. Even a simple integer or string can have illegal/corrupt/wrong values when used to convey many semantics.

B) Propagating an Exception upwards in the flow may make sense for some

situations, but far from most (how would you forward it in a fan-in situation?)

Exactly as cancel() would now. I'm not proposing any new semantic of flow, I'm just proposing a reason to go along with an existing flow.

C) It is always racy and will get lost along the path upwards (as

discussed previously).

Indeed, but cancel() already is. The spec handles this well already by saying that multiple cancels are to be handled as noops. I'm not proposing to change this in any way.

D) Exception types are also dependencies in reverse, anything thrown by an

unknown consumer must also be present at the very source.

True. But onError(Throwable) has the same issue, and that is not seen as an insurmountable problem.

At the very least, cancel(String reason) would be better than no reason!

E) A Publisher cannot fix a faulty Subscriber, but a Subscriber can choose to use another Publisher

I'm not worried about faulty Subscribers. I'm worried about how to code a good subscriber that is validating what the Publisher sends and notices that the publisher has done something wrong. So the good Subscriber has detected a fault in the Publisher and wishes to tell the Publisher so.

I started this when coding my own Subscriber and didn't know what to do if I detected an onNext(item) call when I had not sent a request. Log&cancel is just not sufficient, as I have no idea what logging mechanism the publisher may be using/monitoring.

TL;DR: In my opinion, after long consideration, the proposed change does

not solve a functional problem, nor generally any diagnostics problem. In order to transport failures to arbitrary stages, a bidirectional pair can be used when needed.

The bidirectional pair does not work. The developers of the async DB producer are not intending to send corrupt/wrong/no results.

Alternatively, if one happens to have an isolated (as in Pub-Sub pair

control) use-case where this passing of exceptions is highly desirable, one can make one's Publisher use an "ExceptionCancellableSubscription" with a cancel(Throwable) method, and your Subscribers can check whether their subscription is of that type: if so, use cancel(Throwable), and if not, use plain cancel().

Hmmm perhaps that is workable.

Currently I'd really like to propose the RS semantics for inclusion in Servlet 4.0, and there is interest in them being used to simplify async. But without support for upstream error handling, this will just not be possible.

The container will instantiate the type of the subscriptions used to receive request content, and the spec could mandate the type of the subscriptions used to send response content...

But then if you control all the pub-sub pairs, then this is not really an issue as you are probably controlling the logging of them all.

So it could solve the problem over the container/application boundary. But if the application is using other remote RS (e.g. an async DB), then errors will still not be able to be passed to that jurisdiction.

I'm afraid I don't have much else to add to the discussion, but please

don't hesitate to ask for clarification if anything needs it.

Well I'm all out of arguments as well and not much more I can say either. So unless my last ditch attempt has given you an epiphany... or can inspire you to say something that might give me one, then I guess we'll just have to agree to disagree.

It's a pity, as I really like the functional semantics, but I'm unlikely to heavily invest in a mechanism that I don't think has reasonable error handling, nor advocate for it to be included in other specifications.

cheers

viktorklang commented 9 years ago

@gregw

First of all, I really appreciate conversing with you, thank you for being exemplary in conversing about a tricky topic.

Reading your reply, I think the main source of difference in opinion is due to a (very common in my experience, we should advertise this better) misunderstanding:

Reactive Streams is not an end-user / application developer API—just as TCP is not something every developer is expected to implement themselves, and correctly to boot!

Think of Reactive Streams as a protocol, and implementations thereof MUST use the TCK to verify their implementations both to get guidance on how to do it correctly, but also to make sure that there are no regressions introduced that would violate the contract imposed by the specification.

I hope this addresses:

The assumption that Producers are without fault in what they produce is just not realistic. Especially in applications that may be built of a Flow of many Processors, each coded by different groups with varying levels of competence and understanding of the application.

Responding to sections in order below:

I don't believe that it is sufficient to say that the datatypes must be entirely self validating. Even a simple integer or string can have illegal/corrupt/wrong values when used to convey many semantics.

If Integer is the element type, then any Integer should be valid to send, because a Subscriber would not know more than that in general—it would have to be able to deal with any Integer, because that is what it accepts. It's "standard practice" to create a wrapper type that validates its inputs such that a Publisher could not send a malformed instance, if a Subscriber wants to be more prescriptive.
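The wrapper-type practice could be sketched like this (Percentage is an invented example, not anything from the spec or this thread):

```java
// Sketch of the wrapper-type approach ("Percentage" is an invented example):
// validation happens at construction, so a Publisher<Percentage> cannot emit
// an out-of-range value and a Subscriber never needs to re-validate.
final class Percentage {
    private final int value;

    Percentage(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("percentage out of range: " + value);
        }
        this.value = value;
    }

    int value() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println(new Percentage(42).value()); // prints 42
        try {
            new Percentage(150); // rejected before it can ever reach a stream
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The malformed value fails at the producer's construction site, which is exactly the "catch it as early as possible" point argued for above.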

Exactly as cancel() would now. I'm not proposing any new semantic of flow, I'm just proposing a reason to go along with an existing flow.

I'm sorry, but that doesn't make sense to me. Imagine that there are 5 Publishers fanned into an N-way merge, and connected to that merge are 3 processors and then a Subscriber at the end. If the last processor sends something that the Subscriber doesn't like, and it responds with a cancel(DoNotLikeThisElementException), what's the Processor going to do about the exception?

The bidirectional pair does not work. The developers of the async DB producer are not intending to send corrupt/wrong/no results.

&

I'm not worried about faulty Subscribers. I'm worried about how to code a good subscriber that is validating what the Publisher sends and notices that the publisher has done something wrong. So the good Subscriber has detected a fault in the Publisher and wishes to tell the Publisher so.

Let me stop here to ask: do you, at this moment, when you get wrong data from your DB vendor's driver, have any chance of sending an exception to it, and what would it do?

What I typically do in that case is that I open a support ticket/Issue/PR depending on the situation with the vendor for them to fix it. I don't see how passing an Exception to the driver would fix the problem? I am clearly missing something fundamental here, what am I not seeing?

True. But onError(Throwable) has the same issue, and that is not seen as an insurmountable problem.

It doesn't have the same issue, Subscriber depends on Publisher and not in reverse.

At the very least, cancel(String reason) would be better than no reason!

Ok, so let's walk this through: So the Subscriber sends cancel with some String attached, what does the Publisher do with it?

Currently I'd really like to propose the RS semantics for inclusion in Servlet 4.0, and there is interest in them being used to simplify async. But without support for upstream error handling, this will just not be possible.

Having it included in Servlet 4.0 sounds fantastic. How about, instead of discussing the problem(s) through a proposed solution, we talk about what the underlying problem is (what motivates the solution, and how it was solved in previous versions)?

It's a pity, as I really like the functional semantic, but I'm unlikely to heavily invest in a mechanism that I don't think has reasonable error handler nor advocate for it to be included in other specifications.

Are you open to entertaining the possibility that the current specification has reasonable error handling? 2 years, many organizations, engineers and implementations seem to indicate that it is, in fact, working—it is most likely not perfect (as things typically never are), and I appreciate having this opportunity to find out if there is a functional problem with the current spec, as if that is the case, it should most definitely be addressed. At this point, introducing a breaking change needs to be really worth it.

As I am certainly no "primus inter pares" amongst the @reactive-streams/contributors, I'll step aside for a while to let the others chime in.

abersnaze commented 9 years ago

One way to see if any change makes sense is to think about what the dual would be for an iterable or InputStreams.

| Iterable | InputStream | Flow |
| --- | --- | --- |
| iterator() | URL.openStream() | subscribe() |
| first call to hasNext() | first call to read() | start() |
| next() | read() returning a non-EOF value | onNext() |
| - | read(byte[]) | request(n) |
| hasNext() returning false | read() returning EOF | onCompleted() |
| hasNext() or next() throwing | read() throwing | onError(Throwable) |
| break | close() | cancel() |

DougLea commented 9 years ago

@gregw A reminder that the initial (and still primary) resistance is that the current spec never requires data marshalling from Subscriber/Subscription to Publisher-- subscriptions only use signals (request/cancel); similarly any exceptions in Subscriber methods can be treated as signals since all they do is cause cancellation. Adding a method that sends an object (even a Throwable) could be traumatic for some implementations.

Without such a method, you do have the two choices we've mentioned: Set up an error stream in the other direction, and/or locally catch and handle the exceptions thrown in Subscriber methods. Together these seem to cover use cases.

Maybe someday people will want to extend RS streams to support a bidirectional mode, but it is hard to justify doing so solely for the sake of sending a Throwable.
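A toy illustration of the "error stream in the other direction" alternative mentioned above. The Channel class is a stand-in invented here to keep the sketch self-contained; it is not a Reactive Streams type, and real back-channels would be full streams with their own demand:

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Sketch of a reverse error stream: alongside the data flow from producer to
// consumer, the consumer side publishes Throwables on its own stream, which
// the producer side reads. No change to the Subscription contract is needed.
class BackChannelSketch {

    // Stand-in for a tiny in-process stream: producer offers, consumer polls.
    static class Channel<T> {
        private final Queue<T> q = new ArrayDeque<>();
        void publish(T t) { q.add(t); }
        T next() { return q.poll(); }
    }

    public static void main(String[] args) {
        Channel<String> data = new Channel<>();      // forward: elements
        Channel<Throwable> errors = new Channel<>(); // reverse: failure reports

        data.publish("blue widget"); // producer sends a wrong element
        String element = data.next();
        if (!element.startsWith("red")) {
            // consumer cancels the data stream and reports on the back channel
            errors.publish(new IllegalArgumentException("expected red, got: " + element));
        }

        Throwable report = errors.next(); // producer side reads the report
        System.out.println("producer learned: " + report.getMessage());
    }
}
```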

gregw commented 9 years ago

@Viktor,

I'll respond out of order and only to a few key points....

Are you open to entertaining the possibility that the current specification has reasonable error handling? 2 years, many organizations, engineers and implementations seem to indicate that it is, in fact, working—it is most likely not perfect (as things typically never are), and I appreciate having this opportunity to find out if there is a functional problem with the current spec, as if that is the case, it should most definitely be addressed. At this point, introducing a breaking change needs to be really worth it.

The most likely situation is that I'm misunderstanding RS.

However, there is a chance that, even with 2 years of good working implementations, the error handling can still be improved. Early work on new abstractions is often within one team focused on how to make it work. I bring a lot of experience of providing infrastructure that crosses multiple jurisdictions, and that experience tells me that getting error handling right is very important.

Let me stop here to ask: do you, at this moment, when you get wrong data from your DB vendor's driver, have any chance of sending an exception to it, and what would it do?

True, currently if my DB driver stuffs up, then I cannot send an exception to it. However, as you say, RS is not an API but a protocol. If I'm talking to my DB via an RS then I'm in an ongoing relationship with it, and all the protocol work I've done suggests that good protocols have reasons in their error terminations. Most recently I have contributed to the development of the HTTP2 protocol, and it has reasons in its error responses.

I can't recall any protocol that I've worked with that does not support reasons in its failure termination messages.

Think of Reactive Streams as a protocol, and implementations thereof MUST use the TCK to verify their implementations both to get guidance on how to do it correctly, but also to make sure that there are no regressions introduced that would violate the contract imposed by the specification.

I have spent many years working with TCKs of software standards. They are very important tools, but they are a long way short of a formal proof of compliance.

I'm sorry, but that doesn't make sense to me. Imagine that there are 5 Publishers fanned into an N-way merge, and connected to that merge are 3 processors and then a Subscriber at the end. If the last processor sends something that the Subscriber doesn't like, and it responds with a cancel(DoNotLikeThisElementException), what's the Processor going to do about the exception?

I assume the same situation can occur with the current cancel() method. The Processor will have to propagate that cancel call up to the N-way merge, at which point all 5 subscriptions to the 5 publishers will need to be cancelled.

I'm only proposing that a reason be passed along with that cancel. Processors may pass it along, wrap it in their own exception, or replace it with their own reason.

Having it included in Servlet 4.0 sounds fantastic. How about, instead of discussing the problem(s) through a proposed solution, we talk about what the underlying problem is (what motivates the solution, and how it was solved in previous versions)?

OK, here is a more concrete example. Let's say we add RSs to the servlet API with something like:

    class ServletRequest {
        Publisher getInputReactiveStream();
    }

So that input received from the HTTP client can be fed into a RS (and you can see the test Jetty repo to see how we can use Processors to turn streams of byte/chars into a stream of Form entries).

So a developer is writing an application that is being deployed on GoogleAppEngine or some other abstracted Servlet container. The development team is writing both the client javascript and the servlets processing the content sent.

A client request can fail in many different ways: network failure, protocol failure, a timeout because the client didn't send anything at all. If that byte stream is processed by the container for compression, characters, fields or objects, there are more and more ways for there to be an error.

Currently if any of those failures occur, then all we can tell the application is cancel(), which tells them precisely zero about why the request failed. It could be their client, their browser, the network, the proxy, the reverse proxy, an accounting limit on the server, a size limit on the server, or a bug in the server. But all the developers will know is that their stream was cancelled. The reason might be logged, but the log may be on a remote machine/service they may not know about.

My experience is that without a reason, they are just going to blame the container and raise a bug, complain on the forum, ping me on IRC, etc. But if only the reactive stream API allowed the container to pass up new IOException("Illegal UTF-8 code point") or new SocketException("HTTP2 GoAway code XXX"), then they would be much better able to help themselves.

Moreover, on the internet, exceptional termination of connections is pretty normal, but a container is not well placed to tell what is and what is not important. It can't assume that IOExceptions are low priority and log them at debug level. So the log file is going to fill up with lots of exceptions as people suspend their laptops or go out of wifi range, etc.

Logging is just not a good way to handle such failures.

cheers

gregw commented 9 years ago

@DougLea,

Imagine if the CompletionHandler API in Java had the Throwable and attachment taken out of the failed() method signature, so all developers are told is that the operation failed and that the container has logged the reason somewhere! That would not be contemplated, so I can't see how an alternate API can be put into Java 9 that does not have the ability to give reasons for failures.

Also, implementations should be free to change/wrap/summarise the reason passed as needed. For example, if the cancel is crossing a real protocol boundary, then the reason given might just be cancel(new IOException("Network failure")) or cancel(new FlowException("remote cancel")), to at least tell the subscriber whether it was the remote end or the network causing the problem.

Plus I can see the argument that exceptions crossing jurisdictional boundaries can become meaningless, as the stack frames are private and the type of the exception may not exist. So a string reason or error code would be sufficient, even if the error codes were just a simple breakdown of why the subscription was cancelled: timeout, protocol error, bad object, subscriber error, data no longer needed, etc.
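A sketch of that reason-collapsing idea. The code set and the mapping are invented for illustration; nothing here is a spec mechanism:

```java
// Sketch (illustrative only) of reason translation at a jurisdiction
// boundary: the local Throwable is collapsed to a coarse code before
// crossing, so no foreign exception types or private stack frames leak.
class ReasonTranslationSketch {

    enum CancelCode { TIMEOUT, PROTOCOL_ERROR, BAD_ELEMENT, SUBSCRIBER_ERROR, NOT_NEEDED }

    static CancelCode translate(Throwable reason) {
        if (reason instanceof java.util.concurrent.TimeoutException) return CancelCode.TIMEOUT;
        if (reason instanceof java.io.IOException) return CancelCode.PROTOCOL_ERROR;
        if (reason instanceof IllegalArgumentException) return CancelCode.BAD_ELEMENT;
        return CancelCode.SUBSCRIBER_ERROR; // default: blame the local stage
    }

    public static void main(String[] args) {
        System.out.println(translate(new java.io.IOException("Network failure")));
    }
}
```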

I picked Throwable because it is the type used to pass reasons downstream in onError calls, so I thought it a reasonable choice (as the problems of serialization and meaningfulness of the stack exist there as well).

To repeat my previous email: without a failure reason of some form to nudge developers in the right direction, they are just going to assume it is the container's fault and pester the container provider with needless support requests. The container provider will try to protect themselves by being overly verbose in their logs, and the many normal internet failures which can currently be ignored will be logged in detail just in case they are important.

cheers


DougLea commented 9 years ago

@gregw Allowing cancel to send some kind of error code might have been a good idea if limited to Strings, or ietf/Unix-style numerical codes. This would require that Publishers understand such codes enough to do something with them, but I can picture arriving at a minimal set of standard ones. In the mean time, if your back is to the wall, you could use special negative sentinel values in subscription.request for these purposes with (surprisingly) the same (but much hackier) effect. I suppose that if there were agreement that this is the way to go, such values could be assigned now: request(-12345) means "data corruption". or whatever.
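Doug's (self-described hacky) sentinel workaround could look like the sketch below; the class name and code values are invented. Note that the RS spec actually requires request(n <= 0) to trigger onError(IllegalArgumentException) on the subscriber, which is exactly why this would need explicit agreement to be legal.

```java
// Sketch of the negative-sentinel hack: smuggle an error code through
// Subscription.request(n). All names and values here are illustrative.
public class SentinelDemand {
    public static final long DATA_CORRUPTION = -12345; // the value from the comment above

    // A Publisher playing along would branch on the sign of n:
    public static String interpret(long n) {
        if (n > 0) return "demand for " + n + " elements";
        if (n == DATA_CORRUPTION) return "cancel: data corruption";
        return "cancel: unknown error code " + n;
    }

    public static void main(String[] args) {
        System.out.println(interpret(5));                // demand for 5 elements
        System.out.println(interpret(DATA_CORRUPTION));  // cancel: data corruption
    }
}
```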

rkuhn commented 9 years ago

@gregw Thanks for providing two examples that helped me understand the difference in our reasoning: the ServletRequest class and the analogy to CompletableFuture. I’ll try to detail the difference so that we can hopefully converge on a common understanding.

First, the CompletableFuture: you compared the current nullary cancel() method to removing all arguments from the failed() method. In a chained computation described as transformations of CompletableFutures we model a strictly unidirectional flow of information from provider (the one that completes the Future) to consumer (all derived Futures or registered callbacks). The information that flows can either be a value of type T or a Throwable, disambiguated by representing them in two different completion modes—successful and failed. But it should be noted that this information only flows forward, the failure of a derived Future has no bearing on its parent or origin. Combining two Futures with anyOf means that the failure of one of them will taint the combination, but this does not in any way affect the other input—even though it is “cancelled” (i.e. its value will not matter any longer). The point I’m getting at is that RS is consistent with the CompletableFuture API in the way failure values are disseminated.
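The forward-only failure flow described above can be demonstrated directly with the JDK API (the class name here is invented for the demo): failing one input of anyOf taints the combination but leaves the other input completely untouched.

```java
import java.util.concurrent.CompletableFuture;

public class AnyOfDemo {
    public static boolean failureFlowsForwardOnly() {
        CompletableFuture<String> a = new CompletableFuture<>();
        CompletableFuture<String> b = new CompletableFuture<>();
        CompletableFuture<Object> combined = CompletableFuture.anyOf(a, b);

        a.completeExceptionally(new RuntimeException("boom"));

        // combined is tainted by a's failure, but b is not cancelled,
        // failed, or otherwise affected: it can still complete normally.
        boolean tainted = combined.isCompletedExceptionally();
        boolean bUntouched = !b.isDone();
        b.complete("still fine");
        return tainted && bUntouched && "still fine".equals(b.join());
    }

    public static void main(String[] args) {
        System.out.println(failureFlowsForwardOnly()); // true
    }
}
```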

Second, and more difficult, but probably also more enlightening, is the ServletRequest example. If you model an RS-driven servlet in this fashion then you implicitly declare the presence or absence of cancellation to be the binary service call result. This means that requesting all information to be delivered to onNext() is seen equivalent to acknowledging reception when coupled with the absence of cancel(). Unfortunately this is not a reliable interpretation with the current RS specification since the Publisher could well call onComplete() after the final element has been dispatched and then be deaf to any subsequent failure—the reason is that the RS protocol deliberately does not include an acknowledgement signal. The request/cancel channel is only good enough to convey whether more elements are welcome to be sent, not more. This is the consequence of modeling a truly unidirectional flow of information; making the semantics any stronger would entail more coordination and synchronization, which means more overhead, less throughput, etc. It is quite some difference in effort to implement a back-pressured channel and a reliable acknowledged channel, RS consciously aims for the smallest useful functionality.

Within these constraints it is clear that the ServletRequest needs two flows of information: the user input may want to be incrementally acknowledged by a flow of ACK messages, or the request should get an overall response (possibly containing a result) that is generated based upon the received input. As an example of such an API Akka HTTP is modeled as the equivalent of a Processor<HttpRequest, HttpResponse>, where in both directions the payload for each message is expressed equivalent to a Publisher<ByteBuffer>. This processor handles all requests that are pipelined onto one HTTP connection. Another example is the Play Framework which models an HTTP server as a function from HttpRequest to the equivalent of a Processor<ByteBuffer, HttpResponse>. In both APIs the common theme is that the server implementation consumes the client input and produces the response output—in a streaming fashion. The ServletRequest could thus be described as

public interface ServletRequest {
  public Publisher<ByteBuffer> getInputStream();
  public Subscriber<ByteBuffer> getOutputStream();
}

and the application code would subscribe to the input and subscribe the servlet output to its own output. Or the API goes the other way around:

public interface ServletRequest {
  public void completeWith(Processor<ByteBuffer, ByteBuffer> handler);
}

This is a matter of taste; we chose the equivalent of the latter because it leaves fewer things to be done (wrong) by the user.

Now, what does this have to do with error-bearing cancellation? Just as with CompletableFuture all information is supposed to flow forward and by closing the loop as described we have an obvious place of where to react to failures: when the servlet somehow fails and cannot handle the request it will fail its output stream and the right things will happen—that the reason behind not wanting to read more elements is abnormal does not need to be signaled to the client on the reading channel as well, because the only thing that TCP (or any bidirectional stream protocol) can do on the reading side is to not pull it, there are no error propagation mechanisms in this direction.

The only remaining question is then why onError() carries auxiliary information. I agree that in principle it could be done without the Throwable, but the scenario when onError() is used is that the whole stream setup itself has failed and needs to be torn down ASAP and no other data elements can be shipped downstream that could allow direly needed replacement data elements to be made up—we need to generate an HttpResponse even if none came out of the Processor. This problem is asymmetric to the upstream side where all that needs to be done is to stop sending more elements and possibly releasing resources. As I said, we could just generate a 500 response in all those cases, but that is a specific ability of HTTP that other stream consumers might not have (and onError(Throwable) allows us to log failures in the framework/library instead of having the user care about that). As a concrete manifestation of what I mean you can take a look at Akka’s Flow.recover() operator that turns a failing stream into one that emits one last element before completing normally—of course this is just stolen from RxJava’s onErrorReturn (we have a ticket for also implementing the equivalent of onErrorResumeNext).
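The recover/onErrorReturn semantics mentioned above can be sketched in a few lines. This is a minimal illustration using the JDK 9+ java.util.concurrent.Flow interfaces as a stand-in for org.reactivestreams; the class name and demo are invented, and a real operator would also have to honour downstream demand, which this sketch ignores for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Flow.Subscriber;
import java.util.concurrent.Flow.Subscription;

// Turns a failing stream into one that emits one last element before
// completing normally — roughly Akka's Flow.recover / RxJava's onErrorReturn.
public class RecoverSubscriber<T> implements Subscriber<T> {
    private final Subscriber<? super T> downstream;
    private final T fallback;

    public RecoverSubscriber(Subscriber<? super T> downstream, T fallback) {
        this.downstream = downstream;
        this.fallback = fallback;
    }

    @Override public void onSubscribe(Subscription s) { downstream.onSubscribe(s); }
    @Override public void onNext(T item) { downstream.onNext(item); }
    @Override public void onComplete() { downstream.onComplete(); }

    // The failure is swallowed: downstream sees one replacement element
    // and a normal completion instead of the Throwable.
    @Override public void onError(Throwable t) {
        downstream.onNext(fallback);
        downstream.onComplete();
    }

    // Record the signals a downstream observes when the upstream fails.
    public static List<String> demo() {
        List<String> seen = new ArrayList<>();
        Subscriber<String> downstream = new Subscriber<String>() {
            @Override public void onSubscribe(Subscription s) {}
            @Override public void onNext(String i) { seen.add(i); }
            @Override public void onError(Throwable t) { seen.add("error"); }
            @Override public void onComplete() { seen.add("complete"); }
        };
        RecoverSubscriber<String> r = new RecoverSubscriber<>(downstream, "fallback");
        r.onNext("a");
        r.onError(new RuntimeException("boom"));
        return seen; // [a, fallback, complete] — "error" never reaches downstream
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```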

I hope this sheds some more light on the differences; my impression is that you expected the Publisher–Subscriber pair to be the only tool you need and that Viktor’s and my responses constitute or reveal an expectation violation. In any case, I’m confident that together we’ll figure it out.

gregw commented 9 years ago

@Roland,

Thanks for your message. The pointers to other frameworks using RS for HTTP are interesting, as I definitely need to look at more concrete examples if RS are to be included in servlet APIs in a meaningful way. My starting point is to consider providing something simple like:

public interface ServletRequest {
  public Publisher<ByteBuffer> getInputStream();
  public Subscriber<ByteBuffer> getOutputStream();
}

But the more the container can do for the developer (compression, character conversion, parsing), the simpler the application becomes, and asynchronous processing becomes accessible to developers who currently do such processing via blocking APIs etc.

Yet the quandary I find myself in is that the more processing I consider that the container does, the more failures can occur within the container. I find myself holding an exception which is not-my-problem, but with nowhere to pass it on other than a log!

Let me expand a little bit more on my prime use-case. It was not CompletableFuture that I was referring to in my example, but rather the async IO mechanisms of servlet API and JVMs. Let's look at the outbound path. I start with the servlet async IO API:

public interface WriteListener {
  public void onWritePossible();
  public void onError(Throwable t);
}

The Jetty container takes the unidirectional flow of data that results from that and processes it somewhat. It may compress it, it will aggregate small messages into larger ones and eventually create headers, commit the response and then flush it. It will do this via HTTP/1 or HTTP/2. If HTTP/1 it may have to process it into chunks. If HTTP/2 the data stream is processed into a multiplexed stream with a flow control window https://tools.ietf.org/html/rfc7540#section-6.9 model which is semantically similar to RS request(n), but with a non-zero initial window.

Finally, after all that processing, we have to actually write the data out onto the network, which we do with

public interface AsynchronousByteChannel {
  <A> void write(ByteBuffer src, A attachment, CompletionHandler<Integer,? super A> handler);
}

public interface CompletionHandler<V,A> {
  void completed(V result, A attachment);
  void failed(Throwable exc, A attachment);
}

Internally, this is pretty much a chain of processors that operate semantically like a chain of Reactive Streams processors, but without the nice rigour of the request(n) semantic. Instead of that we have a variety of completion and/or flow control messages that are roughly equivalent to those of an RS processor: e.g. onWritePossible() could be seen as a request(1) call.
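The onWritePossible() ≈ request(1) correspondence can be sketched as a small adapter. This is illustrative only (the class, method names and the nested copy of the listener interface are invented for a self-contained example); it maps each servlet-style write callback onto one unit of RS-style demand.

```java
import java.util.function.LongConsumer;

public class DemandAdapter {
    // Shape of the servlet write callback quoted above.
    public interface WriteListener {
        void onWritePossible();
        void onError(Throwable t);
    }

    // request would typically be subscription::request, cancel subscription::cancel.
    public static WriteListener adapt(LongConsumer request, Runnable cancel) {
        return new WriteListener() {
            @Override public void onWritePossible() {
                request.accept(1); // "the channel can take more" ≈ demand for one element
            }
            @Override public void onError(Throwable t) {
                // RS offers no upstream channel for the Throwable itself;
                // cancel() is the only signal available — the gap this
                // whole thread is about.
                cancel.run();
            }
        };
    }

    // Two write-possible callbacks become two units of demand.
    public static long demo() {
        long[] demanded = {0};
        WriteListener l = adapt(n -> demanded[0] += n, () -> {});
        l.onWritePossible();
        l.onWritePossible();
        return demanded[0]; // 2
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```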

Given that a servlet container is already implemented as a flow of semantically similar asynchronous processors, extending this model to the application and beyond to services that the application consumes is very attractive. Using reactive streams would give the applications a bit of rigour and structure that is lacking in the raw asynchronous IO API and would help avoid many of the common pitfalls. Moreover, internally we could consider replacing the multiple asynchronous paradigms with a single reactive streams paradigm for better consistency and maintainability of the code.

So that's how I ended up looking at reactive streams to extend and/or replace these existing async IO paradigms.

It can mostly be done and we've made some good progress. But the stumbling block I've hit is that all these various similar paradigms currently used by a servlet container have a method for passing back a reason with any failure: raw IO can fail with CompletionHandler.failed(Throwable,A), HTTP/2 streams can fail with error-carrying frames https://tools.ietf.org/html/rfc7540#section-6.4 and the servlet API allows for an onError(Throwable) callback. So if I wish to wrap or replace these paradigms as RS, I need to work out what to do with the error reasons received and how to generate a reason when I have to call an existing API.

So one of your suggestions is that I use the pair of Flows available (from request input and response output) as the reverse channel on which to send those errors. The suggestion is that an error processing request input could be delivered by a call to onError(Throwable) on the response output. Unfortunately that is not an option because the lifecycles of request and response are very independent. A response flow can complete before the request flow is consumed, so it may not be available as a backchannel for failures. If we look at protocols like websocket or HTTP/2 streams in general, they may be half closed in either direction, so the reverse normal data channel may not be available to pass errors on behalf of the related reverse direction channel.

Another approach that I take from your response is that we could look at other non-RS paths to deliver the failure. Perhaps we could do:

public interface ServletRequest {
  public void completeWith(Processor<ByteBuffer, ByteBuffer> handler,
                           CompletionHandler<Void,Void> completion);
}

OK, that's an ugly example, but I hope you get the idea. Sure, that could be done, and I think it could even be a better fit with other aspects of the servlet API. But this is only workable if the container is the sole implementer of the complete processing chain. That is against good software engineering, where it is good practice to reuse components; ideally we should be able to use off-the-shelf or injected processors for encoding, encryption, compression etc. So if I'm using a 3rd party reactive stream, there is no way for me to extract the reason for failure from it so that I could call completion.failed(...).... unless I start doing hacky things like parsing logs generated by the 3rd party component!

So in summary, I'm working with a set of very similar paradigms to reactive streams, but all of them allow the back flow of errors. So I am unable to completely map these paradigms onto RSs, nor can I replace my implementation with a RS based impl, unless I resort to hacky error codes in request(n) or parsing logs!

I accept that perhaps current usage of RS may not need the error backflow, but once an API is in the JVM it is going to encounter new and unintended usages. It is so close to being sufficient to replace/extend/implement the use-case above, but it needs that error backflow to be semantically equivalent to these existing paradigms.

cheers

PS. sorry for the long nature of these posts.


gregw commented 9 years ago

@DougLea

Yes I've already noted that request(n) represents an upstream flow of data that can be misused for sending errors (or arbitrary data - hey if you can implement TCP/IP over DNS lookups, then you can probably do it over RS request(n) calls!)

However I'm not raising this issue because my back is against the wall. I'm raising this issue to try to help improve the quality and flexibility of an API that is proposed for inclusion in Java 9.

My concern as the developer of an application container, is that if I expose reactive streams to my user base, then some of them may end up with their backs against the wall if their required error semantic is not provided. Some of those users will then resort to hacky solutions such as negative request args, parsing logs etc.

This will invariably result in horrible clashes, failures and broken assumptions as various application components and frameworks are assembled in the infinite variety of unexpected usages that the great unwashed developing public have. When it all goes horribly wrong, it is me and my team that are on the pointy end of supporting all the bizarre usages that developers come up with. If there is no solution for legally passing failure reasons, then it is very hard for us to argue against any hacky attempts to achieve a needed semantic!

cheers


viktorklang commented 9 years ago

@gregw

True, currently if my DB driver stuffs up, then I cannot send an exception to it. However, as you say RS are not APIs but are protocols. If I'm talking to my DB via a RS then I'm in an ongoing relationship with it, and all the protocol work I've done suggests that good protocols have reasons in their error terminations.

TCP?

Most recently I have contributed to the development of HTTP2 protocol and it has reasons in its error responses.

HTTP2 is an example of a protocol that you could build on top of RS. RS is not the One Protocol To Rule Them All (tm) :)

I can't recall any protocol that I've worked with that does not support reasons in its failure termination messages.

TCP?

I have spent many years working with TCKs of software standards. They are very important tools, but they are a long way short of a formal proof of compliance.

They seem to be working for Java. I wouldn't mind having a proof assistant validating implementations, open a PR! :) Jokes aside, it's not an all-or-nothing situation, programmers in Turing complete languages can always choose, consciously or not, to shoot their and others' feet. And even if they do, they can/will blame someone else for it.

I assume the same situation can occur with the current cancel() method. The Processor will have to propagate up that cancel call to the N-way merge, at which point all 5 subscriptions to the 5 publishers will need to be cancelled.

Alright, so now they all get an exception, but whose fault was it? Who will know what to do with the exception? See the problem?

I'm only proposing that an error be passed along with that cancel. Processors may pass it along, wrap it with their own exception or replace it with their own reason.

For what purpose? To have it logged in X places? Who will know what to do with it? There is no semantics associated, not in the type of the exception nor in the data it carries, there's just no meaningful information to be programmatically used by it. I fail to see the utility of the feature. What is it supposed to solve?

Re Servlet 4.0

I think Roland's reply was really good so I'll refrain from adding more to it. Akka HTTP can hopefully serve as an example that it is not only possible to implement an HTTP library on top of RS, but that it also works out very nicely :)

In my experience, migrating from one style of doing things to another has the risk of trying to do things in the new thing as they were done in the old thing, which can cause more frustration than utility. (Imagine trying to use a nailgun as if it was a hammer!)

May I suggest that we slightly tilt the discussion to: How could Servlet 4.0 take the most advantage of Reactive Streams?

gregw commented 9 years ago

@viktorklang

A couple of quibbles and then capitulation...

On 3 August 2015 at 18:10, Viktor Klang (√) notifications@github.com wrote:

TCP?

OK I'll give you that.... but TCP is not strongly typed (unlike RS) and it is bidirectional (unlike RS), so returning errors is really just deferred to the application layer protocol built on top of it.

So I think my point holds for most application layer protocols.

Most recently I have contributed to the

development of HTTP2 protocol and it has reasons in its error responses.

HTTP2 is an example of a protocol that you could build on top of RS.

So you don't see RS as an application protocol.... fair enough, but then it is a really strange mix to have it so strongly typed if failure messages are intended to travel through the same band as data.

Also note that my thought experiment, described in a recent post, is exactly how I could implement HTTP2 within the container using RSs.... and I'm getting stuck at the error handling part.

Alright, so now they all get an exception, but whose fault was it? Who will know what to do with the exception? See the problem?

No I don't see the problem, they all get a cancel() anyway and will all log/handle that. I can't see how giving extra information that they are free to ignore makes their task any more difficult.

For what purpose? To have it logged in X places? Who will know what to do

with it? There is no semantics associated, not in the type of the exception nor in the data it carries, there's just no meaningful information to be programmatically used by it. I fail to see the utility of the feature. What is it supposed to solve?

I agree that mostly the passed reason will just be logged and rarely programmatically used. That's why an error code or string would do just as well.

But even with just logging, it is important to have that logging done externally to the RS components in a flow. If you build up a Flow from 3rd party processors, then you are at risk of having multiple logging mechanisms. If you are trying to build an HTTP2 impl out of RSs then you will either have to intercept those various logging mechanisms or not use those 3rd party processors that might do cool things like async encryption/parsing/encoding.

I'd almost be satisfied if every RS component were just given a log API as part of the standard, so that implementations would not have to log themselves but could pass responsibility for logging to whoever had aggregated the various components into a single flow.

I don't want my GzipProcessor to come with its own logging mechanism choice!

May I suggest that we slightly tilt the discussion to:

How could Servlet 4.0 take the most advantage of Reactive Streams?

Very good suggestion! But how about we put this issue on hold for a bit and discuss 4.0 in a new issue. Once we've rattled around there we can come back and either close this one or pick it up again with more understanding?

I'll open another issue and give some descriptions of my initial thoughts on the matter and stand by to have them corrected.

cheers

viktorklang commented 9 years ago

@gregw

Slightly out of order:

Very good suggestion! But how about we put this issue on hold for a bit and discuss 4.0 in a new issue. Once we've rattled around there we can come back and either close this one or pick it up again with more understanding?

I'll open another issue and give some descriptions of my initial thoughts on the matter and stand by to have them corrected.

Sounds fantastic, let's do that :) I'm glad we're having this conversation!

OK I'll give you that.... but TCP is not strongly typed (unlike RS) and it is bidirectional (unlike RS), so returning errors is really just deferred to the application layer protocol built on top of it.

If you squint, TCP is like Pair<Publisher<ByteString>, Subscriber<ByteString>>.

So you don't see RS as an application protocol.... fair enough, but then it is really strange mix to have it so strongly typed if failure messages are intended to travel through the same band as data.

I see RS as an in-process transport protocol. (reactive-streams-io project would extend this to out-of-process as well). Currently it is possible to piggyback on TCP flow control but with a dedicated reactive-streams-io layered protocol it would be possible to sandwich it on top of many more (SCTP, QUIC etc).

No I don't see the problem, they all get a cancel() anyway and will all log/handle that. I can't see how giving extra information that they are free to ignore makes their task any more difficult.

What I'm saying is that there's not much most (all?) of them could do with it, which means that it is sent in vain. Complecting the implementations for everyone for utility of very few seems like a losing proposition (to me).

Would you argue that InputStream.close() should take an error code as argument?

I agree that mostly the passed reason will just be logged and rarely programmatically used. That's why an error code or string would do just as well.

Let's say it would take an error code / enum, what would it be?

But even with just logging, it is important to have that logging done externally to the RS components in a flow. If you build up a Flow from 3rd party processors, then you are at risk of having multiple logging mechanism. If you are trying to build HTTP2 impl out of RS's then you will either have to intercept those various logging mechanisms or not use those 3rd party processors that might do cool things like async encyption/parsing/encoding.

How is this any different from using multiple libraries today? SLF4J vs java.util.logging vs System.out/err vs X?

I don't want my GzipProcessor to come with it's own logging mechanism choice!

Which is completely fine, because it can cancel the upstream and send an onError(exceptionOfYourChoice) downstream. No logging required! :)

Cheers, √

rkuhn commented 9 years ago

@gregw Don’t worry about the length of the posts, some redundancy is necessary to compensate for the restricted bandwidth of written communication—I for one am learning a lot from your explanations.

The example of the AsynchronousByteChannel and its CompletionHandler makes it a lot clearer, sorry for confusing things with CompletableFuture earlier. When talking to a socket it makes some sense to signal that the bytes didn’t even make it into kernel land (for example), so now I understand what you mean to achieve with failed (or WriteListener.onError) in this direction. Extending our cancel method to directly map to this feature leaves you with a remaining hole, though: the completed method signals that a write has been “committed”—whatever that means—and such a signal is not available in RS at all. That was what I meant with the difference between back-pressure and reliable delivery, one only needs an asynchronous demand signal, the other a more real-time (but not strictly synchronous) delivery confirmation. I’m making a guess here, but possibly the most prominent reason for not considering this within the RS group was that successfully writing to a TCP socket does by no means imply that the recipient will actually ever see those bytes, so the value of this confirmation is judged low; personally I’d rather have a less powerful but reliable implementation than a more powerful “best effort” one.

Having said that, I see now that being inspired only by network protocols might be too limiting: within an application that uses a shared address space it is not quite as unreasonable to build in a confirmation mechanism. However, if we were to do that then adding an abnormal cancellation signal would not be enough, we would need to signal successful reception for all elements as well in order to offer the full package. Doing only one half will trick users into believing that the absence of cancellation implies successful reception—which is false—and we have some experience with how these psychological processes work. Building in confirmations would lead to a different protocol, though, with different performance characteristics and implementation complexity. Considering this, do you think that making RS suitable for confirmed delivery is worth it, or am I worrying too much about doing only the error side of it?
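Roland's "full package" point could be made concrete with a sketch like the one below. This is purely hypothetical and NOT Reactive Streams; all names are invented. It shows why the error half (reject) alone would be misleading: honest confirmed delivery also needs positive acknowledgement (confirm) for every element.

```java
import java.util.ArrayList;
import java.util.List;

public class ConfirmedDelivery {
    // A hypothetical Subscription extended with a confirmation channel.
    public interface ConfirmedSubscription {
        void request(long n);
        void cancel();
        void confirm(long upToSeqNr);           // positive ack of elements 0..upToSeqNr
        void reject(long seqNr, String reason); // the abnormal, per-element signal
    }

    // A recording implementation, just to show the shape of the traffic.
    public static List<String> demo() {
        List<String> log = new ArrayList<>();
        ConfirmedSubscription s = new ConfirmedSubscription() {
            @Override public void request(long n) { log.add("request " + n); }
            @Override public void cancel() { log.add("cancel"); }
            @Override public void confirm(long upTo) { log.add("confirm up to " + upTo); }
            @Override public void reject(long n, String reason) { log.add("reject " + n + ": " + reason); }
        };
        s.request(3);
        s.confirm(1);              // elements 0 and 1 received
        s.reject(2, "bad object"); // element 2 refused, with a reason
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Every element now requires an extra round-trip signal, which is exactly the coordination and throughput cost the spec deliberately avoids.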


Another side point that I’d like to touch upon is the strongly typed nature vs. validation errors. When you worry that «the more processing I consider that the container does, then the more failures that can occur within the container», you imply that e.g. the inability to parse an incoming ByteBuffer as UTF-8 code points should be treated as a failure (please correct me if wrong). It is tempting to offer a Processor<ByteBuffer, String>, but please consider that this signature is a lie in that it promises to turn any sequence of bytes into a sequence of valid Strings. Receiving malformed input should not be considered a service failure, it should lead to a validation error, where the difference lies in who is responsible for fixing it. In this example it would be much more honest to model the parsing stage as Processor<ByteBuffer, Validation<String>>, offering the consumer the choice of how to respond to errors without forcing them into failure mode.
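To make the in-band validation idea concrete, here is a minimal sketch of what such a type and the per-element transformation could look like. Note that `Validation` is hypothetical (neither RS nor the JDK defines one); the decoding helper just shows how a `Processor<ByteBuffer, Validation<String>>` would turn malformed input into data rather than a stream failure:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical in-band validation result: either a parsed value or an error.
final class Validation<T> {
    private final T value;          // present on success
    private final Throwable error;  // present on validation failure

    private Validation(T value, Throwable error) { this.value = value; this.error = error; }
    static <T> Validation<T> ok(T value)          { return new Validation<>(value, null); }
    static <T> Validation<T> invalid(Throwable e) { return new Validation<>(null, e); }
    boolean isValid() { return error == null; }
    T get()           { return value; }
    Throwable error() { return error; }
}

class Utf8Decoder {
    // The per-element step a Processor<ByteBuffer, Validation<String>> would
    // apply in onNext: a malformed buffer yields Validation.invalid, not onError.
    static Validation<String> decode(ByteBuffer buf) {
        try {
            // newDecoder() REPORTs malformed input instead of replacing it
            return Validation.ok(StandardCharsets.UTF_8.newDecoder().decode(buf).toString());
        } catch (CharacterCodingException e) {
            return Validation.invalid(e);
        }
    }
}
```

The consumer then decides per element whether a validation error is recoverable, instead of being forced into the stream's failure mode.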

Analog treatment applies to other cases where validation is necessary because the data types do not guarantee the desired semantics—the other example you raised was that serializability is not usually accurately expressed in the type system (although it is possible to do so, but that is a different discussion).

gregw commented 9 years ago

@Roland,

Your latest message is interesting enough that I'll comment here before moving to a new issue to discuss 4.0.

I totally agree with your point of view regarding confirmation and I'm definitely not seeking it. Even when the current layers I'm working with offer a completion() callback, that completion does not say that the data has been sent, only that it has been consumed (e.g. copied into an aggregation buffer). Delivery confirmation definitely has to be done in higher-level protocols, as even arriving in the remote end's read buffer does not always mean data has been "delivered".

However, what I find most interesting is your comment that a Processor<ByteBuffer, String> is essentially a lie. This goes a long way toward explaining why you guys think that the publisher can do no wrong! You are saying that the entire contract is bound up in the type that can be published, and that no externalities can impose any assumptions on the instances that are carried.

I'm not sure I'm sold on that, e.g. if a DB provides a Publisher as the result of a query, surely there can be external assumptions placed on the actual instances passed via onNext, and the publisher is not free to send any arbitrary instance that matches the type signature? So my problem still looks like a nail even though the only tool I have is a staple gun :)

But maybe that question is best explored with some concrete discussions in the context of how 4.0 can be supported... Give me a day or two to start that discussion.

@Viktor:

it can cancel the upstream and send an onError(exceptionOfYOurChoice) downstream. No logging required! :)

Glad there is a smiley there... as I thought you were the one worried about telling others about exceptions that were meaningless to them.

How is this any different from using multiple libraries today? SLF4J vs java.util.logging vs System.out/err vs X?

The problem with logging is that there are too many options, so often the best solution is to create your own logging abstraction which can then map to other logging mechanisms... all problems in computer science can be solved by another layer of indirection.

But seriously, I believe utility Processors should no more have to have logging in them than ArrayList and String do. If you want to build software from components, then the fewer dependencies those components have, the easier it will be. So my problem in this issue can be summed up as: how to get a failure out of a Processor without using logging... let's examine the options in the soon-to-be-opened 4.0 issue.

cheers

On 3 August 2015 at 19:36, Roland Kuhn notifications@github.com wrote:

@gregw https://github.com/gregw Don’t worry about the length of the posts, some redundancy is necessary to compensate for the restricted bandwidth of written communication—I for one am learning a lot from your explanations.


rkuhn commented 9 years ago

@gregw Great, so my conclusion is that we fully agree on the issue of confirmation, I was slightly worried by the completed method’s implications. I also agree that we have now established a common understanding of what the actual problem is, where parts will need to be tackled within the scope of Servlet 4.0 and other parts may need to be tackled in Reactive Streams; having that discussion after figuring out where the dividing line will be drawn sounds good to me.

Concerning the case of emitting too loosely typed values from a database my recommendation would be to add a validation Processor immediately after it that turns the stream into a more appropriately typed one according to the use-case—where validation errors are henceforth described in-band just like in the String parsing case. This Processor then acts like a merge element that combines the database outputs with their external assumptions.

viktorklang commented 9 years ago

Glad there is a smiley there... as I thought you were the one worried about telling others about exceptions that were meaningless to them.

I wasn't joking though. The downstream can, if it is able and willing, attempt to recover from the failure (see rule 4.2); an upstream cannot, as the downstream is now torn down. (Compare with how exceptions are thrown to the code that comes after (the catch block), not the code that came before (the try block).)
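One way a downstream might exercise that ability to recover is by switching to a fallback stream on failure. The sketch below is a hypothetical illustration using the JDK 9+ java.util.concurrent.Flow mirror of the RS interfaces; a production version would also have to honour the spec's rules around repeated onSubscribe signals:

```java
import java.util.concurrent.Flow;

// "Downstream can recover": on failure of the primary stream the subscriber
// resubscribes itself to a fallback Publisher, analogous to a catch block
// handling an exception thrown out of the try block.
class FallbackSubscriber<T> implements Flow.Subscriber<T> {
    final java.util.List<T> received = new java.util.ArrayList<>();
    private Flow.Publisher<T> fallback; // consumed at most once
    Throwable failure;                  // set only when the fallback also fails

    FallbackSubscriber(Flow.Publisher<T> fallback) { this.fallback = fallback; }

    public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
    public void onNext(T item) { received.add(item); }
    public void onComplete()   { }
    public void onError(Throwable t) {
        Flow.Publisher<T> fb = fallback;
        if (fb != null) {
            fallback = null;
            fb.subscribe(this); // recover by consuming the fallback stream
        } else {
            failure = t;        // no recovery left: record the terminal failure
        }
    }
}
```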

The problem with logging is that there are too many options, so that often the best solution is to create your own logging abstraction which can then map to other logging mechanism... all problems in computer science can be solved by another layer of indirection.

Sure, but that problem is completely orthogonal to the question at hand, wouldn't you agree? (It happens without Reactive Streams.)

But seriously, I believe utility Processors should no more have to have logging in them than ArrayList and String do. If you want to build software from components, then the fewer dependencies those components have, the easier it will be. So my problem in this issue can be summed up as: how to get a failure out of a Processor without using logging... let's examine the options in the soon-to-be-opened 4.0 issue.

I definitely see situations where processors may want to do debug logging, or otherwise. The RS spec doesn't prescribe anything but: "the caller MUST raise this error condition in a fashion that is adequate for the runtime environment."

Anyway, let's continue the Servlet 4.0 discussion over at the appropriate Issue.

Cheers, √

abersnaze commented 9 years ago

@viktorklang The problem with using Subscriber<ByteString> as the output is that there isn't a way for the handler to know if the write succeeded. The API for RxNetty looks something like this:

public interface Handler<I, O> {
    Publisher<?> handle(Publisher<I> request, Response<O> response);
}
public interface Response<O> {
    Publisher<Void> writeAndFlushOnEach(Publisher<O> output);
    ...
}

Subscribing to the Publisher<Void> allows the handler to handle each emission failure in its own way and, say, roll back any changes if the response can't be written. The handler returns the Publisher<?> so the server knows when all writes are completed.
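To show the shape of that subscription, here is a hedged sketch of a Subscriber<Void> that only observes the terminal signal of the write stream (using the JDK 9+ java.util.concurrent.Flow mirror of the RS interfaces so the example is self-contained; the rollback hook is an assumption, not part of RxNetty):

```java
import java.util.concurrent.Flow;

// A Subscriber<Void> never receives onNext; it exists purely to observe the
// terminal signal of the write stream: onComplete means all writes were
// accepted, onError means some write failed and compensation is needed.
class WriteOutcomeSubscriber implements Flow.Subscriber<Void> {
    private final Runnable onSuccess;
    private final java.util.function.Consumer<Throwable> onFailure;

    WriteOutcomeSubscriber(Runnable onSuccess,
                           java.util.function.Consumer<Throwable> onFailure) {
        this.onSuccess = onSuccess;
        this.onFailure = onFailure;
    }
    public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
    public void onNext(Void item) { /* never called for a Void stream */ }
    public void onError(Throwable t) { onFailure.accept(t); } // e.g. roll back
    public void onComplete()         { onSuccess.run(); }     // writes done
}
```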

gregw commented 9 years ago

@abersnaze

Ah! That's a bit of a light-bulb moment for me!

I'll discuss that here a bit as it is more relevant to my issues about handling failure than to designing for 4.0.

Returning a Publisher<Void> from the setup of a functional data Flow gives the ability to return errors by calling onError(Throwable) on the subscriber (I assume the Void publisher can act as a CompletableFuture in this regard and there is no race to subscribe before the exception occurs).

I.e. the reverse channel for passing errors need not be the reverse data channel (i.e. the input associated with the output), but can be a dedicated Publisher acting like a CompletionHandler.

Also in that case it may actually be simpler to do:

public interface Handler<I, O,A> {
    void handle(Publisher<I> request, Response<O> response, A attachment, CompletionHandler<Void,A> completion);
}

Although I guess that is mixing styles, and I've always found the attachment superfluous.

Sticking with the one style, can you explain why Publisher<?> is used rather than Publisher<Void> in your handle method? Is onNext(Object item) ever called? Isn't onComplete() from a Publisher<Void> sufficient for that?

So my final thought bubble, jumping way ahead of any answers, is: perhaps we could avoid my logging concerns and generalize this pattern by adding an extra method to Subscriber:

public interface Subscriber<T> {
    public default Publisher<Void> getCompletionPublisher() {return null;}
    ...
}

Subscribers that wanted to report completion and/or errors could provide a publisher, while those that don't would stick with the default null return. Publishers (or Flow creators) could subscribe to this if they are interested in completions and/or failures, if only to log them in a common logger.

Anyway, I think my thought bubble is getting way ahead of my understanding... so I've opened #288 for further discussion.

viktorklang commented 9 years ago

@gregw Fortunately Subscriber does not need modification—there's already Processor—any Subscriber that is able and willing to signal completion will be a Processor[T, Void].
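A hedged sketch of Viktor's suggestion, using the JDK 9+ java.util.concurrent.Flow mirror of the RS interfaces: the input side of the Processor consumes elements, while the output side never emits onNext and serves only as a completion/error channel. SubmissionPublisher is used here purely to supply the Publisher<Void> plumbing; a real implementation would also cancel its upstream Subscription on failure:

```java
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// A terminal sink expressed as a Processor<T, Void>: elements go in, and the
// only downstream signals are onComplete (success) or onError (failure).
class SinkProcessor<T> extends SubmissionPublisher<Void>
        implements Flow.Processor<T, Void> {
    private final java.util.function.Consumer<T> consumer;

    SinkProcessor(java.util.function.Consumer<T> consumer) { this.consumer = consumer; }

    public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
    public void onNext(T item) {
        try {
            consumer.accept(item);
        } catch (Throwable t) {
            closeExceptionally(t); // surface the sink's failure to observers
        }
    }
    public void onError(Throwable t) { closeExceptionally(t); }
    public void onComplete()         { close(); } // signal successful completion
}
```

Anyone interested in the sink's outcome (including a common logger) subscribes to its Void output side, which is exactly the feedback channel discussed above.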

gregw commented 9 years ago

@Viktor,

Close! But what about a Processor that wants to signal completion? Anyway, let's ignore the thought bubble until we've made progress on the concrete examples in #288.

cheers

On 4 August 2015 at 16:30, Viktor Klang (√) notifications@github.com wrote:


maxwindiff commented 9 years ago

@gregw I have been thinking about the argument for bubbling exceptions upstream for some days, not sure if I'm too late but let me add my thoughts here.

It seems the main motivations are (1) try to make the publisher correct its own error, or (2) report the error so as to get some human attention. I think sending exceptions upstream cannot actually achieve these goals.

First of all, consider (1). If a publisher receives an error from one of its subscribers, what can it do? If the error is the result of a programming error, human intervention is usually necessary. If the error is a data issue, e.g. the input to the publisher is corrupt, the publisher could probably recover by reloading the source. But successful recovery also requires cooperation between the publisher and subscriber(s), e.g. to tell subscribers to discard data up to a certain point. If such a control channel exists, it would be a more appropriate place to send exceptions upstream, since these exceptions are essentially control messages.

For (2), I'm not sure if sending an exception alone will be of much value. If the publisher and subscriber belong to the same people, logging the error at the subscriber side would be sufficient. If the publisher and subscriber belong to different organizations, you'll probably need some human to create a much more detailed error report in order to convince / help the publishing folks fix their bug – reproduction steps, expected vs actual results, etc. I guess a close analogy is: think of what you would do if some REST API you use started returning incorrect results. An exception alone won't do much.

gregw commented 9 years ago

@maxwindiff

Never too late! I'm happy to run around my own private hamster wheel on this one :)

I think the argument that it is difficult to handle exceptions could be made about any use of exceptions. They are rarely handled programmatically and often need human intervention to interpret. So if those arguments are true for Flows then they would be true for arbitrary APIs, protocols and REST APIs.

Note also that just logging the reason means it can't be handled programmatically and almost certainly needs human intervention. All I'm really saying is that we should move the logging out of the Flow components and at least open the possibility of programmatic handling.

If the publisher and subscriber belong to the same people, they still might be constructed by assembling 3rd-party components, so they still may not have a common logging mechanism. If the publisher and subscriber belong to different organisations, then in general, providing a reason for failure (and a reason may be a code, string or exception) has proved to be of value, even if it cannot be programmatically handled.

Taking your REST API example, imagine if your REST client just silently failed without making the reason available, so you would not know whether it was: marshalling of the arguments, host resolution, network failure, timeout, authentication error, a remote service problem, or marshalling the response. Imagine if the only way you could diagnose the problem were to contact Google, Amazon, eBay or whoever was providing the REST service and ask to inspect their server logs so you could see the reason for the failure!

Instead, failures are reported to the client and logged on the server. Both parties are informed and both can independently make an assessment of who is at fault. If there is a bug in Google/Amazon/eBay, then their support staff will hopefully detect the problem in their logs and fix it. If the problem is with the client, then the reported error will help the developer diagnose it. In the case that Google/Amazon/eBay don't spot their bug, having a reported reason for failure will help the developer open a support request with the service provider.

So perhaps cancel(Throwable) is not the best way to report a failure reason, but I still believe there needs to be one, as logging from within components is not reasonable (imagine if ArrayList used java.util.logging, ConcurrentLinkedQueue used Log4j and HashMap used Commons Logging!)

There are some creative ideas about how Publisher<Void> instances might be used as a conduit for such failures, but I maintain that this cannot be done in an ad hoc way, otherwise it will be difficult to build applications from compositions of 3rd-party components.

So while I'm open to different ideas for delivering reason for failures out of components (and look forward to discussing options in the other issue) I believe a technique should be part of the standard, at least as an optional mechanism, so that reusable components can be built that use it.

cheers

sbordet commented 9 years ago

@rkuhn

Another side point that I’d like to touch upon is the strongly typed nature vs. validation errors. When you worry that «the more processing I consider that the container does, then the more failures that can occur within the container», you imply that e.g. the inability to parse an incoming ByteBuffer as UTF-8 code points should be treated as a failure (please correct me if wrong). It is tempting to offer a Processor<ByteBuffer, String>, but please consider that this signature is a lie in that it promises to turn any sequence of bytes into a sequence of valid Strings. Receiving malformed input should not be considered a service failure, it should lead to a validation error, where the difference lies in who is responsible for fixing it. In this example it would be much more honest to model the parsing stage as Processor<ByteBuffer, Validation<String>>, offering the consumer the choice of how to respond to errors without forcing them into failure mode.

But the problem with using a Validation class is that, for example, in Java there is no standard Validation class that can be used (unless defined by RS). So one Processor library writer will use their own, and another Processor library writer will use their own, and now you have a dependency between two libraries, you have to convert between the two, etc.

I like the idea to have the Subscriber return a Publisher that emits events in the opposite direction with respect to those that the subscriber is receiving:

public interface Subscriber<T> {
    public default Optional<Publisher<T>> getFeedBack() { return Optional.empty(); }
    ...
}

The subscriber's Publisher can feed back to the original Publisher when the items have been processed asynchronously (which allows for efficient pooling of those objects by the original Publisher, for example), as well as deliver detailed errors via onError(Throwable).

Original Publisher -- events --> Subscriber
           ^                       |
           |                       |
           `------ feed back ------'

Makes sense?
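A rough sketch of that feedback shape (the getFeedBack method is hypothetical, following the interface proposed above; this uses the JDK 9+ java.util.concurrent.Flow types, with SubmissionPublisher supplying the feedback Publisher's plumbing):

```java
import java.util.Optional;
import java.util.concurrent.Flow;
import java.util.concurrent.SubmissionPublisher;

// Subscriber that republishes each item on a feedback channel once it has
// been processed, so the original Publisher can recycle pooled objects and
// receive detailed errors via onError(Throwable) on that same channel.
class RecyclingSubscriber<T> implements Flow.Subscriber<T> {
    private final SubmissionPublisher<T> feedback = new SubmissionPublisher<>();
    private final java.util.function.Consumer<T> process;

    RecyclingSubscriber(java.util.function.Consumer<T> process) { this.process = process; }

    // The proposed hook: the original Publisher subscribes here if interested.
    Optional<Flow.Publisher<T>> getFeedBack() { return Optional.of(feedback); }

    public void onSubscribe(Flow.Subscription s) { s.request(Long.MAX_VALUE); }
    public void onNext(T item) {
        process.accept(item);
        feedback.submit(item);                  // done with it: hand it back
    }
    public void onError(Throwable t) { feedback.closeExceptionally(t); }
    public void onComplete()         { feedback.close(); }
}
```

The original Publisher subscribing to getFeedBack() sees exactly the "feed back" arrow of the diagram above: recycled items plus a detailed terminal error or completion.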