googleapis / google-cloud-cpp

C++ Client Libraries for Google Cloud Services
https://cloud.google.com/
Apache License 2.0
546 stars 364 forks source link

Consider making asynchronous operations cancelable. #1289

Closed coryan closed 4 years ago

coryan commented 5 years ago

Asynchronous operations can be canceled by the application, or at least, the application can request that the operation be canceled, the cancelation may or may not succeed.

To implement this we would need to return an object (maybe of AsyncOperation type) from each member function that implements an asynchronous operation. Then the object would cancel the request (using the underlying gRPC cancelation mechanisms), or the timer if the operation is in a retry loop.

It is unclear if there is much value on this: most operations succeed on the first request, so there is little opportunity to cancel them.

It is also complicated to signal to the application that the cancelation completed, specially for operations like AsyncBulkApply that may have partially succeeded before the cancelation is issued.

Finally it is not clear how that would work with std::future<>-based requests (how I am planning to implement async operations for exception-enabled builds), because it is not possible to cancel via a future.

My vote is to not provide a mechanism to cancel the request. I would like to hear other opinions.

If we converge on not having a mechanism to cancel requests then we would create an ADR to document this decision.

If we converge on having a mechanism to cancel request we would need to open bugs to implement it.

coryan commented 5 years ago

@deepankarsharma @dopiera @sduskis I am interested in your thoughts about this bug.

coryan commented 5 years ago

@houglum because what we decide here will influence our decisions for Cloud Storage someday, also interested in your thoughts.

dopiera commented 5 years ago

Here are my 10 cents:

I tried to come up with scenarios when one would need cancellation; I'm not sure how real-life these examples are (probably not much), but this is what I figured:

By making the decision not to implement cancellations, we'll be making it much harder for users to reimplement the routing policies if they wish (maybe that's OK). The other examples seem to boil down to answering the question whether we consider setting long deadlines an anti-pattern. If we do, then we are basically shifting the responsibility to do back-offs to the user, which sounds like an inconvenience to me.

The other thing is what semantics cancellation has. It seems that there is an inherent race with cancellations - what if we issue the cancellation right after the request manages to complete. Due to that, I can't come up with anything other than making cancellation advisory, i.e. the client is free to ignore it. If we treat it as such, we could make cancellation completely async with no indication of completion.

We could start with a simple implementation: it would only cancel the requests which are waiting to be sent. The user would get to know by checking the individual request status upon their completion (one of them would be CANCELLED). For bulk mutations, it doesn't seem a problem unless the mutations are on the server side and if they are, tough luck - cancellation is likely to be ignored, unless it refers to all requests stuck on the server side.

If we go this way cancellation would be an ordinary exception, so std::future should just work, no?

Having said all of that, I'm not sure whether implementing cancellations is worth it. Maybe the way to go is in between - return a cookie for every operation, so that we can retrofit cancellation later.

dopiera commented 5 years ago

FWIW, I think Linux signals are an analogy: when you do write(), you can interrupt it, unless you're already deep in the kernel in an uninterruptible sleep.

coryan commented 5 years ago

The most compelling argument you make is that removing cancelations takes away functionality from the application developers. There are algorithms that one can implement with asynchronous gRPC calls that become hard or impossible to implement if we do not have a hook to cancel the operation. As a general principle we try to offer a more convenient API, but one that is at least as powerful as the raw gRPC calls.

That I think convinces me that we need to support cancelation, the issue is how? What changes are needed in the APIs outlined in the go/cbt-cpp:async-dd (you have access to the direct link, but not the go/ link) design doc?

The next questions are in that spirit.


If we go this way cancellation would be an ordinary exception, so std::future should just work, no?

The question is what should an asynchronous function with futures return? In the design doc, I outlined APIs like this:

future<Foo> f = table.Operation(...);
// Use f.get() to block for a value or f.then() to set a callback.

Now, neither std::future<>, nor std::experimental::future<> have a "cancel" member function, so where is the application supposed to say "cancel"?

For the noex:: APIs where we are returning void it is not hard to see us returning a different thing. I guess we could return a std::pair<> with a future and the cancelation object?

Or we could "enhance" the future<> class to support a cancel() request? I do not like that idea because I want a class that can be replaced with std::experimental::future<> once that becomes widely available (2020 is not that far into the future).

For bulk mutations, it doesn't seem a problem unless the mutations are on the server side and if they are, tough luck - cancellation is likely to be ignored, unless it refers to all requests stuck on the server side.

The issue(s) with bulk mutations are two fold (both minor for this discussion I think):

Thoughts?

dopiera commented 5 years ago

If we go this way cancellation would be an ordinary exception, so std::future should just work, no?

The question is what should an asynchronous function with futures return? In the design doc, I outlined APIs like this [...]

The part which I meant here was indicating the cancellation, along the lines with your response - "operation cancelled" would be one of the possible failures.

I guess we could return a std::pair<> with a future and the cancelation object?

I was actually hoping that we could somehow use the shared state from the future as a way to address the related request so that you could have a AsyncTable::CancelOperation method, which would accept a future<> as a handle. The pair<> was my second choice, but TBH I think it's a bit ugly. I spent some considerable amount of time trying to figure out if there's a clean way to implement the CancelOperation, but I only had hacky ideas.

A slight improvement to std::pair is to have a custom object - it might be more convenient than using opaque first and second.

I do not like that idea because I want a class that can be replaced with std::experimental::future<>

I understand the point, but I think it's worth noting that that's how Java solves it - by having cancel() on their Future.

Yet another approach would be golang way - make the contexts cancellable. Unfortunately. the gRPC contexts don't support it, so we'd have to wrap them, and manually track request to context mapping. I'm not sure I'm a big fan of this approach, because it doesn't seem in line with everything else in the API.

Having put more thought into it, I think there is one more caveat: cancelling a chain of requests. The cool part about the std::future<>::then() is that you can be left with just one future holding the whole chain (if my understanding is correct). Getting the cancellation object outside of the continuation might be tricky, which IMO is an argument for either improving the future<> or the golang way.

It seems I'm asking more questions than giving answers, maybe I should stop :). I guess given these choices, I'm leaning towards the golang way (deadlines are in the context too after all).

A different API design question (but probably less important): the Apply() and BulkApply call use move semantics to minimize the number of copies from the application down to the gRPC layer, some of these mutations are large sometimes. Move semantics are Okay when things succeed: "I stole your data buffers but I did something useful with them" is a reasonable excuse, but when we signal a failure/cancellation: should we give the buffers back to the application ("I was borrowing these buffers, but now I cannot use them, take them back").

Is there any argument against it? I imagine, we'd return those buffers together with the errors.

houglum commented 5 years ago

I don't have much to contribute here, except for a couple things:

A slight improvement to std::pair is to have a custom object - it might be more convenient than using opaque first and second.

I agree with this, as it's more extensible and cleaner (IMO) than a std::pair.

The blocking API returns a std::vector<FailedMutations>

This model sounds most intuitive to me. Maybe we could return something similar, but instead of FailedMutation, maybe MutationResult (or something appropriately named) objects that have a status enum to disambiguate between Failed vs Cancelled/Unknown state?

coryan commented 5 years ago

Since we are converging on making the operations cancelable then the question is only around the API for future<> based requests:

Having put more thought into it, I think there is one more caveat: cancelling a chain of requests. The cool part about the std::future<>::then() is that you can be left with just one future holding the whole chain (if my understanding is correct).

Well, it is unspecified what you have... you have a future that will (eventually) get satisfied when the chain is. How is that implemented is not specified at all.

Getting the cancellation object outside of the continuation might be tricky, which IMO is an argument for either improving the future<> or the golang way.

I agree. Maybe we should write a paper to improve std::experimental::future<>, that is the point of technical specifications really, to gain experience and gather this type of requirements.

But I digress.


In summary, @dopiera and @houglum changed my mind, we need to be able to cancel requests. Therefore we need to:

https://github.com/GoogleCloudPlatform/google-cloud-cpp/blob/3f96177a48442982e711175ba84e0ac1537c9f76/google/cloud/bigtable/internal/async_retry_unary_rpc.h#L72

I think we continue to work on the existing async functions, the cleanup to return AsyncOperation will be pretty once AsyncRetryUnaryRpc<> is ready.

Separately, prototype the idea of adding a cancel() member function to the future<> classes in my branch. If it is possible to add such a member function then the design doc needs to be updated to say that. Otherwise we need to change the design doc to say we return both the future and the AsyncOperation object.

deepankarsharma commented 5 years ago

The most common use case I have seen for cancellable async operations is handling scroll events in UIs. When building a data viewer it is fairly common to cancel any existing prior requests in flight and then issue a new one when the scroll position changes.

My preference would be to not diverge from std::future / std::experimental::future and instead do the clunky thing for now and return another object along with the future that supports cancellation.

coryan commented 5 years ago

/cc: @devjgm FYI. I am not actively working on this anymore, so removing myself.

dopiera commented 5 years ago

While looking at API options for flow control, I encountered an interesting way to do cancellation.

Summary of the problem:

We've so far considered these options:

The first option gets ugly, when future<>::then() is used because it's tricky to get the cancellation object from any async call other than the outermost.

The second options has the big disadvantage of being non-standard.

We could do something along the lines of CancellationToken from C#.

This is how the code could look like:

void AppCode(CompletionQueue &cq) {
  CancellationToken ct = table.CreateCancellationToken();
  std::future<void> apply_chain = table.AsyncApply(GenerateMutation(), cq, ct).then(
      [&cq, ct](std::future<void> fut) {
          fut.get();
          return table.AsyncApply(GenerateMutation(), cq, ct);        
      });
  // I changed my mind - I don't want it to complete.
  table.Cancel(ct);
}

@devjgm @coryan What are your thoughts? Shall I implement it?

coryan commented 5 years ago

That sounds cool, though I would prefer if the cancellation token was created and used with the completion queue.

My guess is that we will prioritize the conversion to use StatusOr<> in the bigtable client for now.

coryan commented 4 years ago

FWIW, the future<T> class gained a .cancel() function a while back. I think that fixes this problem. We can reopen if I happen to be wrong.