googleapis / google-cloud-java

Google Cloud Client Library for Java
https://cloud.google.com/java/docs/reference
Apache License 2.0
1.89k stars 1.06k forks source link

Pub/Sub: Improve design of synchronous pull #1661

Closed garrettjonesgoogle closed 7 years ago

garrettjonesgoogle commented 7 years ago

Synchronous pull is currently supported in the SPI layer on SubscriberClient. This support was retained at the request of @evmin in https://github.com/GoogleCloudPlatform/google-cloud-java/issues/1157 .

The status quo is that there are two places to get messages, either using the synchronous pull on SubscriberClient or using the Subscriber class which does a lot of the subscription management for users.

The problem with this situation is that a user might only find one of those two ways to get messages and not see the other one, which is very suboptimal. We would like to consider if we can make the surface clearer.

garrettjonesgoogle commented 7 years ago

The raw SubscriberClient surface for streaming pull involves these three methods:

There is no explicit negative ack; negative acks are implicit if there is no positive ack before the current ack deadline.

Now if I understand correctly, one of the arguments for supporting synchronous pull is the need for a user of pubsub to precisely control when the pull call is issued. @evmin, the question I have for you is this: For the existing code you are supporting, do they need to have full control over when modifyAckDeadline is called also? Or, would the following surface be preferred?

Behind the scenes, the library would keep track of the pulled messages which have not yet been acked (positively or negatively), and repeatedly call modifyAckDeadline until they were acked.

Let us know!

evmin commented 7 years ago

@garrettjonesgoogle

Thank you for checking in.

Will be sufficient, thank you (just to be paranoid - pull is synchronous with return immediately set to false, risk of never returning is OK).

garrettjonesgoogle commented 7 years ago

Ok, and to check the converse: would it be disruptive if you had to handle the ack deadlines yourself?

evmin commented 7 years ago

@garrettjonesgoogle

Currently we call either acknowledge for the positive ack, or modify deadline set to 0 to indicate a negative ack. I have also exposed the functionality to disable the implicit acks and managing the deadline arbitrary, but that has not been used by any of our solutions as yet.

So if I understood you correctly and you mean extending the deadline explicitly when required - then it would not be a problem at all.

garrettjonesgoogle commented 7 years ago

I'm thinking that since this use case is meant to serve frameworks that do more of their own timing management, it might make sense to keep it simple and leave everything up to the caller instead of having additional smarts. That would then imply exposing all three methods, with no implicit work done.

evmin commented 7 years ago

That would be perfect, thank you.

garrettjonesgoogle commented 7 years ago

@evmin , there have been more internal discussions on this topic and I wanted to update you on our current plan. TL;DR: we would like users of synchronous pull to use raw gRPC.

Long story: the client in google-cloud-java will be aimed at two use cases:

  1. People getting started with using Pub/Sub directly
  2. People needing extremely high-throughput Pub/Sub which they will integrate with directly

For both use cases, the client needs to expose the most performant use case as the most trivial way to use the library. Exposing multiple ways to do things defeats this purpose.

The framework integration use case is not a target use case for the client. Thus, we recommend dropping down a level and coding directly to the gRPC layer. End users of these frameworks will not be impacted by this, only framework maintainers.

To mitigate the impact of a less usable layer, we intend on writing some samples which show how to get started with Pub/Sub gRPC, and how to make use of GAX for things like retry logic.

As always, let me know what you think.

evmin commented 7 years ago

@garrettjonesgoogle

I think I might understand where you are coming from. And generally agree on the intent - it does look like a sound idea and good approach.

The devil is, as usual, in the details. If the gRPC API is simple enough and the required outcomes can be achieved - that would be fantastic. Yet that's where I am probably not in the position to comment right away - I have little experience with gRPC to make the call (for instance I am completely oblivious of "GAX for things like retry logic" implications). I would need to see the final implementation to understand.

Extra documentation should take the edge off - thank you for considering it.

Yet there are a few considerations I would like to voice.

From my perspective, the library is not a mere tech tool - it is also a pathway to driving adoption as well. So the easier it is for the developers to use it - the wider PubSub and GCP in general would go. I think you might have already recognised this fact - if I interpret the presence of the easy API right. So this is the point number one.

The point number two would be the skill of the developers. I would not expect everyone to be at the Google level. So the more Google is able to assist these guys, to take the complexity away, the better it is for everyone. If there is a particular concern that can be taken care of by either Google devs or the end use devs, I would opt for the former.

The framework integration use case is not a target use case for the client.

Well, the case I am trying to talk to from the very early on is that if something can be bent and used in a certain situation, it will be. Regardless of the original intent. I have seen ETL workflows implemented on an ESB.

So frameworks will happen either way - simply because the dev teams prefer to stay away from the boilerplate code as much as possible ( hence the existence such things as Spring jdbcTemplate and ORM frameworks, for example). Especially in the general apps and analytics areas, where devs are particularly averse to reinventing the wheel.

But, it is OK. The well designed API should be enough. No special treatment is required. The API should be flexible and not to exclude options deliberately.

One extra point I would like to touch base on is the target use cases and roadmaps. I understand there are heaps of discussions internally. Unfortunately these do not transpire to a clear outside communications.

Marketing message outside Google positions PubSub as one of the BigData technologies accessible by everyone - which is great. But that also create a certain liability for the technology to support this message. BigData in the corporate world is the area where the incoming technology needs to work together with the incumbent and be adaptable. To give you an example from the BigQuery world. We use it and quite a lot. How many solutions are using the direct API? One out of roughly 20. All the rest are through a framework.

Again, the gRPC API might be super easy. Combined with the docs and example it could very well be a walk in the park. I just wanted to provide my 2 cents - from the corporate world.

aparod commented 7 years ago

To mitigate the impact of a less usable layer, we intend on writing some samples which show how to get started with Pub/Sub gRPC, and how to make use of GAX for things like retry logic.

Any update here? It seems Google is discouraging use of the older Pub/Sub library but I cannot migrate away from it without a way to perform synchronous pulls.

0xIAN commented 7 years ago

I just happened upon this page after a pretty exhaustive search while trying to figure out how to use PubSub in a synchronous mode. I wanted to add my comments and perspective on this topic.

I am a full-on GCP customer, running my company's infrastructure completely within it. I had previously been using the older PubSub API in a synchronous mode. When I upgraded, and discovered all the (from my perspective, fairly undocumented) changes, I tried to figure out how to make it work under the new API. I didn't see any examples, didn't figure it out on my own, and decided to rip it all out and replace it with some temporary workarounds while I did more research.

I made my way back around to trying it again this week. After hours and hours of searching, this is the first page I've found that basically says "yeah, we don't like that whole synchronous thing so we aren't going to support it in the PubSub API". There are two things that concern me deeply about this as a GCP customer. First, this message should be put out there, front and center. I shouldn't have to spend all this time working to figure out your stance on something as trivial as a synchronous interface to PubSub. Second, while your reasoning may make sense within your internal discussions within Google, you shouldn't ever expect to understand why a customer wants to do something. You need to give the customer the flexibility to do something as trivial as use PubSub in a synchronous manner without you deciding that they shouldn't do it that way. You don't know the customer's code, what they are trying to do. If you think you know better, and they shouldn't do it that way, then you suffer from a lack of creativity and simply aren't customer focused. And at that point, it's not a technical issue. It's a business relationship issue.

And, it is. As a result, I've had to look at other alternatives. And guess what, I can do what I want under AWS more easily. Do I want to move all my infrastructure to AWS over this? No, I don't. But that it's a point for consideration should give some pause. It's also told me that I should think very carefully about coupling my product to any Google-specific services.

Stop over-thinking this. Give the customer the capability of doing simple synchronous calls from the PubSub API. Document it. If you don't want to do that, then document how else you expect the customer to do it. If you don't want to do that, then document that you don't support it, period.

Thanks for listening to a frustrated customer.

garrettjonesgoogle commented 7 years ago

cc @haakonringberg @kir-titievsky we're receiving more comments on this issue.

kir-titievsky commented 7 years ago

So: what's a good fix for this?

On Fri, Jun 9, 2017 at 11:30 AM, Garrett Jones notifications@github.com wrote:

cc @haakonringberg https://github.com/haakonringberg @kir-titievsky https://github.com/kir-titievsky we're receiving more comments on this issue.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/GoogleCloudPlatform/google-cloud-java/issues/1661#issuecomment-307421166, or mute the thread https://github.com/notifications/unsubscribe-auth/ARrMFu03YyHE5hdTk6_8tty24LKrOnBFks5sCWUcgaJpZM4MLvTM .

-- Kir Titievsky | Product Manager | Google Cloud Pub/Sub https://cloud.google.com/pubsub/overview

aparod commented 7 years ago

I'd love to have first-class support for synchronous pulls in the Java library.

In lieu of that, we would need detailed samples and documentation on how to accomplish synchronous pulls at the gRPC layer, which was alluded to in Garrett's post on March 3.

kuroneko25 commented 7 years ago

A somewhat related question is how to do batch processing on pulled messages? Since most of the API level interaction is abstracted away in Subscriber and we can't do something like pull(numberOfMessages). The MessageReceiver interface only gives you a single message. The use case is we have downstream dependencies in our processing logic that we can't call on every message event (too inefficient).

evmin commented 7 years ago

With the Sydney region launched yesterday we are considering a new use case. Would be great if the documentation mentioned earlier (gRPC and GAX, synchronous pulls) was considered as part of the development effort.

Also to highlight the point I have missed earlier, yet correctly pointed out by @kuroneko25 - batch sync pulls. It is a part of the current API and an easy replacement with the new library would be required (again - it became incumbent).

garrettjonesgoogle commented 7 years ago

@evmin I have some development underway that should solve the synchronous pull use case. See https://github.com/googleapis/api-client-staging/pull/355/files#diff-daa84cceab804e5d45eff87c4ba348c2 for a preview. Basically I'm creating a new layer underneath the client (a "stub" layer) which will reflect the underlying API 1:1, and you'd need to code against that instead of the existing curated client layer.

evmin commented 7 years ago

@garrettjonesgoogle Thank you! Great news.

saturnism commented 7 years ago

quick ping @garrettjonesgoogle to see when the pull will make it back into the client library

garrettjonesgoogle commented 7 years ago

Current hope is by 7/21.

garrettjonesgoogle commented 7 years ago

I have made a release of google-cloud-java which contains the new stub layers which makes synchronous pull accessible. It is available by using https://github.com/GoogleCloudPlatform/google-cloud-java/blob/master/google-cloud-pubsub/src/main/java/com/google/cloud/pubsub/v1/stub/GrpcSubscriberStub.java . I can post an example for how to use it tomorrow, but I thought I would let everyone know it's there.

garrettjonesgoogle commented 7 years ago

Code sample:

    SubscriptionAdminSettings settings = SubscriptionAdminSettings.defaultBuilder().build();
    SubscriberStub subscriberStub = GrpcSubscriberStub.create(settings);

    SubscriptionName subscription = SubscriptionName.create("[PROJECT]", "[SUBSCRIPTION]");
    int maxMessages = 0;
    PullRequest request = PullRequest.newBuilder()
        .setSubscriptionWithSubscriptionName(subscription)
        .setMaxMessages(maxMessages)
        .build();
    PullResponse response = subscriberStub.pullCallable().call(request);

    System.out.println("Received " + response.getReceivedMessagesCount() + " messages.");

So I realized when typing up this code sample that it's a little bit awkward for the stub to use the settings class that has the renaming rule applied (i.e. Subscriber -> SubscriptionAdmin). Maybe I should do some more work to add a stub settings layer like so, so that the API methods hidden on the client class can also be hidden on the main settings class:

SubscriptionAdminSettings

SubscriberStubSettings

garrettjonesgoogle commented 7 years ago

I'm going to close this out now that synchronous pull has been exposed. The update to the settings design will be tracked internally.