rebus-org / Rebus.GoogleCloudPubSub

:bus: Google Cloud Pub Sub transport for Rebus (under development)
Other
5 stars 3 forks source link

Pubsub lease management #8

Open asleire opened 6 months ago

asleire commented 6 months ago

First issue!

The problem: In the current implementation of GoogleCloudPubSubTransport, when a GCP message is pulled it must be processed within a certain "Acknowledgement deadline" which is configured on the PubSub Subscription. With a fixed deadline we need to configure it high enough so that we can process all kinds of messages within the deadline, but we also need it low enough so that when an error does occur the message is retried within a reasonable time. It can be very difficult to set a deadline that fits both criteria.

The solution: Lease management lets us set a low initial deadline and continuously extend it throughout our event processing. This gives us the best of both worlds: a message can be locked for as long as processing takes, but if processing stops for some reason the deadline will soon pass and the message can be picked up by another worker.

The implementation: The AzureServiceBus implementation has something similar which we can probably base it on.

One problem is that in PubSub, when pulling a message, we don't know what the ack deadline is. This is different from AzureServiceBus which returns the lock's expiration time. I think a good solution is to have a configuration property for the interval to extend the deadline and simply recommend through documentation that it be a reasonably low number and must at least be a few seconds lower than the deadline configured on the PubSub subscription. We will then extend the deadline every interval starting from when the message is received

I can take a shot at implementing this @mookid8000, but I'm curious if you have any input

mookid8000 commented 6 months ago

My initial thought was also to build a mechanism similar to Rebus' ASB transport's automatic lease renewal, only with a configurable renewal interval.

But is there really no way for a client to query Google PubSub for the relevant lease duration?

Regardless, having the mechanism in place with a configurable interval should make for a solid foundation, in case it turned out later on that the lease duration could be discovered dynamically somehow.

asleire commented 6 months ago

But is there really no way for a client to query Google PubSub for the relevant lease duration?

Unfortunately it is not part of the response when pulling a single message. Interestingly it is possible to override the ack deadline when using StreamingPullRequest, but that's not suitable for us since Rebus needs to control the rate of processing.

The only option I see is getting the subscription to see its default deadline, but doing so requires an IAM role subscriber clients would normally not have

jstafford5380 commented 2 days ago

Unfortunately it is not part of the response when pulling a single message. Interestingly it is possible to override the ack deadline when using StreamingPullRequest, but that's not suitable for us since Rebus needs to control the rate of processing.

With streams, you can set the max unacked message count separately of the max subscriber/worker count. In essence, it should stop pumping messages until space opens up, like a semaphore. Shouldn't you be able to leverage that to achieve the same effect? I'd imagine an in-memory queue/buffer that was slightly larger than what Rebus needs to control rate. Rebus would "receive" from the buffer and as they are acked, the buffer would have additional messages pushed to it.

A singleton semaphore could be used to control the messages being queued up, but I can't quite figure out is how this would know when to ack the message.