micronaut-projects / micronaut-gcp

Integration between Micronaut and Google Cloud Platform (GCP)
Apache License 2.0
50 stars 35 forks source link

MN:4.3.x and google-cloud-pubsub:1.126.x: @OrderingKey is no longer honored in @PubSubClient #1060

Closed roar-skinderviken closed 6 months ago

roar-skinderviken commented 7 months ago

Expected Behavior

With a publisher like this

@PubSubClient
fun interface DemoPublisherWithMessageKey {

    @Topic("\${pubsub.topic}")
    fun send(
        message: SampleReturnMessage,
        @OrderingKey key: String
    )
}

I expect messages with same message key to arrive in order like it did with MN:3.2.x.

The issue seems to be related to google-cloud-pubsub:1.126.x, but it's still a breaking change that may cause severe issues in applications.

Actual Behaviour

With micronaut-parent:4.3.x that uses google-cloud-pubsub:1.126.x, messages are received out of order.

Steps To Reproduce

In the example application, comment out

<!-- MessageConcurrencyTest is failing with 1.126.x -->
<dependency>
    <groupId>com.google.cloud</groupId>
    <artifactId>google-cloud-pubsub</artifactId>
    <version>1.125.13</version>
</dependency>

in dependencyManagement in parent POM and observe that MessageConcurrencyTest is now failing when running mvn clean verify.

Environment Information

Ubunut 22.04 JDK 17 Kotlin 1.9.22

Example Application

https://github.com/roar-skinderviken/pubsub-emulator-demo

Version

4.3.3

jeremyg484 commented 6 months ago

We have updated the dependency to the latest which is 1.26.6. It looks like this was fixed in pubsub 1.26.4 - see https://github.com/googleapis/java-pubsub/issues/1889.

Note that I think your test might be making some incorrect assumptions. By default, we set the GCP subscriber to use Micronaut's scheduled task executor. The GCP library uses this to parallelize processing of incoming messages. I would not expect your RandomLatencyTestListener to consistently output the "end" messages in the same order unless you were to set a custom executor on the subscriber that is limited to using a single thread.

roar-skinderviken commented 6 months ago

Hi @jeremyg484 With google-cloud-pubsub:1.125.13, the test is passing every time. With 1.26.6, the test is not passing on my computer, hence I believe there is a change in behaviour.

Using a single-thread subscriber is not a viable solution in an architecture with high pubsub-message throughput.

jeremyg484 commented 6 months ago

Yes, I understand that using a single-threaded subscriber is not necessarily ideal (though that's not a hard rule - it could be done on a per-subscription basis and that subscriber could be reactive and able to still process messages quite efficiently even on a single thread). I'm just saying that's the only way I can see that your test would pass since it's one of the only ways of guaranteeing that the "end" message gets printed in order.

The guarantee that the google library makes (see https://cloud.google.com/pubsub/docs/ordering#considerations_when_using_ordered_messaging) is that the acknowledgements will be sent back to the service in order, even if differing latencies in processing causes them to complete out of order. With the above code reversal in the library, it seems like they are also ensuring that the messages are passed to a given subscriber in order.

I do suspect something else changed internally in google's library beyond the above reverted change, but I would have to do some deep digging into their code to confirm that. The fact that your test passed with the old version to me seems like a bug in that version of google's library. It would indicate there was a bottleneck, as it would mean that they were only passing one message at a time to a given subscriber and explicitly waiting for the message to complete before passing the next one. I can see how not all use cases would require such a constraint, thus the new behavior allows such cases to process at a higher throughput while still ensuring that acknowledgements are ordered.

roar-skinderviken commented 5 months ago

Hi @jeremyg484 After digging more into this, it turned out that adding .setEnableMessageOrdering(true) to the subscription when setting up the pubsub-emulator, solved the issue without changing number of threads.

With google-cloud-pubsub:1.25.x, enableMessageOrdering was ignored when creating a subscription using SubscriptionAdminClient. Tests in example repo passes with .setEnableMessageOrdering(false). Bottom-line is that tests in example app should have failed from the beginning because the subscription was set up incorrectly.

Some kind of warning/error in the logs when trying to publish a message with @OrderingKey without any listeners with enableMessageOrdering = true would be ideal, but maybe not that simple to implement.

Side-note Debugging the tests shows that messages are processed by different scheduled-executor threads.

I've updated the example app with the change.