Proposal: First-class support for message keys

davidsbond commented 3 years ago

Intro

This issue aims to make a case for including message keys as part of proximo's contract. Many event-streaming platforms support message keys, and including them in proximo could solve some problems without exposing underlying client APIs.

The problem

I've noticed, at least internally at @utilitywarehouse, at least two teams that have had to roll their own proximo server implementations in order to cater for message keys. This is usually done by using substrate's KeyFunc mechanism for publishing to Kafka. The implementations I've seen can be quite complicated or cause problems when deploying proximo.

The partner team has created a protoc plugin that uses options to generate methods to return the desired message key, muddying proto definitions & requiring a custom proximo implementation to obtain said keys.
The billing team has a type switch checking event types to determine which proto field to use as the desired message key. This one is particularly problematic, as it couples proximo to the actual proto event types, which is undesirable because the proximo implementation must then have all possible proto messages linked into the binary.
On customer-billing, we use proximo purely for consumption and instead write to Kafka directly so we can have message keys.

I suspect many more teams have done the same when it comes to their message keys.

Proximo states that exposing underlying stream implementations is a non-goal, which is fair. However, message keys are becoming more common across different event stream providers:

Kafka (partition keys)
Google PubSub (ordering keys)
RabbitMQ (routing keys)
Azure Service Bus (partition keys)

Proposed solution

1) Add a key field to the Message proto

message Message {
  bytes data = 1;
  string id = 2;
  bytes key = 3;
}

2) Update proximo backends to use the key if the backend supports it. In the case of Kafka, we can specify the KeyFunc. Currently, NATS is the outlier here, as it does not support message keys; in that case the backend can just ignore the key. Anyone switching from Kafka to NATS already needs to consider the fact that message keys will no longer exist, so this doesn't feel like a concern for proximo.
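
To illustrate step 2, here is a minimal Go sketch of how backends might treat the proposed key field. Everything in it is hypothetical: the Backend interface, keyedBackend and keylessBackend are stand-ins rather than proximo or substrate types, and the FNV hash is only an illustrative way to map a key to a partition, not a claim about what any real client does.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Message mirrors the proposed proto message; Key is the new, optional field.
type Message struct {
	Data []byte
	ID   string
	Key  []byte
}

// Backend is a hypothetical stand-in for the publish side of a proximo backend.
type Backend interface {
	Publish(m Message) string
}

// keyedBackend models a backend that supports message keys (Kafka-like).
type keyedBackend struct{ partitions uint32 }

// Publish maps the key to a partition so that equal keys always land together.
func (b keyedBackend) Publish(m Message) string {
	if len(m.Key) == 0 {
		return "keyed: no key, partition left to the backend default"
	}
	h := fnv.New32a()
	h.Write(m.Key) // illustrative hash only (assumption)
	return fmt.Sprintf("keyed: partition %d", h.Sum32()%b.partitions)
}

// keylessBackend models a backend without message keys (NATS-like).
type keylessBackend struct{}

// Publish ignores the key entirely, as the proposal suggests.
func (keylessBackend) Publish(m Message) string {
	return "keyless: key ignored"
}

func main() {
	msg := Message{Data: []byte("payload"), ID: "1", Key: []byte("account-1")}
	backends := []Backend{keyedBackend{partitions: 8}, keylessBackend{}}
	for _, b := range backends {
		fmt.Println(b.Publish(msg))
	}
}
```

With this shape the publisher decides the key, so the server no longer needs to know about concrete event types.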

davidsbond commented 3 years ago

I've added a draft implementation in #93

seborama commented 2 years ago

To add to this issue, we also have a situation that needs the use of the message key. This is in the context of the legacy.account.events that we consume from the bespoke proximo server in front of the account / identity Kafka queue. We want to project the messages that we receive to a Kafka queue of our own for use by our domain consumers. Our requirement is to use log compaction on our topic (much like a&i do). For this to work "out of the box" we need to be able to retain the message key.
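
As a rough illustration of why compaction depends on the key, the following Go sketch keeps only the latest record per key. The Message type and compact function are hypothetical and simplified: keyless records are passed through untouched here, and how a real broker treats them is not modelled.

```go
package main

import "fmt"

// Message is a simplified record: a key plus a payload.
type Message struct {
	Key  string
	Data string
}

// compact keeps the most recent message for each non-empty key,
// preserving the relative order of the records that survive.
func compact(log []Message) []Message {
	last := make(map[string]int, len(log))
	for i, m := range log {
		if m.Key != "" {
			last[m.Key] = i // remember the newest offset for this key
		}
	}
	out := make([]Message, 0, len(log))
	for i, m := range log {
		if m.Key == "" || last[m.Key] == i {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	log := []Message{
		{Key: "account-1", Data: "created"},
		{Key: "account-2", Data: "created"},
		{Key: "account-1", Data: "updated"},
	}
	for _, m := range compact(log) {
		fmt.Printf("%s => %s\n", m.Key, m.Data)
	}
}
```

If the key is lost on the way through proximo, there is nothing for this step to group by, which is the problem described above.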

As pointed out in this issue's description, message keys are pivotal to several messaging systems. Without the flexibility to support message keys, it could be argued that as it stands, proximo is opinionated about what a message is. Paradoxically, this makes proximo more message-centric and less abstractive of the backend messaging system.

davidsbond commented 2 years ago

Since I no longer work at UW I don't really have the time to take a look into this anymore. If someone wants to adopt my draft implementation or just rewrite their own please do :)
