Open erkkikeranen opened 6 years ago
A thought on Kinesis;
From what I've seen, it at least started as a fork of Apache Kafka pre-0.8. As is the nature of these massive-scale distributed message logs (they are not message queues!), a lot of the semantics have to be handled on the client side. As Kinesis' model follows Kafka's quite closely, it might be useful to take inspiration from Kafka's official consumer API introduced in 0.9: https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/
It's a long read but worth it, as it describes quite a few of the problems one is bound to hit with distributed message logs, especially checkpointing and the difference between at-least-once and at-most-once semantics. As a rule of thumb, there's no "exactly once" in any distributed system, even when it looks like it for the first 9 billion delivered messages/events :) Trust me, I've been on the wrong side of the fence, and once you unexpectedly get a duplicate of some critical event in production, it is a paaaaaain to fix.
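To make the at-least-once / at-most-once difference concrete, here is a minimal in-memory sketch. It is not the Kinesis or Kafka client API: `consume`, `run_with_restart`, `SimulatedCrash` and the `crash_at` parameter are all hypothetical names made up for illustration. The only thing that changes between the two modes is whether the checkpoint is written before or after the record is processed.

```python
class SimulatedCrash(Exception):
    """Stands in for a consumer process dying mid-record (illustrative only)."""


def consume(log, checkpoint, handler, commit_first, crash_at=None):
    """Read `log` starting at `checkpoint`; return the last saved checkpoint.

    commit_first=True  -> at-most-once  (save checkpoint, then process)
    commit_first=False -> at-least-once (process, then save checkpoint)
    `crash_at` is the offset where the crash hits, between the two steps.
    """
    offset = checkpoint
    try:
        while offset < len(log):
            if commit_first:
                checkpoint = offset + 1      # checkpoint saved first
                if offset == crash_at:
                    raise SimulatedCrash     # record is never processed
                handler(log[offset])
            else:
                handler(log[offset])         # record processed first
                if offset == crash_at:
                    raise SimulatedCrash     # checkpoint is never saved
                checkpoint = offset + 1
            offset += 1
    except SimulatedCrash:
        pass
    return checkpoint


def run_with_restart(log, commit_first, crash_at):
    """Run once with a crash, then restart from whatever checkpoint survived."""
    seen = []
    saved = consume(log, 0, seen.append, commit_first, crash_at)
    consume(log, saved, seen.append, commit_first)
    return seen


if __name__ == "__main__":
    events = ["a", "b", "c"]
    print("at-most-once :", run_with_restart(events, True, crash_at=1))
    print("at-least-once:", run_with_restart(events, False, crash_at=1))
```

With at-most-once the record in flight during the crash is lost; with at-least-once it is delivered twice, so the handler has to tolerate duplicates (for example by being idempotent).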
tl;dr:
Oh and in case you start diving into Kafka, here's a small dictionary to help you along the way:
Kinesis | Kafka | What it is |
---|---|---|
stream | topic | Shared name for a group of messages, a grouping of sorts |
shard | partition | Subpartitioning, essentially a number of allowed parallel reads per group. (see below) |
record | message | Single piece of data in the stream, e.g. an event. |
shard iterator | consumer group | A tag for a consumer or group of consumers that tracks the read position per partition to ensure minimal overlap in subsequent reads. A better name for this would be partition cursor. |
This direct mapping of semantics is why I'm about 99.9999% sure Kinesis is Kafka; the timing of its appearance back in the day also matches some general developments of the industry at the time too well for it to be just a coincidence.
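The "partition cursor" idea from the table can be sketched as a small in-memory structure. This is only an illustration under my own assumptions, not how either service stores positions: `PartitionCursors`, `poll` and `commit` are hypothetical names, and a real client would persist the positions somewhere durable.

```python
from collections import defaultdict


class PartitionCursors:
    """Tracks the next offset to read per (group, partition). Illustrative only."""

    def __init__(self):
        # (group, partition) -> next offset to read; unseen keys start at 0
        self._next = defaultdict(int)

    def poll(self, group, partition, log, max_records=10):
        """Return (start_offset, records) without moving the cursor."""
        start = self._next[(group, partition)]
        return start, log[start:start + max_records]

    def commit(self, group, partition, next_offset):
        """Move the cursor; later polls for this group start here."""
        self._next[(group, partition)] = next_offset


if __name__ == "__main__":
    # One stream/topic with two shards/partitions.
    stream = {0: ["e0", "e1", "e2"], 1: ["f0", "f1"]}
    cursors = PartitionCursors()

    start, batch = cursors.poll("audit", 0, stream[0], max_records=2)
    cursors.commit("audit", 0, start + len(batch))

    # Same group continues where it left off; another group starts from 0.
    print(cursors.poll("audit", 0, stream[0]))
    print(cursors.poll("logging", 0, stream[0]))
```

Each group keeps its own position per partition, which is why several independent consumers can read the same stream without stepping on each other.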
Azure-specific comment:
We also need to check when Event Grid should be used; that could be a different feature and use case.
I don't think we need to build a queue on top of Event Hub / Event Grid events. For now I think it is enough for an MVP to provide access to the event trigger (with sending events to the hub and grid as the next step).
It would be nice if a function could be triggered by events.
Comparing AWS and Azure, the abstracted services would be Kinesis and Event Hub.
Events are near-real-time telemetry that can be delivered to multiple subscribers (e.g. functions for auditing, logging, etc.).
hedge.edn:
connection-string is used in Azure for authentication and authorization to use the SPS stream.
handler:
Azure has an "Event hub cardinality" setting that can be one or many (default). We should check whether AWS has something similar, or whether this should be exposed in a separate configuration for users who want to tune the deployment target.
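One way to keep handlers independent of the cardinality setting is to normalize the payload before calling them. A minimal sketch, assuming the trigger hands over either a single event or a list of events; `as_batch` and `dispatch` are hypothetical helpers, not part of any Azure SDK:

```python
def as_batch(payload):
    """Return a list of events whether the trigger delivered one or many."""
    if isinstance(payload, (list, tuple)):
        return list(payload)
    return [payload]


def dispatch(payload, handler):
    """Call `handler` once per event and collect the results."""
    return [handler(event) for event in as_batch(payload)]


if __name__ == "__main__":
    print(dispatch({"id": 1}, lambda e: e["id"]))               # cardinality "one"
    print(dispatch([{"id": 1}, {"id": 2}], lambda e: e["id"]))  # cardinality "many"
```

The user-facing handler then always sees one event at a time, and the cardinality choice stays a deployment detail.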