siilisolutions / hedge

a serverless solution for clojure
Eclipse Public License 1.0

Feature Request: event stream triggers and outputs #81

Open erkkikeranen opened 6 years ago

erkkikeranen commented 6 years ago

It would be nice if a function could be triggered by events.

Comparing AWS and Azure, the abstracted services would be Kinesis and Event Hub.

Events are near-real-time telemetry that can be delivered to multiple subscribers (e.g. functions for auditing, logging, etc.).

hedge.edn:

{:event {:handler name
         :stream stream-name            ; event hub name in Azure, stream in AWS
         :consumer-group consumer-group ; consumer group in Azure, application in AWS
         :connection connection-string}}

connection-string is used in Azure for authentication and authorization to use the SPS stream.

handler:

(defn handler [event]
  (info "Event received:" event))

Azure has an "Event hub cardinality" setting that can be one or many (the default). We should check whether AWS has something similar, or whether this should be exposed as a separate configuration option for users who want to tune the deployment target.
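One way to keep the handler signature the same regardless of cardinality would be to normalize the payload before calling it. A minimal sketch, assuming the runtime passes either a single event or a sequential collection of events; `->events` and `handle-all` are hypothetical names, not part of hedge:

```clojure
;; Hypothetical helpers, not part of hedge.
(defn ->events
  "Returns a batch as a vector; wraps a single event in a vector."
  [payload]
  (if (sequential? payload)
    (vec payload)
    [payload]))

(defn handle-all
  "Calls handler once per event and returns the results as a vector."
  [handler payload]
  (mapv handler (->events payload)))
```

With this, `(handle-all handler event)` and `(handle-all handler [event-1 event-2])` both end up calling `handler` with one event at a time.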

esuomi commented 6 years ago

A thought on Kinesis:

From what I've seen, it at least started as a fork of Apache Kafka pre-0.8. As is the nature of these massive-scale distributed message logs (they are not Message Queues!), a lot of the semantics have to be handled on the client side. As Kinesis' model follows Kafka's quite closely, it might be useful to take inspiration from Kafka's official consumer API introduced in 0.9: https://www.confluent.io/blog/tutorial-getting-started-with-the-new-apache-kafka-0-9-consumer-client/

It's a long read but worth it, as it describes quite a few of the problems one is bound to hit with distributed message logs, especially checkpointing and the difference between at-least-once and at-most-once semantics. As a rule of thumb, there's no "exactly once" in any distributed system, even when it looks like it for the first 9 billion delivered messages/events :) Trust me, I've been on the wrong side of the fence, and once you unexpectedly get a duplicate of some critical event in production it is a paaaaaain to fix.
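To make the checkpointing point concrete, here is a minimal sketch of at-least-once consumption plus a deduplicating wrapper. Everything in it is an assumption for illustration: records are plain maps with `:id` and `:offset` keys, `process!` and `checkpoint!` are caller-supplied functions, and none of this is hedge, Kinesis, or Kafka API.

```clojure
;; Hypothetical sketch; record shape and function names are assumptions.
(defn consume-at-least-once!
  "Processes each record before checkpointing it, so a crash between the
  two steps leads to a redelivery instead of a lost record."
  [records process! checkpoint!]
  (doseq [record records]
    (process! record)
    (checkpoint! (:offset record))))

(defn idempotent
  "Wraps process! so that a record whose :id was already seen is skipped.
  The seen set lives in memory only; a real consumer would need durable
  storage. The id is recorded before processing, so a record whose
  processing throws is not retried by this wrapper."
  [process!]
  (let [seen (atom #{})]
    (fn [record]
      (let [id (:id record)
            [before _] (swap-vals! seen conj id)]
        (when-not (contains? before id)
          (process! record))))))
```

Swapping the two calls inside `doseq` (checkpoint first, then process) would give at-most-once behaviour instead: a crash after the checkpoint loses the record rather than repeating it.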

tl;dr:

Oh and in case you start diving into Kafka, here's a small dictionary to help you along the way:

| Kinesis | Kafka | What it is |
| --- | --- | --- |
| stream | topic | Shared name for a group of messages, a grouping of sorts |
| shard | partition | Subpartitioning, essentially a number of allowed parallel reads per group. (see below) |
| record | message | Single piece of data in the stream, e.g. an event. |
| shard iterator | consumer group | A tag for a consumer or group of consumers which tracks the reading status per partition to ensure minimal overlap in subsequent reads. A better name for this would be partition cursor. |

This direct mapping of semantics is why I'm about 99.9999% sure Kinesis is Kafka; the timing of its appearance back in the day also matches some general developments of the industry at the time too well for it to be just a coincidence.

erkkikeranen commented 6 years ago

Azure specific comment:

We also need to check when to use Event Grid; that could be a different feature and use case.

I don't think we need to build a queue on top of Event Hub / Event Grid events. For now, I think it is enough for an MVP that we provide access to an event trigger (and sending events to the hub and grid as the next step).