image:https://img.shields.io/github/license/smallrye/smallrye-event-sourcing.svg["License", link="http://www.apache.org/licenses/LICENSE-2.0"]

= SmallRye Event Sourcing

[[intro]] == Introduction

This is an ideas document that needs to be mined into issues. For historical reasons some of the text is written as if it were a draft proposal, but any proposal document will only come into existence in the MicroProfile sandbox.

SmallRye Event Sourcing connects the state of heterogeneous microservices, using MicroProfile Reactive Messaging as the conduit that transports state change events.

These events have at their core a state change expressed in terms of the user's domain objects, but they can be generated and handled more easily than with traditional methods. The events contain generated metadata that allows receiving microservices to process them easily, either to update state data or to trigger application work.

SmallRye Event Sourcing can work in a 'plug-and-play' fashion with existing Change Data Capture (CDC) frameworks like Debezium, existing connector ecosystems like Apache Kafka Connectors, and existing reactive streams sources and sinks via the MicroProfile Reactive Streams Operators specification.

Transport is provided across any MicroProfile Reactive Messaging Connector which also provides the abstraction for reactive back-pressure and message acknowledgement.

Type converters operate in the same way as the MicroProfile Config specification and type converter code can be shared or repurposed.
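As a rough illustration of what a shared converter could look like, the sketch below uses a local `StringConverter` interface that mirrors the single-method shape of MicroProfile Config's `Converter<T>` (`T convert(String value)`). The interface name, the `Money` type and the `"<amount> <currency>"` text format are all assumptions made for this example, not part of any specification.

```java
import java.math.BigDecimal;

// Local stand-in mirroring the shape of MicroProfile Config's Converter<T>
// (a single convert(String) method). Hypothetical, so the sketch has no dependencies.
interface StringConverter<T> {
    T convert(String value);
}

// Hypothetical domain type that might appear inside a state change event.
record Money(BigDecimal amount, String currency) {}

// One converter that could serve both configuration values and event payload fields.
final class MoneyConverter implements StringConverter<Money> {
    @Override
    public Money convert(String value) {
        // Assumed text format: "<amount> <currency>", for example "12.50 EUR".
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected '<amount> <currency>' but got: " + value);
        }
        return new Money(new BigDecimal(parts[0]), parts[1]);
    }
}
```

Because the converter only depends on a `String` input, the same class could in principle be registered for configuration and reused when decoding event fields.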

However, SmallRye Event Sourcing is not an event sourcing framework. It simply provides very easy to use mechanisms for creating, distributing and handling state changes. It has no opinion on the data consistency or microservice architecture approach implemented. Its aim is to make it easy to implement the approach users choose and to provide the correct level of abstraction so that issues of 'plumbing' fade into the background. In summary, think of Debezium with additional application-layer integration, where sources and sinks are exposed as MicroProfile reactive streams and any MicroProfile Reactive Messaging connector can serve as the transport.

[[sources]]

[[value-add]] == Value Add of This Proposal

=== Reduce Microservice Development Cost

A big pain point of some microservice architectures is maintaining the chosen consistency model across services. Event sourcing is one common approach, but producing and consuming events and handling at-least-once semantics across failures is the same problem solved repeatedly with common patterns, and it adds little business value in itself. Raising the level of abstraction that this can be built on will save customers time and money and leave less scope for errors.

[[undo]] === Undo and Compensation

With an API that supports application involvement in change event handling, subscribable change event channels, and events that include 'before' and 'after' state, it would be easy to invert a change event and construct an 'undo' operation. 'Undo' combined with transactional outboxes at the application level would be a very useful construct when building eventual consistency applications.

=== Leaning-in on Standardisation of Database Change Data Capture

There are currently no standards for database change data capture SPIs. This makes it more expensive to implement or re-implement change data capture solutions. It also gives database vendors less incentive to provide and support CDC SPIs, as the market for what can make use of them is fragmented. The value chain SmallRye Event Sourcing builds is likely to increase usage of CDC and give database vendors an incentive to support it. A CDC SPI is a natural follow-on that can be developed in sync with Debezium and other libraries, and would thus carry more incentive for it to be supported.

=== Better Integration with Other Microservice Concerns

The current CDC state of the art relies on users writing plug-in components for particular frameworks, such as Debezium Apache Kafka Connector schemas and type transformers. This work would have to be repeated in other contexts. The MicroProfile ecosystem can provide a set of specifications where artifacts such as type converters can be used in more than one area.

=== Library Selection in One Area Should Not Constrain Portability

There are certain things that Kafka is great for in a microservices context. The Kafka ecosystem is currently also doing a great job of expanding horizontally into connectors, streams, tables, queries and so on. The MicroProfile Reactive Messaging specification allows applications to be written that can be easily moved from one Connector (such as Kafka) to another (such as a cloud provider's proprietary equivalent) with no code changes.

Similarly, SmallRye Event Sourcing allows for a cross-vendor home for event sourcing concerns that exist today, such as CDC, connectors, schemas, converters etc. It also provides an environment for technology that can be provided in the future, such as transactional outbox connectors, conflict-free replicated data types, occasionally connected semantics and so on. If these technologies are developed solely for one transport (for example the Kafka ecosystem), then use of a function in one area will constrain the solution in others, perhaps to one dominant library. This is a drag on portability, for example across database or cloud messaging providers. History has shown us that evolution plus choice in solution technologies is a benefit for users that ultimately lifts all libraries' performance.

=== Why is This Not a Dot Release on Reactive Messaging?

It would be possible to make a strong case for this just being one or more features added to MicroProfile reactive messaging. We probably do need features there to support some aspects (for example, the ability to nest Message envelopes and transforms in order to be able to cleanly separate concerns, and the ability to buffer messages in a way that interacts in a defined way with acknowledgements). However, until we shake out some solid patterns of what works well, 'event sourcing' for state distribution will be seen as too custom a set of patterns to be supported with standardised plumbing and so work would face resistance if done in the reactive messaging group.

'Event Sourcing' is a well recognised term and a well understood concept. As this work is not an event sourcing framework or an opinionated architectural approach, one can reasonably argue that it just makes event sourcing easier and is more like a set of tools to enable state change distribution across microservices. Unfortunately "Tools to Make State Change Capture, Distribution and Ingestion Easier Across Microservices" is a much harder rallying point for a common CDC SPI to support, and sounds like a much less attractive home for a proof of concept for CRDT and so on. Hence 'Event Sourcing' has been chosen.

CDC is a whole skill set in itself, and pushing forward a common SPI for CDC (which is a longer term aim of this work) requires database skills and contacts. These skills and contacts exist already, but not in the MicroProfile Reactive Messaging team.

=== Why is This Not a Dot Release on the Debezium Project?

One implementation approach would be to wrap and replicate the change data capture facility of Debezium that peeks at database change logs. However, this proposal pushes deeply into annotation based application level integration for both state change event generation and handling.

To date Debezium has captured changes from 'underneath' the databases, detached from application code. Being reactive streams based, this proposal will allow applications to switch in any reactive streams based processing of event streams using MicroProfile Reactive Streams Operators. The proposal will allow a variety of reactive messaging connectors to be used to transport microservices' state changes, including those with characteristics such as reactive back-pressure, better legacy integration and so on.

Debezium could begin to benefit from any emerging reactive messaging ecosystem. For example there is a case for Debezium supporting Cloud Events https://issues.jboss.org/browse/DBZ-1292 but this work could perhaps better be done in the MicroProfile Reactive Messaging layer and so be of benefit to multiple users of reactive messaging.

Many write mirroring systems can operate remote confirmation at different stages, for example: acknowledge on local save, acknowledge on remote save and incremental chunking of updates (for example IBM's 'Metro-mirror', 'Global-mirror' and 'Change Volumes'). Frequently these schemes rely on having one vendor's systems on each end and no application involvement. Integrating CDC and reactive messaging allows this work to be done in a manner that can be reused across technologies and support a polyglot microservice environment.

In short, this is a 'big area' for customers. CDC technology such as Debezium is already proven to work and has field-proven customer demand. However, a lot of the customer value remains on the table in terms of integration with other microservice concerns.

A 'common' standard that allows CDC SPIs and APIs to emerge, and that lets the benefit of state change event propagation across diverse microservice platforms be realised, has value that is not currently being captured.

== Event Generation

State change events can easily be generated by applications, from changes that have occurred in databases, or from any reactive stream source. Additionally, other existing frameworks, such as Kafka Connectors, can easily be wrapped to become event sources.

[[app-source]] === Application Sourced Events

State change messages can be generated from the application layer with an easy to use programming model that includes the mechanisms below.

[[transformer-method-shape]] ==== Annotated Methods: newObject = fn( oldObject)

MicroProfile Reactive Messaging takes meaning from the 'shape' of an annotated method. Transformer methods are methods that take in an object of a particular type, apply a function to it and return a new or transformed object of the same type. Such a method can easily be annotated to be a state transformation and generate a state change event containing the 'before' and 'after' state.
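The `newObject = fn(oldObject)` shape can be sketched in plain Java without any annotation processing. In the example below `ChangeEvent`, `Account` and `CapturingTransformer` are hypothetical names chosen for illustration; an annotation-driven implementation would presumably do the equivalent wrapping behind the scenes, for instance through an interceptor.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical event type: the state before and after a transformation.
record ChangeEvent<T>(T before, T after) {}

// Hypothetical domain object.
record Account(String id, long balance) {}

// Wraps a transformer function so that every call also records a change event.
final class CapturingTransformer<T> {
    private final UnaryOperator<T> fn;
    private final List<ChangeEvent<T>> emitted = new ArrayList<>();

    CapturingTransformer(UnaryOperator<T> fn) {
        this.fn = fn;
    }

    // newObject = fn(oldObject), with the before/after pair captured as an event.
    T apply(T oldObject) {
        T newObject = fn.apply(oldObject);
        emitted.add(new ChangeEvent<>(oldObject, newObject));
        return newObject;
    }

    // Immutable copy of the events captured so far, in call order.
    List<ChangeEvent<T>> emitted() {
        return List.copyOf(emitted);
    }
}
```

A caller might wrap a deposit function as `new CapturingTransformer<Account>(a -> new Account(a.id(), a.balance() + 50))`; each `apply` call then yields the new account and leaves one event holding both states.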

[[dual-writes]] ==== Application Dual Writes

Applications are at liberty to trigger the generation of state change events at any time using mechanisms similar to MicroProfile reactive messaging.

[[tailable]] ==== Tailable Cursors

Some databases are providing support for 'tailable cursors' as a means to feed reactive streams. This provides an interesting potential hook for a custom CDC feed defined by the application that captures both an initial snapshot and subsequent changes within a defined set.

[[outbox-connectors]] ==== Outbox Reactive Messaging Connectors

A reactive messaging connector can easily take the role of an 'outbox', pending the delivery of state change messages on a subsequent event such as a one or two phase 'commit' instruction.

[[cdc]] === Database Change Data Capture (CDC)

State change events can be generated by plugging into Change Data Capture (CDC) support in databases to create reactive messaging Publishers. An example mechanism for this and first target of a proof of concept could be a Debezium based reactive messaging connector. This connector would implement the IncomingConnectorFactory interface that allows Debezium to act as a reactive streams Publisher of state change events.
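One step such a connector would need is turning a Debezium change event value into something typed. Debezium's envelope carries `before`, `after`, `op` and `ts_ms` fields, with `op` codes `c`, `u`, `d` and `r` (snapshot read). The sketch below assumes the envelope has already been deserialised into a `Map`; `RowChange`, `Operation` and `DebeziumEnvelopes` are hypothetical names, and the sketch does not implement `IncomingConnectorFactory` itself.

```java
import java.util.Map;

// Hypothetical typed view of the operation codes used in Debezium envelopes.
enum Operation { CREATE, UPDATE, DELETE, READ }

// Hypothetical typed change event; before/after hold the row as column -> value.
record RowChange(Operation op, Map<String, Object> before, Map<String, Object> after, long timestampMillis) {}

final class DebeziumEnvelopes {
    private DebeziumEnvelopes() {}

    // Maps the single-letter op code to the enum.
    static Operation operation(String code) {
        return switch (code) {
            case "c" -> Operation.CREATE;
            case "u" -> Operation.UPDATE;
            case "d" -> Operation.DELETE;
            case "r" -> Operation.READ;
            default -> throw new IllegalArgumentException("Unknown op code: " + code);
        };
    }

    // Assumes the envelope was deserialised (for example from JSON) into nested maps.
    @SuppressWarnings("unchecked")
    static RowChange fromEnvelope(Map<String, Object> envelope) {
        Operation op = operation((String) envelope.get("op"));
        Map<String, Object> before = (Map<String, Object>) envelope.get("before"); // null on create
        Map<String, Object> after = (Map<String, Object>) envelope.get("after");   // null on delete
        long ts = ((Number) envelope.get("ts_ms")).longValue();
        return new RowChange(op, before, after, ts);
    }
}
```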

[[outbox]] === The 'Outbox' Pattern

The outbox pattern allows for state change events to be explicitly created during application processing but these are held in an 'outbox' and only distributed if the processing, which may involve a number of actions, is deemed to be successful.
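The hold-then-distribute behaviour can be sketched with a small in-memory class. `Outbox` and its method names are hypothetical, and the sketch deliberately ignores durability: a real outbox would persist staged events under the same transaction as the domain changes so that they survive a crash.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical in-memory outbox: events are held until the unit of work commits.
final class Outbox<E> {
    private final List<E> pending = new ArrayList<>();
    private final Consumer<E> transport;

    // The transport stands in for whatever actually distributes events.
    Outbox(Consumer<E> transport) {
        this.transport = transport;
    }

    // Record an event without distributing it yet.
    void stage(E event) {
        pending.add(event);
    }

    // Processing succeeded: forward staged events in order and return how many were sent.
    int commit() {
        int count = pending.size();
        pending.forEach(transport);
        pending.clear();
        return count;
    }

    // Processing failed: discard staged events and return how many were dropped.
    int rollback() {
        int count = pending.size();
        pending.clear();
        return count;
    }
}
```

Nothing reaches the transport until `commit()` is called, so a failure part-way through processing leaves remote services unaware of the abandoned changes.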

[[cdc-outbox]] ==== Database Outbox Table

Many databases allow observation of their change logs and state change events can be generated directly from these. This has the following benefits:

  1. There is no involvement of application code, so it can even be retro-fitted to existing applications.
  2. It can be done under the transactional control of the database.
  3. It frequently performs efficiently.

Typically, individual application tables are explicitly included in what changes are captured. Pre-existing tables in the application data model can be tailed, but a common pattern is to have one or more additional tables, still under transaction control, for the purpose of holding state changes that the application wishes to 'broadcast' on a successful commit of changes to the domain model.

[[txn-mrm-connector]] ==== Transactional MicroProfile Messaging Connector

MicroProfile messaging connectors enable messages to be easily created by applications. Providing a connector that makes use of the underlying platform's transaction support allows for an easy to use 'store and forward' approach that avoids the problems of dual writes.

[[connectors]] == Event Transport

State change events can be fed through a reactive messaging outgoing Kafka connector for publishing to the appropriate Kafka topic, or through any other distribution mechanism that can act as an outgoing reactive messaging connector.

[[kafka]] === Apache Kafka

[[kafka-connectors]] ==== Apache Kafka Connectors

Debezium can make use of Kafka Connectors as an environment that is well suited to event sourcing. Clustering, sharding, offset management ensuring at-least-once delivery, and schemas which allow for de-serialisation and type conversion are all made easier to build by Kafka connectors. Some of these concerns, for example clustering, are not within the scope of this area. However, in some others Apache Kafka provides clear 'tail-lights' to follow, either by wrapping function or providing for alternatives where this makes sense.

[[cloud-events]] === Cloud Events

There is an interesting integration possibility with Cloud Events. This is discussed from a Debezium perspective at https://issues.jboss.org/browse/DBZ-1292 ("wrap the existing Debezium messages into a CloudEvents envelope"). If this work were done using MicroProfile Reactive Messaging, it might be reusable by multiple frameworks and customers rather than solely by Debezium.
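To make the envelope idea concrete, the sketch below wraps a change event in a record carrying the four context attributes that CloudEvents 1.0 marks as required: `id`, `source`, `specversion` and `type`. The record names, the random UUID used for `id` and any `source` or `type` values are assumptions for illustration, not a proposed mapping.

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical change event payload: state before and after the change.
record ChangeEvent<T>(T before, T after) {}

// Holds the required CloudEvents 1.0 context attributes plus the event data.
record CloudEventEnvelope<D>(String id, String source, String specversion, String type, D data) {

    // Reject envelopes that are missing a required attribute.
    CloudEventEnvelope {
        Objects.requireNonNull(id, "id");
        Objects.requireNonNull(source, "source");
        Objects.requireNonNull(specversion, "specversion");
        Objects.requireNonNull(type, "type");
    }

    // Wraps a change event; using a random UUID as the id is an assumption of this sketch.
    static <T> CloudEventEnvelope<ChangeEvent<T>> wrap(String source, String type, ChangeEvent<T> change) {
        return new CloudEventEnvelope<>(UUID.randomUUID().toString(), source, "1.0", type, change);
    }
}
```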

[[sinks]] == Event Handling

On the remote end, an incoming reactive messaging Kafka connector is used to pick up the state change events from the appropriate topics in the Kafka server. These are fed into a reactive streams processor that understands the change event metadata wrapping added by the remote event sourcing message envelope.
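As a minimal sketch of what "updating state data" from such events might involve, the class below applies keyed change events to a local map. `KeyedChange`, `Op` and `Projection` are hypothetical names; the assumption is that the envelope metadata supplies a key and an operation type alongside the 'after' state.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical operation types carried in the event metadata.
enum Op { CREATE, UPDATE, DELETE }

// Hypothetical inbound event: the key of the changed object, the operation and the new state.
record KeyedChange<T>(String key, Op op, T after) {}

// Keeps a local copy of remote state up to date by applying change events in order.
final class Projection<T> {
    private final Map<String, T> state = new HashMap<>();

    void apply(KeyedChange<T> change) {
        switch (change.op()) {
            case CREATE, UPDATE -> state.put(change.key(), change.after());
            case DELETE -> state.remove(change.key());
        }
    }

    // Immutable copy of the current local state.
    Map<String, T> snapshot() {
        return Map.copyOf(state);
    }
}
```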

=== The 'Inbox' Pattern

One can see an example of a hand crafted change event inbox here: https://github.com/debezium/debezium-examples/blob/master/outbox/shipment-service/src/main/java/io/debezium/examples/outbox/shipment/facade/KafkaEventConsumer.java

It would be interesting to explore how much of that could be automated with something like `@Incoming("orders") void handleOrder(Message<ChangeEvent> event)` and whether that would be useful to users.
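One ingredient an automated inbox would likely need is duplicate detection, because at-least-once delivery can present the same event more than once. The sketch below is an in-memory illustration under that assumption; `Inbox` and `InboundEvent` are hypothetical names, and a real inbox would record processed ids durably, ideally in the same transaction as the handler's own changes.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical inbound event with a unique id taken from the message metadata.
record InboundEvent<T>(String eventId, T payload) {}

// Hypothetical inbox that runs the handler at most once per event id.
final class Inbox<T> {
    private final Set<String> processedIds = new HashSet<>();
    private final Consumer<T> handler;

    Inbox(Consumer<T> handler) {
        this.handler = handler;
    }

    // Returns true if the handler ran, false if the event was a duplicate.
    boolean receive(InboundEvent<T> event) {
        if (processedIds.contains(event.eventId())) {
            return false;
        }
        handler.accept(event.payload());
        // Recorded only after the handler succeeds, so a failed event can be redelivered.
        processedIds.add(event.eventId());
        return true;
    }
}
```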

[[CRDT]] === Conflict-free Replicated Data Types (CRDT)

This is an interesting value add on top of distributed state changes. The before+after+metadata structure of CDC change event messages would be an ideal abstraction on which to support easy to use CRDT semantics.
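As one small example of the kind of semantics that could sit on top of change event metadata, the sketch below shows a last-writer-wins register, one of the simplest CRDTs. It assumes each change event supplies a timestamp and an originating node id, and that the pair is unique per write; `LwwRegister` is a hypothetical name, and other CRDTs would need different metadata.

```java
// Last-writer-wins register: merging keeps the write with the highest timestamp.
// Assumes (timestamp, nodeId) is unique per write, so ties on both never occur
// between different values.
record LwwRegister<T>(T value, long timestamp, String nodeId) {

    // Merge is commutative and idempotent under the assumption above,
    // so replicas converge regardless of the order in which they see events.
    LwwRegister<T> merge(LwwRegister<T> other) {
        int byTime = Long.compare(timestamp, other.timestamp);
        if (byTime != 0) {
            return byTime > 0 ? this : other;
        }
        // Equal timestamps: break the tie deterministically by node id.
        return nodeId.compareTo(other.nodeId) >= 0 ? this : other;
    }
}
```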

[[compensation]] === State Change Inversion and Undo

It would be possible to develop the concept of the inverse of a state change operation as a synthetic CDC event. This could be useful for building state change compensation operations that might contribute towards making eventual consistency applications easier to build.
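A minimal sketch of inversion, assuming a change event carries an operation type plus 'before' and 'after' state: swapping the two states and flipping create/delete yields the synthetic 'undo' event. `ChangeEvent` and `Op` here are hypothetical types for illustration only.

```java
// Hypothetical operation types for a change event.
enum Op { CREATE, UPDATE, DELETE }

// Hypothetical change event; before is null for CREATE and after is null for DELETE.
record ChangeEvent<T>(Op op, T before, T after) {

    // Builds the synthetic event that would undo this one.
    ChangeEvent<T> invert() {
        Op inverse = switch (op) {
            case CREATE -> Op.DELETE;
            case DELETE -> Op.CREATE;
            case UPDATE -> Op.UPDATE;
        };
        // Swapping before and after restores the earlier state when applied.
        return new ChangeEvent<>(inverse, after, before);
    }
}
```

Inverting twice gives back the original event. The sketch does not address whether applying the inverse is safe once later changes have touched the same object.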

[[ecosystem]] == Ecosystem

[[reactive]] === Reactive Streams Integration

[[back-pressure]] ==== Inter-service Reactive Back-Pressure

[[mp-config]] === Integration with MicroProfile Config Converters

[[bootstrap]] == Initial Proof of Concept

Get a Debezium example running and implement a MicroProfile IncomingConnectorFactory. Link this via a Processor to a KafkaOutgoingConnectorFactory. Subclass Message to do something useful with the before/after/metadata, borrowing semantics from the Debezium Kafka Sink Connector.

[[background]] == Background Reading

=== Event Sourcing Background

[[event-sourcing]] A good introduction to Event Sourcing can be found at https://martinfowler.com/eaaDev/EventSourcing.html

[[mrm]] === MicroProfile Reactive Messaging Background

Much of the scaffolding that SmallRye Event Sourcing is built on is the MicroProfile Reactive Messaging specification. You can find an introduction to that specification here: https://github.com/eclipse/microprofile-reactive-messaging/blob/master/spec/src/main/asciidoc/architecture.asciidoc

[[newsgroup]] === Some Initial Newsgroup Discussions

This proposal emerged from many discussions and ideas and a lot of deep work from projects such as MicroProfile Reactive Messaging, Debezium and Apache Kafka. Some of the origins are in newsgroup conversations listed below. Feel free to add to any active threads.

https://groups.google.com/d/msg/microprofile/F0ehhd1MFMc/e2DLvf5tBAAJ