jezzsantos / saastack

A comprehensive codebase template for starting your real-world, fully featured SaaS web products on the .NET platform.
The Unlicense

Reliability of Projections and Notifications #15

Closed jezzsantos closed 1 month ago

jezzsantos commented 7 months ago

As a first pass we've implemented a "synchronous" and "unreliable" mechanism to relay change events to read models and to notifications. It works "consistently", which is convenient, but it is also naive.

See InProcessSynchronousProjectionRelay and InProcessSynchronousNotificationRelay.

These implementations (deliberately) do not model the intermediary message broker abstractions that need to be represented in the architecture. They skip past that notional component entirely and relay in-process.

Furthermore, there is nothing fault-tolerant about them. If they fail to relay any event to any listener, they fail at that point, but the save of the events/snapshot has already succeeded.

This is not really an acceptable outcome, since half of the listeners may have processed the event and half may not. There is no replay capability, and the push mechanism does not keep track of progress for each listener. Also, for snapshot schemes, the events themselves are unrecoverable from memory!
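The failure mode described above can be illustrated with a minimal sketch. This is not the code of InProcessSynchronousProjectionRelay; the class, listeners, and in-memory "store" below are hypothetical stand-ins, written only to show how a save can succeed while the relay stops part-way through the listeners.

```python
from typing import Callable, Dict, List

Event = Dict[str, object]
Listener = Callable[[Event], None]


class InProcessSynchronousRelay:
    """Hypothetical stand-in: pushes each event to every listener, in order."""

    def __init__(self, listeners: List[Listener]) -> None:
        self.listeners = listeners

    def relay(self, event: Event) -> None:
        for listener in self.listeners:
            listener(event)  # the first exception aborts the loop


store: List[Event] = []         # pretend persistence store
read_model_a: List[Event] = []  # listener that runs before the failure
read_model_b: List[Event] = []  # listener that runs after the failure


def failing_listener(event: Event) -> None:
    raise RuntimeError("listener unavailable")


relay = InProcessSynchronousRelay(
    [read_model_a.append, failing_listener, read_model_b.append]
)

event: Event = {"id": 1, "type": "Created"}
store.append(event)        # step 1: the save succeeds
try:
    relay.relay(event)     # step 2: the relay fails part-way through
except RuntimeError:
    pass

# The store and read_model_a now hold the event; read_model_b does not,
# and nothing records that it still needs to receive it.
```

With no record of per-listener progress, there is nothing to retry from once the exception has been raised.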

Solution

These implementations need to be replaced (in a second pass) with more reliable, "asynchronous" relays that involve an explicit message broker abstraction which, when implemented, guarantees these things:

  1. They are consistent and reliable with respect to updates to the aggregate. We have to assume that the second part of the process (relaying change events to a reliable mechanism, e.g., a queue or bus) could fail for any number of reasons after the first part of the process (updating aggregate state) succeeds. That outcome must not be allowed to happen.
  2. Change events are always "published" in order to downstream consumers and can never be received out of order by consumers.
  3. Change events must be cached and indexed by consumer, since each consumer may be at a different index at any one time.
  4. Downstream consumers must be able to deal with replays and be idempotent.
  5. Downstream consumers must handle fault tolerance themselves to keep up-to-date (pull vs push).
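As a rough illustration of guarantees 2 to 5, here is a minimal in-memory sketch of an ordered event log with a per-consumer checkpoint, where consumers pull and ignore replays. All names (ChangeEvent, EventLog, Consumer) are hypothetical and not part of this codebase. It does not address guarantee 1, which would need the log append to be atomic with the aggregate update.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ChangeEvent:
    sequence: int  # 1-based position in the log, strictly increasing
    payload: str


class EventLog:
    """Hypothetical append-only log standing in for the broker abstraction."""

    def __init__(self) -> None:
        self._events: List[ChangeEvent] = []

    def append(self, payload: str) -> ChangeEvent:
        event = ChangeEvent(sequence=len(self._events) + 1, payload=payload)
        self._events.append(event)
        return event

    def read_after(self, sequence: int) -> List[ChangeEvent]:
        # Events are stored in order, so a slice preserves that order.
        return self._events[sequence:]


class Consumer:
    """Pulls from the log and tracks its own position (its checkpoint)."""

    def __init__(self, log: EventLog) -> None:
        self.log = log
        self.checkpoint = 0
        self.applied: Dict[int, str] = {}

    def handle(self, event: ChangeEvent) -> None:
        if event.sequence <= self.checkpoint:
            return  # replayed event: already applied, so ignore it
        self.applied[event.sequence] = event.payload
        self.checkpoint = event.sequence

    def pull(self) -> int:
        events = self.log.read_after(self.checkpoint)
        for event in events:
            self.handle(event)
        return len(events)


log = EventLog()
log.append("a")
log.append("b")

fast = Consumer(log)
slow = Consumer(log)

fast.pull()      # fast catches up to sequence 2
log.append("c")
fast.pull()      # fast fetches only the new event
slow.pull()      # slow fetches all three, in order

slow.handle(ChangeEvent(sequence=1, payload="a"))  # replay is a no-op
```

Because each consumer owns its checkpoint, a consumer that fails simply pulls again from where it left off, which is the "pull vs push" distinction in item 5.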

We also have to remember that there will be two implementations, since these change events could originate from either a snapshotting persistence store or an event-sourced persistence store.


jezzsantos commented 3 months ago

Synchronous relay of domain events (IDomainEvent) is now acceptable within the same process as long as the process contains all subdomains of the same bounded context. If we ever split the bounded context into separate deployable pieces, then we will need to send integration events (IIntegrationEvent) that would then require asynchronous messaging and eventual consistency.

Thus, now we only need an integration event mechanism for communicating integration events across process boundaries.
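The split described above might look roughly like the following sketch. The Python classes are hypothetical stand-ins for IDomainEvent and IIntegrationEvent, the deque stands in for a real queue or bus, and translating a domain event into an integration event via a `translate` callback is an assumption made for illustration, not something this thread specifies.

```python
from collections import deque
from typing import Callable, Deque, List, Optional


class DomainEvent:
    """Stand-in for IDomainEvent: stays inside the bounded context."""

    def __init__(self, name: str) -> None:
        self.name = name


class IntegrationEvent:
    """Stand-in for IIntegrationEvent: crosses a process boundary."""

    def __init__(self, name: str) -> None:
        self.name = name


handled_in_process: List[str] = []
outbound_queue: Deque[IntegrationEvent] = deque()  # hypothetical queue or bus

Translator = Callable[[DomainEvent], Optional[IntegrationEvent]]


def publish(event: DomainEvent, translate: Translator) -> None:
    # Domain events are handled synchronously, in the same process.
    handled_in_process.append(event.name)
    # Only events that matter outside the process are queued for
    # asynchronous, eventually consistent delivery.
    integration = translate(event)
    if integration is not None:
        outbound_queue.append(integration)


publish(DomainEvent("UserRegistered"), lambda e: IntegrationEvent(e.name))
publish(DomainEvent("ProfileTouched"), lambda e: None)
```

Here both domain events are handled in-process, but only one produces a message for other processes.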