ylorph / The-Inevitable-Event-Centric-Book

Some day, someone might write an authoritative book about this aspect. Let's call that inevitable book Event Centric as a placeholder title (this is a quote...)

Problem: Patterns for something something must be unique #28

Open ylorph opened 5 years ago

ylorph commented 5 years ago

Not talking about the identity of an entity, but some other combination of properties.
Let's say a user-provided email must be unique within the system, but it may not be used as the identifier, because the email is allowed to be changed.

MerrionComputing commented 5 years ago

To add complexity there are actually two versions of this uniqueness constraint in an event sourcing based system: unique as of now, and unique over all time.

For some situations only one entity may use a unique identifier at a time, but that entity may "give up" the identifier, after which it becomes available for reuse. For example, mobile phone numbers or employee badge numbers.

MerrionComputing commented 5 years ago

In both cases the set of all identifiers used needs to be kept as a cache or persisted read model, as running all the projections over all the streams to check uniqueness is slow.

But to do so introduces a concurrency risk: what if two events go for the same id at the same time?

ylorph commented 5 years ago

@MerrionComputing good points !

What do you think of this: let's say you have a stream type "EmailMustBeUnique", with stream names constructed like "EmailMustBeUnique-[Hash_of_email]". When a user provides an email with hash 12345:

"Email must be unique when you provide it, over all time": append an "EmailClaimed" event to stream "EmailMustBeUnique-12345", with a concurrency check that the stream does not exist yet; if the append fails, the email is already claimed and a "FailedToClaimEmail" can be recorded instead.

"Email must be unique when you provide it, as of now": the same, but when a user changes their email you delete the old stream. This could be a reactive component acting on an "EmailReleased" event on stream "EmailMustBeUnique-12345".

The "FailedToClaimEmail" event is handy for seeing which email addresses are popular ;-)
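
A minimal sketch of this claim-by-stream-creation idea (the names here, such as `InMemoryEventStore` and `claim_email`, are illustrative stand-ins, not from the thread; a real implementation would use the event store's own "append only if the stream does not exist" option):

```python
import hashlib


class StreamAlreadyExists(Exception):
    """Raised when an append expected the stream not to exist, but it does."""


class InMemoryEventStore:
    """Stand-in for an event store that supports an
    'expected version: stream does not exist' concurrency check."""

    def __init__(self):
        self._streams = {}

    def append_expecting_no_stream(self, stream_name, event):
        # Atomic "create stream + append first event" with a concurrency check.
        if stream_name in self._streams:
            raise StreamAlreadyExists(stream_name)
        self._streams[stream_name] = [event]

    def delete_stream(self, stream_name):
        # Used for the "unique as of now" variant, when an email is released.
        self._streams.pop(stream_name, None)


def email_hash(email: str) -> str:
    # Normalize before hashing so "Jane@Example.com" and "jane@example.com" collide.
    return hashlib.sha256(email.strip().lower().encode()).hexdigest()[:12]


def claim_email(store, email, user_id):
    stream = f"EmailMustBeUnique-{email_hash(email)}"
    try:
        store.append_expecting_no_stream(stream, {"type": "EmailClaimed", "user": user_id})
        return True
    except StreamAlreadyExists:
        # Could also record a "FailedToClaimEmail" event here for analytics.
        return False


store = InMemoryEventStore()
assert claim_email(store, "jane@example.com", "user-1") is True
assert claim_email(store, "Jane@Example.com", "user-2") is False  # already claimed
```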

MerrionComputing commented 5 years ago

That would definitely work so long as the entire operation is wrapped in a lock or transaction.

The most likely cause of non-uniqueness is the same command being picked up by two handlers, so you need to make sure that can't lead to duplication.

ylorph commented 5 years ago

The stream creation + "EmailClaimed" is atomic, and the concurrency check on StreamDoesNotExists enforces the uniqueness at creation time, across all clients. (I'm assuming no sharding is in place.)

A duplicate command would need to be handled by an inbox? Or one of the commands would just fail, but is that an issue?
Or make the handler idempotent: add a check, "Did John already claim Email-12345?"; if yes, do nothing.

The change of email in the rest of the system would be a reaction to "EmailClaimed". Here we can get into issues if email XYZ is claimed, XYZ is released, and XYZ is claimed again in a really short time span, and the update has not yet happened everywhere.
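
A minimal sketch of that idempotency check, using a plain dict as a stand-in for the claim streams (all names are hypothetical):

```python
# email (normalized) -> owning user id, standing in for the per-email claim streams
claims = {}


def handle_claim_email(email, user_id):
    key = email.strip().lower()
    owner = claims.get(key)
    if owner == user_id:
        return "no-op"        # "Did John already claim it?" -> yes, do nothing
    if owner is not None:
        return "conflict"     # claimed by someone else
    claims[key] = user_id     # in a real store: append "EmailClaimed" with a
                              # stream-does-not-exist concurrency check
    return "claimed"


assert handle_claim_email("john@example.com", "john") == "claimed"
assert handle_claim_email("john@example.com", "john") == "no-op"      # redelivered command
assert handle_claim_email("john@example.com", "mary") == "conflict"
```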

MerrionComputing commented 5 years ago

For sharding you need the email hash to be the shard key (the partition key in Azure tables).

That last part (if email XYZ is claimed, XYZ is released, and XYZ is claimed again in a really short time span, and the update has not yet happened everywhere) is an issue with any federated sources of truth model. Only one part of a system can have authority over "is this email already claimed".

ylorph commented 5 years ago

> That last part is an issue with any federated sources of truth model.

What part?

> Only one part of a system can have authority over "is this email already claimed".

Yes, indeed.

timove commented 5 years ago

> The most likely cause of non-uniqueness is the same command being picked up by two handlers, so you need to make sure that can't lead to duplication.

Thinking about possible causes of duplicate command execution: this raises questions about the characteristics of the transport mechanism. Is it a queue? Is delivery at-most-once, exactly-once, or at-least-once? At-least-once would require idempotent execution of the command. Exactly-once would require a transaction spanning the command transport, the command execution, and the event store.
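
A minimal sketch of the at-least-once case, pairing redelivery with an "inbox" of processed command ids (the `ClaimEmail` and `Inbox` names are illustrative; in practice the processed-id record would need to be persisted atomically with the resulting events):

```python
from dataclasses import dataclass, field


@dataclass
class ClaimEmail:
    command_id: str   # assigned by the sender; stable across redeliveries
    email: str
    user_id: str


@dataclass
class Inbox:
    """At-least-once delivery plus an inbox of processed command ids
    gives effectively-once execution of each command."""
    processed: set = field(default_factory=set)

    def handle(self, command: ClaimEmail, execute) -> bool:
        if command.command_id in self.processed:
            return False                          # redelivered duplicate: skip
        execute(command)                          # run the real handler once
        self.processed.add(command.command_id)    # record that it was handled
        return True


inbox = Inbox()
cmd = ClaimEmail("cmd-1", "jane@example.com", "user-1")
inbox.handle(cmd, lambda c: print("claiming", c.email))
inbox.handle(cmd, lambda c: print("claiming", c.email))  # duplicate delivery: no output
```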

Using a routing strategy over an id might help ensure that only a single command handler instance handles a given command, as long as the set of command handler instances, and thus the routing, is stable. Changes to the network, such as partitions ("split brain" situations), would require treatment.
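
A minimal sketch of such a routing strategy, hashing a stable key (for example the email hash or an aggregate id) to pick a handler instance (names are illustrative, and a stable set of handler instances is assumed):

```python
import hashlib


def route(command_key: str, handler_count: int) -> int:
    """Stable routing of commands to handler instances by hashing a key,
    so all commands about the same key are handled by the same instance."""
    digest = hashlib.sha256(command_key.encode()).hexdigest()
    return int(digest, 16) % handler_count


# Every command about this email goes to the same handler instance,
# which processes its commands serially.
print(route("EmailMustBeUnique-12345", handler_count=4))
```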

timove commented 5 years ago

Uniqueness in general can be a question of scope. Say a username has to be unique within a tenant, and the maximum number of users per tenant is small enough. If the tenant is an aggregate (or can be added as one), the tenant can hold a list of known usernames and guarantee uniqueness, as long as command handling (for a given aggregate id) is serialized. (Horizontal scaling can happen by utilizing a routing strategy.)
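
A minimal sketch of a tenant aggregate owning that uniqueness scope (hypothetical names, not tied to any particular framework; serialized command handling per tenant id is assumed):

```python
class UsernameAlreadyTaken(Exception):
    pass


class Tenant:
    """Tenant aggregate that enforces username uniqueness within its own scope."""

    def __init__(self, tenant_id: str):
        self.tenant_id = tenant_id
        self.usernames = set()       # rebuilt from past UserRegistered events
        self.pending_events = []     # events to be appended to the tenant's stream

    def register_user(self, username: str):
        # The invariant is checked against state rebuilt from the tenant's history.
        if username in self.usernames:
            raise UsernameAlreadyTaken(username)
        self._apply({"type": "UserRegistered", "tenant": self.tenant_id, "username": username})

    def _apply(self, event):
        self.usernames.add(event["username"])
        self.pending_events.append(event)


tenant = Tenant("acme")
tenant.register_user("alice")
try:
    tenant.register_user("alice")
except UsernameAlreadyTaken:
    print("duplicate username rejected within tenant 'acme'")
```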

ylorph commented 4 years ago

List of options:

http://codebetter.com/gregyoung/2010/08/12/eventual-consistency-and-set-validation/

johnbywater commented 4 years ago

Another option is to atomically commit two domain events: one for the aggregate that has a value that must be unique (or otherwise constrained across the domain model), and one for an aggregate that represents a position in an index (or something similar) that can govern that constraint.

Then a behavioural or record conflict can be raised by the second aggregate, and unless both succeed the first aggregate's domain event won't be persisted.

For example, the first aggregate has an editable "slug" attribute, by which the aggregate is referenced in an interface, and the slug can be converted with UUIDv5 to a UUID that is the ID of the second aggregate.
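
A minimal sketch of that slug-to-UUID mapping using Python's standard `uuid.uuid5` (the namespace constant here is made up for illustration; a real application would define its own):

```python
import uuid

# Illustrative namespace for "page slug" ids.
SLUG_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/page-slugs")


def slug_index_id(slug: str) -> uuid.UUID:
    """Deterministically derive the ID of the 'index' aggregate from the slug,
    so two pages claiming the same slug map to the same aggregate and conflict."""
    return uuid.uuid5(SLUG_NAMESPACE, slug)


print(slug_index_id("home"))
print(slug_index_id("home") == slug_index_id("home"))   # True: same slug, same aggregate
print(slug_index_id("home") == slug_index_id("about"))  # False: different aggregates
```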

I actually published an example of this technique a couple of weeks ago, when I started writing an event sourced federated wiki, as an example of using the Python eventsourcing library: https://github.com/johnbywater/es-example-federatedwiki/tree/master/federatedwiki

I discussed this at some length with Ward Cunningham and friends, on the federated wiki channel.

Initial mention by me about this example, and discussion with Ward: https://riot.im/app/#/room/#fedwiki:matrix.org/$1581867044882787EyyDX:matrix.org

Subsequent discussion ("You may have felt cross-examined and not known why.") https://riot.im/app/#/room/#fedwiki:matrix.org/$15819507051032777LxInt:matrix.org

In case this design ruffles any feathers, so to speak: I'm well aware it cuts sharply across the doctrine that one should only write to one aggregate sequence at a time. This "only write to a single stream" rule might be a useful thing to observe in some, or indeed many, situations, but in this case I just don't think that writing to two sequences atomically presents any kind of problem. Happy to have a reasoned discussion about this, because I might have missed something, but I'd prefer not to be told that it's just plain wrong (absent reasons or objective concerns).

This example isn't fully developed, or even developed very far. But it could be extended in various ways to implement various cases of constraints across the model as a whole (a long-recognised issue with this approach, and something Martin Fowler raises in one of the videos about this topic online - I just can't remember which). Please see the tests for the little that has been done so far. :-)

ylorph commented 4 years ago

@johnbywater

This "only write to a single stream" rule might be a

That rule is more a guiding principle to avoid shooting oneself in the foot, since most event storage systems I'm aware of only support atomic writes to a single stream.

johnbywater commented 4 years ago

@ylorph Thanks for the reply! Could we make a list of the event stores that you are aware of having this restriction? Maybe send me a DM on Twitter?