jet / propulsion

.NET event stream projection and scheduling platform with CosmosDB, DynamoDB, EventStoreDB, MemoryStore, message-db, Equinox and Kafka integrations
https://github.com/jet/dotnet-templates
Apache License 2.0
177 stars 24 forks source link

Support EventStore Projection #8

Open Xatter opened 5 years ago

Xatter commented 5 years ago

For EventStore storage specifically, it looks like this library is designed so that EventStore is only used for storage of events and that subsequent communicating that events occured is offloaded to Kafka. The EventStore product does support messaging via PersistentSubscription and it would be nice for smaller projects to be able to use just EventStore and not have to spin up a Kafka cluster if that kind of scale is not needed.

Specifically I'd love an API where you can subscribe either to a specific aggregate user-1234 or a projection user-* and have the library handle things like checkpointing (storing the last read offset for a client in EventStore itself) similar to how PersistentSubscriptions work in the EventStore API today.

bartelink commented 5 years ago

Yes - using Kafka is definitely not a panacea and having consistent programming models between Equinox.Cosmos and Equinox.EventStore ia part of the idea; such stuff logically fits in an Equinox.EventStore.Projection library, which would match such facilities as presented in Equinox.Cosmos.Projection (which used the MS ChangeFeedProcessor library and stores state in an auxilliary bookmark collection).

As evidenced by this discussion

a) I personally am no expert in such ES facilities b) there are internal users within Jet that are interested in such a facility at a high level c) we're looking at building out just that

This would manifest as: a) dotnet new eqxprojector -e template which would produce a Projector project that tails an ES instance b) dotnet new eqxprojector -e -k template which would do the same as dotnet new eqxprojector -k (feeding to Kafka in the projector with one adding appropriate handling/filtering as desired) c) dotnet new eqxprojector -e -c template mode which would feed from ES to an Equinox.Cosmos store (thats what the thread is about and what I have shortlisted to build)

Now, to your specific use case:- a) but in general the idea would be to do a high level subscription tailing $all and the batteries included impl would be feeding to Kafka using the same components as used for dotnet new eqxprojector as it stands b) supporting a more constrained subscription would be an extra feature on top of a

So, in short, if you wanted to do a PR to add such a thing a) it could live in Equinox.EventStore.Projection with a sample adjacent to the Cosmos one in https://github.com/jet/dotnet-templates/tree/eqx-prj/equinox-projector-cosmos b) it might move about a little and be merged with the above ES -> Cosmos sync facility sitting alongside

In terms of this organically happening without input from your side, I'd venture: a) in general folks here tend to use Kafka as the consumer abstraction as that's all rigged and well maintained (i.e. nobody has had a similar ask internally atm) b) whether it makes sense to use a wrapper out of a library if you have ES in place and expertise around working with that API might be debatable

bartelink commented 5 years ago

There's super-early WIP wrt this in https://github.com/jet/dotnet-templates/pull/11

bartelink commented 5 years ago

Quick update: while there is work in train to provide wiring for projecting from EventStore, I should point out (esp upon re-reading the OP) that there are no current plans regarding storing consumer positions within EventStore itself coming from usage inside Jet.

I'll leave this open as I don't feel the need to rule it out, but in terms of adding logic into EventStore.fs to support it, my default stance would be to wait until the projecting to Kafka and/or bulk export side reaches some level of feature completeness.

Having said that... If you had a spike or more detail, I'd definitely be interested to hear more...

bartelink commented 5 years ago

have sliced and diced some PRs - https://github.com/jet/dotnet-templates/pull/16 shows the current WIP I alluded to above

bartelink commented 5 years ago

https://github.com/jet/dotnet-templates/pull/16 has had a lot of progress - it's about to sprout a way to store progress / offsets in Cosmos. Some of the progress writing will likely then be backported into equinox-projector in non-kafka mode. Between those two templates, you can treat Equinox.Cosmos + the CosmosDb ChangeFeed as a poor/rich man's Kafka (but consistent) projection system Right now it's looking like no part of the solution involves tracking offsets in EventStore - it can potentially be made pluggable though (which would e.g. facilitate storing state in Equinox.SqlStreamStore and/or mixing and matching in other ways)

bartelink commented 5 years ago

Looking like we'll build this - idea is to provide a dotnet-templates template which consumes from $all, and maintains offset in an Equinox.Cosmos instance using the Checkpoint aggregate in Propulsion.Cosmos, which is used for Sync from ES->Cosmos. Clearly this makes no sense for apps that don't already have a dependency on such a store.

With a bit of hacking, it should be possible to hook this to store it elsewhere. Interested to hear what sort of backing stores people have in mind (realistically, I personally won't be committing to actually doing the implementation)

CumpsD commented 4 years ago

This is an interesting issue. I'm using ES and want to project to SQL Server/Elasticsearch/.... by trailing $all.

In the past I've had this kind of functionality with SQL Stream Store and Projac to project events into SQL Server tables, but I didn't find anything similar for ES. Initially I started writing it myself, but if propulsion would support this, that would be awesome.

bartelink commented 4 years ago

It's looking like my work needs mean I'll be writing up a proper end to end set of docs for Propulsion that explain how its pipeline fits together soon (I edited and deleted forward looking statements here).

There are similar systems out there which have similar architectures for other platforms (the name escapes me) - it should be mentioned that the split of the concepts in Propulsion derives from scaling and perf tuning of large sync and projection pipelines with real data and needs (mainly mixing and matching ES and Cosmos) - i.e. this is not me trying to build some abstract projection framework.

Sources+checkpoints

the summaryProjector template has wiring to read: a) ES $all - with checkpoints saved in Equinox.Cosmos (writes an event every hour, updates a snapshot every 5s) Why? because in apps I'm working on, they always have an Equinox.Cosmos store to hand anyway b) CosmosDB CFP - checkpoints managed by CFP (updates the checkpoint documents every 5s but there is no record)

An easy enough to achieve extension would be to make an adjusted version of (a) that writes to an EventStore stream every 5s that has been configured to only maintain the most recent event (it's pretty pluggable, if you e.g. had something like Consul, you could write to that too)

this is the main open question that is critical to resolving this Issue as I see it

Schedulers / Handlers

I need to write docs on this

Propulsion Sinks

Propulsion has Sinks for ES and Cosmos. Sinks take event streams and replicate them - i.e. if you are migrating from ES->ES, ES->Cosmos, Cosmos->ES or Cosmos->Cosmos - the dotnet new proSync template demonstrates this. One could make a Propulsion.SqlStreamStore that would have the same facilities (syncing from/to ES/Cosmos/SqlStreamStore into or out of SqlStreamsStore)

this is an aside to your question, and the OP

ElasticSearch / SQL

TL;DR Its conceivable that one could Projac-ish helpers in a Propulsion.Sql module and a similar Propulsion.ElasticSearch helper - but that would not be a "sink" as such - from Propulsion's point of view, its really all just schedulers running handlers, and there's no real reason for general helpers that are not reading/writing/forwarding events to live in Propulsion.

I'll expand a bit in case that helps with conveying the thinking a bit:

Internally, we don't have any work in the near term writing to Elastic Search (don't get me wrong, we use it, just not that much in projections on the team I work on atm).

On Azure, for cases where you're targeting SQL but are not doing lots of indexing and/or searching/random queries, I'd actually give sinking to Equinox.Cosmos a try - dotnet new proTrackingConsumer and dotnet new proSummaryConsumer show the way - if you're projecting to whats effectively a document, that's actually very efficient from a cached reading perspective.

For Sql, we do (we have internal closed source helpers which are pretty generic - i.e. they might cover Projac-like usage, but are not as purposefully designed). I dont have plans/anticipate it crossing my radar to extract/polish such a facility, but I do know that lots of folks would use it. If you had a free standing set of helpers, a Propulsion.Sql module might not be crazy though - the first thing to make it do would be to port dotnet new proSummaryConsumer to it. Not sure if it would make that much sense to actually tie its releases and/or development to this repo though - ultimaltely Propulsion majors on:

bartelink commented 3 years ago

related: this (incomplete) PR provides for projecting from ESDB $all, storing checkpoints in ESDB

bartelink commented 3 years ago

related: https://github.com/absolutejam/Propulsion.EventStoreDB

bartelink commented 2 years ago

related: Propulsion.EventStoreDb in V4 implements the reader using gRPC. Leaving this issue open as there's no in-the box checkpointing that stores the checkpoints in ESDB (supported checkpoint stores are DynamoDB, CosmosDB, Postgres and SQL Server)