networknt / light-eventuate-4j

An eventual consistency framework based on Event Sourcing and CQRS on top of light-4j and Kafka
Apache License 2.0

[question] How mature or industry-proven is this framework? #33

Closed archenroot closed 7 years ago

archenroot commented 7 years ago

Hi,

I am now in need of some kind of framework to help me implement an event store with CQRS patterns.

My natural choice was Spring technologies; beyond that I found two frameworks, Eventuate and Axon.

If you have a minute and the experience, Steve, could you elaborate in a few sentences on where you see the strengths of your solution over these? Performance is not a big deal in our case. I also do not want to pick a solution that is not yet adopted in some major industrial projects; we have 3-4 months to deliver the project, so I don't have much time for experiments.

stevehu commented 7 years ago

@archenroot Thanks for your interest in light-eventuate-4j, which is a service mesh that provides the infrastructure for service-to-service communication. On top of light-eventuate-4j, you can use REST, GraphQL and Hybrid styles of APIs/services to communicate with consumers, and use light-saga-4j (in development) for distributed transaction orchestration. If you have a single application, I wouldn't recommend using it, as the up-front infrastructure cost is too high. But if you expect to deploy hundreds or thousands of services in the future, then it would be very suitable.

Currently, we are working with a bank in Canada to implement REST/Eventuate/Saga for its core business, and several big companies (TCS, IBM, Red Hat, Deloitte and Capco) are interested in working with us to support and promote the frameworks to other customers. The performance numbers are not a big deal for a small number of services, but if you have many services, the production cost would be significantly different. In summary, our goal is enterprise-level scalability, security and reliability as a cloud-native platform. We are also exploring a serverless offering for small and medium-sized businesses at the moment.

GavinChenYan commented 7 years ago

It is easy to implement projects using the light-4j/light-eventuate-4j frameworks. There is a sample project which can be used as a quick-start reference. One client application that went to production took only a couple of days to implement and deploy.

Sample project GitHub source: https://github.com/networknt/light-example-4j/tree/master/eventuate/account-management

Sample project document: https://networknt.github.io/light-eventuate-4j/example/transfer/

archenroot commented 7 years ago

@stevehu - thanks for the review. GraphQL sounds really interesting. In my case I am trying to get rid of any kind of REST API where possible and make everything event-driven and async only, including the UI (preferring just websockets). The initial deployment of the project will consist of about 10 services, but performance is really a core focus, and I have seen the benchmark where your framework shines against Spring technologies. I wonder if Spring Boot can be tuned to reach your performance with Undertow and some JVM tuning.

@chenyan71 - thank you for the links; hands-on work is worth thousands of words, I will try it.

If you don't mind, I have additional questions; any hints on those in the form of best practices are really welcome:

Event versioning

How do you handle event upgrades? Especially with a large number of services, I expect some kind of fully automated mechanism if possible.

Imagine a simplified event store table (one event chain execution, no data, just an example):

id, eventname, version
1, ProcessCreated, 1
2, ProcessFinished, 1

Now I am going to change the ProcessCreated event to version 2. So I will end up with:

id, eventname, version
1, ProcessCreated, 1
2, ProcessFinished, 1
3, ProcessCreated, 2
4, ProcessFinished, 1

Now, I don't want to mess up the source code with any kind of versioning of classes/methods, something like ProcessFinishedHandler_V2. Instead, I would prefer to recreate the obsolete versions of events:

id, eventname, version
1, ProcessCreated, 1
2, ProcessFinished, 1
3, ProcessCreated, 2
4, ProcessFinished, 1
5, ProcessCreated, 2
6, ProcessFinished, 1

So now I have "duplicates" (not really) of events. Ultimately I would like to take the original stream with the obsolete event version out of the store and into an archive:

1, ProcessCreated, 1
2, ProcessFinished, 1

This leaves the store with the upgraded history:

3, ProcessCreated, 2
4, ProcessFinished, 1
5, ProcessCreated, 2
6, ProcessFinished, 1

I have read a lot of articles on this, but I can't stand :-) class/method versioning in my code; imagine, in an extreme situation, you have 10 versions, so maintenance can become overkill across the domain... as I see it.
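
Purely as an illustration of the migration idea above (not something light-eventuate-4j provides), here is a minimal Java sketch that re-appends obsolete ProcessCreated v1 events as v2 copies and moves the originals into an archive table; the table names, columns and the upcaster are all hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical one-off migration: copy obsolete ProcessCreated v1 events as v2,
// then move the originals into an archive table. Schema is illustrative only.
public class ProcessCreatedV1ToV2Migration {

    public static void main(String[] args) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/eventstore", "user", "password")) {
            con.setAutoCommit(false);

            try (PreparedStatement select = con.prepareStatement(
                     "SELECT id, payload FROM events WHERE eventname = 'ProcessCreated' AND version = 1");
                 PreparedStatement insert = con.prepareStatement(
                     "INSERT INTO events (eventname, version, payload) VALUES ('ProcessCreated', 2, ?)");
                 PreparedStatement archive = con.prepareStatement(
                     "INSERT INTO events_archive SELECT * FROM events WHERE id = ?");
                 PreparedStatement delete = con.prepareStatement(
                     "DELETE FROM events WHERE id = ?");
                 ResultSet rs = select.executeQuery()) {

                while (rs.next()) {
                    long id = rs.getLong("id");

                    // 1. Re-append the event in its upgraded shape.
                    insert.setString(1, upcastV1ToV2(rs.getString("payload")));
                    insert.executeUpdate();

                    // 2. Move the obsolete original out of the live store.
                    archive.setLong(1, id);
                    archive.executeUpdate();
                    delete.setLong(1, id);
                    delete.executeUpdate();
                }
            }
            con.commit();
        }
    }

    // Hypothetical upcaster: rewrite a v1 JSON payload into the v2 shape.
    private static String upcastV1ToV2(String v1Payload) {
        return v1Payload; // real logic would add/rename fields here
    }
}
```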

Event Log/Store

What architecture do you use in your banking domain: an RDBMS with database log tailing, Kafka, MongoDB...? This can become a single point of failure, as it is supposed to exist as the CENTRAL point of TRUTH. There are multiple designs here. Kafka can support some of this out of the box (with some extensions required later); on the other hand, it is like a message broker with a database backend. Eventuate does this with an RDBMS, where the database log file is tailed for changes and the data is pushed into the message broker (you need to write this log-tailing tool for your database).

Or, on the other hand, you can keep the message broker as is and implement the event log as another service in the ecosystem. I am not sure what is best, but I am interested in what you prefer in the banking domain.

Request-response patterns in event-driven design

You stated on https://networknt.github.io/light-eventuate-4j/architecture/comm-pattern/ that this can become an issue. But in my case every event carries a metadata/header object where I support not only a unique messageId by default, but a correlationId as well, with one important difference: the correlationId can be seen more as a processId/GlobalTransactionId. All this decoupling in microservice design is about focusing on small things independently, but in the end you implement a process via an event chain. So my correlationId represents this isolated micro-process and is not limited to the request-response you talk about; it could be anything that aggregates an event chain with business meaning by design (validate-file service, merge-file service, publish-file service, etc., so it is the chain I would like to identify). A correlation ID will therefore always exist in my system; to me something like request-response doesn't exist anymore, there exists only an event chain instance represented by events aggregated by a correlationId/eventChainId, name it as you like.
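
To make the envelope idea concrete, here is a minimal Java sketch of such an event header, with hypothetical field names (light-eventuate-4j's own message metadata classes may look different):

```java
import java.time.Instant;
import java.util.UUID;

// Hypothetical event envelope: every event carries a unique messageId plus a
// correlationId that identifies the whole business event chain (process),
// not just a single request/response pair.
public final class EventEnvelope {

    private final String messageId;      // unique per event
    private final String correlationId;  // shared by every event in one chain / saga
    private final String eventType;      // e.g. "FileValidated", "FileMerged"
    private final Instant occurredAt;
    private final String payload;        // serialized event body, e.g. JSON

    public EventEnvelope(String correlationId, String eventType, String payload) {
        this.messageId = UUID.randomUUID().toString();
        this.correlationId = correlationId;
        this.eventType = eventType;
        this.occurredAt = Instant.now();
        this.payload = payload;
    }

    public String getMessageId()     { return messageId; }
    public String getCorrelationId() { return correlationId; }
    public String getEventType()     { return eventType; }
    public Instant getOccurredAt()   { return occurredAt; }
    public String getPayload()       { return payload; }
}
```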

Command/Query without HTTP (REST), just a message broker

I would like to completely eliminate anything like REST in my system (I will miss Swagger a lot :-)). The thing is that I will end up transporting not only events but also aggregates via the message broker. So I will possibly end up translating HTTP methods (PUT, GET, POST, etc.) into the event itself, and I will need to define commands and queries as pure objects, including their transport.

Do you have experience in this area, or do you simply handle this via REST and that's it? Any guidelines would help.
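
For what it's worth, here is a hedged sketch of what "commands as pure objects including transport" could look like when published straight to Kafka instead of being exposed as a REST endpoint; the command class, topic name and correlation id are purely illustrative:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;
import java.util.UUID;

// Hypothetical command published directly to the broker instead of a REST endpoint:
// the command is just a serializable object, and the topic replaces the HTTP route.
public class CreateAccountCommandSender {

    // Plain command object; public fields keep the sketch short for Jackson.
    public static class CreateAccountCommand {
        public String commandId;
        public String correlationId;
        public String owner;
        public double initialBalance;
    }

    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        CreateAccountCommand cmd = new CreateAccountCommand();
        cmd.commandId = UUID.randomUUID().toString();
        cmd.correlationId = "process-42";   // ties the command to its event chain
        cmd.owner = "Ladislav";
        cmd.initialBalance = 100.0;

        String json = new ObjectMapper().writeValueAsString(cmd);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The topic name plays the role of the HTTP route; the key keeps
            // commands for one chain on one partition to preserve ordering.
            producer.send(new ProducerRecord<>("account-commands", cmd.correlationId, json));
            producer.flush();
        }
    }
}
```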

UI is not a layer, it is a microservice (off-topic)

This is a little off-topic, but some architects still see the UI as a layer; to me it is just another microservice. The UI is usually seen as something on top of the service layer, which I think is wrong :-). Some materials on this last point:

http://examples.sencha.com/extjs/6.5.1/examples
https://flightjs.github.io/
https://github.com/wilk/ExtJS-WebSocket

I have asked many questions here. I have been doing integrations for more than 10 years, so I am not a novice in this field. I worked on a project that completely copied the Netflix way (we handled 10,000 requests per second at peak) and it was amazing, but we also made serious mistakes in the design :-)). The project I am working on now is much smaller, but I would like to make it beautiful, so I am looking for the best approaches in multiple places.

stevehu commented 7 years ago

@archenroot Judging from the questions you asked, you have very in-depth knowledge of this topic. Let me try to answer them, and Gavin will add his points if anything is missing.

Command sourcing

It is doable to go with events only, without anything like REST, GraphQL or RPC. In fact, we use these synchronous request/response interactions only for the UI, as you cannot ask all customers to use events. It is called command sourcing when you invoke services with messaging and, on the service side, the command is transformed into one or more events as part of event sourcing. Take a look at this package to see how we abstract message, event and command.

https://github.com/networknt/light-saga-4j/tree/master/saga-core/src/main/java/com/networknt/saga/core
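
To illustrate the command-sourcing idea in isolation, here is a minimal sketch with hypothetical command and event types; it is not the actual saga-core abstraction linked above, just the shape of the transformation:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical command-sourcing handler: a command arrives as a message and is
// transformed on the service side into one or more domain events, which are then
// appended to the event store and published. Types are illustrative, not saga-core.
public class DebitAccountCommandHandler {

    public static class DebitAccountCommand {
        public final String accountId;
        public final double amount;
        public DebitAccountCommand(String accountId, double amount) {
            this.accountId = accountId;
            this.amount = amount;
        }
    }

    public interface DomainEvent { }

    public static class AccountDebited implements DomainEvent {
        public final String accountId;
        public final double amount;
        public AccountDebited(String accountId, double amount) {
            this.accountId = accountId;
            this.amount = amount;
        }
    }

    public static class AccountDebitRejected implements DomainEvent {
        public final String accountId;
        public final String reason;
        public AccountDebitRejected(String accountId, String reason) {
            this.accountId = accountId;
            this.reason = reason;
        }
    }

    // The command is not executed as a remote call; it is turned into events.
    public List<DomainEvent> handle(DebitAccountCommand cmd, double currentBalance) {
        if (cmd.amount <= currentBalance) {
            return Arrays.asList(new AccountDebited(cmd.accountId, cmd.amount));
        }
        return Arrays.asList(new AccountDebitRejected(cmd.accountId, "insufficient funds"));
    }
}
```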

Also, if you want to use websocket, here are two examples.

https://github.com/networknt/light-example-4j/tree/master/websocket

Spring Boot performance

Spring cannot be tuned to reach the same performance as the light platform, as it is Java EE based and blocking. The framework also introduces too much overhead, so we cannot even use it; we had to build our own in-house IoC service. We have some real comparison numbers from a bank, and those numbers convinced them to choose light instead of Spring Boot. Basically, to match one light instance on a 4 CPU/300 MB VM, you would need over 100 Spring Boot instances on 4 CPU/2 GB VMs to reach the same throughput and latency in some extreme use cases. That's several hundred times more in production provisioning.

Event evolution

It is part of service evolution but a little bit different. I was about to write something on the documentation site for this.

As you can see, we use Kafka as the message broker but not as the event store, since events are immutable in Kafka. We use a MySQL database as the event store so that we can always manipulate events if we want to, in case a newer version of an event is not backward compatible with the older version. If the event is backward compatible, then we can use an Apache Avro schema in Kafka to ensure that our handlers can handle different versions of the events.
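
As a small illustration of the Avro approach (the schemas here are made up, not the framework's): version 2 of an event adds a field with a default value, which keeps it backward compatible, and Avro can verify that:

```java
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

// Sketch: a v2 consumer can still read v1 records because the new field
// has a default value. The event schemas are illustrative only.
public class EventSchemaEvolution {

    private static final String V1 =
        "{\"type\":\"record\",\"name\":\"ProcessCreated\",\"fields\":["
      + "{\"name\":\"processId\",\"type\":\"string\"}"
      + "]}";

    private static final String V2 =
        "{\"type\":\"record\",\"name\":\"ProcessCreated\",\"fields\":["
      + "{\"name\":\"processId\",\"type\":\"string\"},"
      + "{\"name\":\"initiator\",\"type\":\"string\",\"default\":\"unknown\"}"
      + "]}";

    public static void main(String[] args) {
        Schema writerV1 = new Schema.Parser().parse(V1);
        Schema readerV2 = new Schema.Parser().parse(V2);

        // Can a handler built against v2 read events written with v1?
        SchemaPairCompatibility result =
            SchemaCompatibility.checkReaderWriterCompatibility(readerV2, writerV1);

        System.out.println(result.getType()); // COMPATIBLE, thanks to the default value
    }
}
```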

Event Store

I prefer a database as the event store, and here is an article explaining the reasons. Kafka is OK as an event store, but not perfect.

https://networknt.github.io/light-eventuate-4j/architecture/kafka-eventstore/

Event-Driven design

Event modeling is flexible in our case, as most of the time we are using JSON. The transactionId you are talking about is actually our sagaId. We just call each part by a different name, but the way it works is very similar.

Command Side without REST

I've answered it in the command sourcing section.

UI layer

I totally agree with you. To break a monolithic application into microservices, it should be cut vertically instead of horizontally. Microservices are about business functionality (bounded contexts) instead of technical tiers. Even with REST, GraphQL or RPC, the UI can be composed from components in React or Angular.

It is a long post and I might have missed some points. Please let me know if you have further questions. I am looking forward to working with you and hope you can help us improve the frameworks.

archenroot commented 7 years ago

@stevehu

Spring Boot performance

Holy shit, those benchmark results!!! :-)

Event evolution

Yes, so an RDBMS plus Avro is the story here, as expected. The only difference is that instead of an RDBMS I plan to give MongoDB a chance. Actually, I plan to use MongoDB for query materializations within the services as local instances (CQRS), so I would like to stay with the same technology everywhere.

Event Store

Totally agree; my own findings also showed Kafka not to be a perfect fit.

One question here: is your event store developed as another microservice, following the general layout of your other microservices, or do you look at that component as something else? Just a point of view; I tend to see it simply as yet another microservice.

Event-driven design

SagaId, I like it; I will use it as well then, as it comes from the application of the Saga pattern and makes better sense in an event-driven architecture. CorrelationId/GlobalTransactionId comes from the era of SOAP-based heavy canonical models, from my ESB development days...

Thanks again for your answers.

Regards,

Ladislav

stevehu commented 7 years ago

@archenroot

We are using MySQL as the event store only because we rely on the MySQL binlog to send CDC (change data capture) records to Kafka. If we didn't use the binlog, then we would have to use 2PC to ensure that sending the message and writing into the event store happen in the same transaction. As you know, 2PC does not scale.
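
A minimal sketch of why the binlog approach avoids 2PC, assuming a hypothetical events table: the service performs only one local ACID transaction against MySQL, while a separate CDC process tails the binlog and forwards the committed row to Kafka, so no transaction ever spans both the database and the broker:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

// Sketch: the service appends the event to the MySQL event store in ONE local
// transaction. A separate CDC reader tails the binlog and publishes the row to
// Kafka. Table and column names are illustrative, not the framework's schema.
public class EventStoreAppender {

    public void append(String entityId, String eventType, String payload) throws Exception {
        try (Connection con = DriverManager.getConnection(
                "jdbc:mysql://localhost:3306/eventstore", "user", "password")) {
            con.setAutoCommit(false);
            try (PreparedStatement ps = con.prepareStatement(
                    "INSERT INTO events (entity_id, event_type, event_data) VALUES (?, ?, ?)")) {
                ps.setString(1, entityId);
                ps.setString(2, eventType);
                ps.setString(3, payload);
                ps.executeUpdate();
            }
            // The binlog entry produced by this commit is what the CDC reader forwards to Kafka.
            con.commit();
        }
    }
}
```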

Other than the event store in MySQL, we are using ArangoDB for the portal and other query-side databases, as it is multi-model (document, graph and key-value). Of course you can use MongoDB for the query side. There is also potential to use MongoDB as an event store, as it has a log API that can be leveraged. One of my clients has asked me to support Oracle as an event store, and I am looking at GoldenGate at the moment.

Light-eventuate-4j and light-saga-4j are actually infrastructure services that form a service mesh to support service-to-service communication.

archenroot commented 7 years ago

@stevehu

Nice. Now, imagine: you are in an event-driven infrastructure, and you use the same message broker/protocol for inter-service communication. At the same time, you say that events can be consumed by multiple services. (I am just repeating high-level fundamentals...)

This leads me to an opposite/different design of the event store to speed up the whole platform, by possibly 30%(?): I would simply make the event store a parallel observer that captures events and provides some infrastructure mechanisms, but is not a BORDER in the event flow as it is used now.

Scenario

I drop an event into a topic to which not only the event store is subscribed, but also the service that processes it. So I don't need to write binlog-tailing and CDC components at the DB level; I just listen to events, relying on the existing network of brokers. Is it a problem that one service stores the event in the history log while another is processing it, both independently in parallel? I am thinking of possible race conditions or some other evil coming out of Pandora's box, but in general I don't see a problem, as there are two situations that could happen:

  1. Service A produces an event to the topic.
  2. Both the event store and Service B consume this event, which leads to either:
    • a.) The event store finishes processing before Service B does -> this is the happy scenario and doesn't represent anything bad.
    • b.) The event store is slower or completely down, so Service B consumes the event and sends a COMMAND to persist the aggregate, but never receives an AggregatePersisted event, so it won't continue generating new events. This is very important: the system stops itself before making the whole ecosystem inconsistent.

I really need to think about this more. It doesn't create much new burden, I have some safety mechanisms in mind, and the overall result looks like much faster performance, because the flow is not Service A -> MB -> Event Store -> MB -> Service B, but Service A -> MB -> (in parallel) Service B and Event Store.

Such a scenario will definitely be faster, don't you think? But... :-)
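
For what it's worth, this "parallel observer" flow maps naturally onto Kafka consumer groups: the event store service and Service B subscribe to the same topic under different group ids, so each receives every event independently. A minimal sketch with illustrative topic and group names:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

// Sketch: two services consume the same topic in parallel because they use
// different consumer groups. The event-store observer would run with
// group.id "event-store", Service B with group.id "service-b".
public class ParallelObserverConsumer {

    public static void main(String[] args) {
        String groupId = args.length > 0 ? args[0] : "event-store"; // or "service-b"

        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("process-events"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // "event-store" group: append record.value() to the history log.
                    // "service-b" group: run the business logic for the event.
                    System.out.printf("[%s] %s -> %s%n", groupId, record.key(), record.value());
                }
            }
        }
    }
}
```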

Now, you might ask what will happen to the system if the event store microservice has an outage. Well, with careful design, probably nothing:

So now we have an event delivered to Service B, which is going to do some work in a method, and the following could happen:

  1. CQRS - a QUERY that works only on the local service -> you get data that is not up to date; not sure how to solve this yet.
  2. CQRS - a QUERY implemented dynamically against the event store (you can do this) won't work, as no AggregateData event message is received -> this is OK, because it is like you stop the universe in TIME.
  3. CQRS - a COMMAND won't work, as no AggregateUpdated event is received -> this is OK, because it is the same as point 2.
  4. No data manipulation: the received event is just calculated in memory (do 1 + 1) and a new event is emitted to be consumed by other systems -> this also looks good to me, because ultimately I expect every event processed by a service method not only to do in-memory calculation but also to hit the CQRS subsystem, where it will stop as in points 2 and 3.

CQRS works here as a natural border to prevent possible race conditions, etc. (which can in any case be solved by introducing versioning/sequence numbers in events).

What do you think about this design? :-)))) It might need some brainstorming, but in the end you reduce the communication points in the chain from 3 to 2!!! I think when you are interested in performance, this is something you might look into. Sorry, it is evening here and I have been buried in these architectures all day, so maybe I am missing some fundamental issue with this....

I count on minimal or no outages of the event store service cluster, but if it falls down, error scenarios must of course be considered. I can also imagine an event store monitoring service: if it sees the store is down, it generates an EventStoreDown event consumed by all services in the large cluster, which pauses all listeners and producers :-). Maybe not nice, but you prevent any kind of inconsistency at all; when the store is up and running again, the monitor generates an EventStoreUp event and all services continue -> you pause and play the universe :-))

Either way, latency in inter-service communication will drop significantly.

Different technologies

Regarding the different technologies you use: yes, I can imagine there are multiple use cases where you simply pick the best one for your needs. I will also look at ArangoDB (I had never heard of it). In my case I would like to simplify, so I am looking for one do-it-all solution :-)

Oracle

Oracle and GoldenGate will work just fine; I used them in the past for similar stuff. You can also look at Oracle's LogMiner.

Yes, I am just reading the service mesh document :-). Thanks for pointing out those two projects.

GavinChenYan commented 7 years ago

Firstly, yes: services producing events (command side) and services consuming events (query side) run independently, in parallel.

Event processing is an asynchronous process triggered by events reaching the event store. Basically, we define event handlers in the services for certain aggregate types, for different purposes, and the event handlers act as listeners on the event store. When an event arrives and matches the aggregate type, the event handler is triggered and starts to process the method. So it doesn't matter whether the event store is faster or slower than the service processing.

As for CQRS, it depends on how you design the project. For example, we have event handlers for a reporting system which save to and generate a local data store, based on the populated event data, for report queries.
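
A hedged sketch of the dispatch described above, with illustrative names rather than the framework's actual API: handlers are registered per event/aggregate type, and only the matching handlers are invoked when an event arrives:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch of handler dispatch by event type: subscribers register a handler per
// event type, and the dispatcher invokes only the handlers whose type matches
// the incoming event. Names are illustrative, not the actual framework API.
public class EventDispatcher {

    private final Map<String, List<Consumer<String>>> handlersByType = new HashMap<>();

    // Register a handler for one event type, e.g. "AccountCreated".
    public void subscribe(String eventType, Consumer<String> handler) {
        handlersByType.computeIfAbsent(eventType, t -> new ArrayList<>()).add(handler);
    }

    // Called when an event arrives from the event store / broker.
    public void dispatch(String eventType, String payload) {
        for (Consumer<String> handler : handlersByType.getOrDefault(eventType, new ArrayList<>())) {
            handler.accept(payload); // in practice this would run asynchronously
        }
    }

    public static void main(String[] args) {
        EventDispatcher dispatcher = new EventDispatcher();
        dispatcher.subscribe("AccountCreated",
            payload -> System.out.println("report projection updated: " + payload));
        dispatcher.dispatch("AccountCreated", "{\"accountId\":\"a-1\"}");
        dispatcher.dispatch("AccountDebited", "{\"accountId\":\"a-1\"}"); // no handler, ignored
    }
}
```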

stevehu commented 7 years ago

@archenroot The event store is the source of truth, and I see that you are writing to it at the last step while the service accesses are not in an XA transaction. If something happens before the event store is updated, your services and the event store won't be in sync. This design will only work if no network failure occurs; I may be missing something, though. Also, writing into the event store, for example MySQL, is very fast as it is append-only, and the event is published to Kafka with all subscribers handling it in parallel. I think this is the second fastest approach, compared with pushing the event to Kafka directly and using Kafka as the event store.

archenroot commented 7 years ago

Thank you guys for the comprehensive discussion; I will now move on to some hands-on work with the light framework.

I don't write to the event store in the last step; I reduce the 3-step flow to 2 steps:

  1. Event execution is finished -> the information is put out into the wild (Service A).
  2. The event is, in parallel, both persisted (Service ES) and logically executed (Service B).

Of course an infrastructure failure could introduce unexpected situations, but I have mechanisms in mind to handle those... I won't bother you with them.

I follow the study of physics, so to me there is no such thing as a command or a query in an event-driven system at all, nor anything like request-response. It is just a flow of events, where new events exist only as reactions to previous ones; no one commands or queries anyone. This is the extreme application of event-driven architecture, applicable mostly in physics labs, so it will take time before we bring all these ideas into the integration space.

Thanks again for your answers, and especially for taking the time to discuss some of my concerns with me.

stevehu commented 7 years ago

@archenroot A flow of events is exactly how the universe works. As developers, we are always trying to abstract the natural world and model it into classes or DB tables. I am trying to avoid that myself, which is why I am working on the eventuate framework. As a framework provider, we have to provide a full range of solutions. However, I am very interested in the project you are working on, as it is an extreme use case that would be a good test for the framework. Let's keep in touch, and let us know if you need any help. Thanks for all the questions and ideas.

archenroot commented 7 years ago

@stevehu - really nice discussion :-). I will post how I ended up; as I said, I am limited by time this round, but I will try to follow this universe-replication idea and will post some pictures of where I end up.