Materialised Views and a change stream off those materialised views.

gedw99 commented 3 years ago

I am using Benthos and it's awesome, but I realised that a DB subsystem under Benthos is what I really need.

This subsystem is a quasi-DB that supports materialised views and a change stream off those materialised views. Materialize (https://github.com/MaterializeInc/materialize) does what I need. It's written in Rust.

It works with sources and sinks, just like Benthos.

You can use different sources (https://materialize.com/docs/sql/create-source/):

- S3
- PostgreSQL DB
- files
- etc.

Materialize speaks the PostgreSQL wire protocol, so you can use it as a PostgreSQL DB for queries over the read-only data. The really useful thing, though, is that you also get a change feed off those materialised views: https://materialize.com/docs/overview/api-components/#sinks
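To make "a materialised view plus a change stream off it" concrete, here is a toy in-memory sketch in Go (the language Benthos is written in). It is not Materialize's API or implementation; `CountView`, `Apply`, `Subscribe` and `Change` are hypothetical names. It keeps the equivalent of `SELECT key, count(*) FROM input GROUP BY key` up to date incrementally and pushes every change to subscribers, which is the kind of feed a downstream consumer would tail.

```go
package main

import "fmt"

// Change is one entry in the view's change stream: the key whose
// aggregate changed, plus its value before and after the update.
type Change struct {
	Key      string
	Old, New int
}

// CountView is a hypothetical toy materialised view equivalent to
//   SELECT key, count(*) FROM input GROUP BY key
// maintained incrementally as input rows are inserted or deleted.
type CountView struct {
	counts      map[string]int
	subscribers []func(Change)
}

func NewCountView() *CountView {
	return &CountView{counts: make(map[string]int)}
}

// Subscribe registers a consumer of the view's change stream.
func (v *CountView) Subscribe(fn func(Change)) {
	v.subscribers = append(v.subscribers, fn)
}

// Apply folds one input update (diff = +1 for an insert, -1 for a delete)
// into the view and emits the resulting change, instead of re-running the query.
func (v *CountView) Apply(key string, diff int) {
	old := v.counts[key]
	updated := old + diff
	if updated == 0 {
		delete(v.counts, key) // a group with no rows disappears from the view
	} else {
		v.counts[key] = updated
	}
	for _, fn := range v.subscribers {
		fn(Change{Key: key, Old: old, New: updated})
	}
}

// Get reads the current value of the view for a key (0 if absent).
func (v *CountView) Get(key string) int { return v.counts[key] }

func main() {
	v := NewCountView()
	var feed []Change
	v.Subscribe(func(c Change) { feed = append(feed, c) })

	v.Apply("eu", +1)
	v.Apply("us", +1)
	v.Apply("eu", +1)
	v.Apply("us", -1)

	fmt.Println(v.Get("eu"), v.Get("us")) // 2 0
	for _, c := range feed {
		fmt.Printf("%s: %d -> %d\n", c.Key, c.Old, c.New)
	}
}
```

The point of the design is that readers get two things from one place: a queryable current state (`Get`) and an ordered feed of what changed (`feed`), without the consumer having to diff snapshots itself.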

Might there be some interest in integrating this into Benthos?

Jeffail commented 3 years ago

Hey @gedw99, this needs a bit of investigation, but it looks as though there are lots of different ways of integrating. What would your specific use case be?

gedw99 commented 3 years ago

@Jeffail yes, it's a big body of work.

It also goes somewhat against the grain of how Benthos approaches things, but I think it can be complementary too.

OK, I'll try to justify this :)

Typical use cases:

- You have a DB with lots of projects using it, and you want to decouple the write side from the read side. Classic CQRS, so that schema evolution doesn't create contention between projects.
- You don't have lots of projects but just want to build a maintainable one.
- There is always this fight at the data level between writes needing one schema and reads needing another (OLTP vs. OLAP, etc.).
- Event sourcing. The write side can just be events. The read side is just the views generated from them and the change stream on those views.
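The event-sourcing case can be sketched in a few lines. This is a generic illustration of the pattern, not anything Benthos or Materialize provides; `EventLog`, `Balances` and `Project` are hypothetical names, and a bank-balance projection is just an assumed example domain. The write side only appends events; the read side is a projection folded from them, which can either be kept live via a subscription or rebuilt from scratch by replaying the log.

```go
package main

import "fmt"

// Event is an immutable fact appended to the write side.
type Event struct {
	Account string
	Amount  int // positive = deposit, negative = withdrawal
}

// EventLog is the write side: append-only, with no read-optimised schema.
type EventLog struct {
	events    []Event
	listeners []func(Event)
}

// Subscribe attaches a live projection to the log.
func (l *EventLog) Subscribe(fn func(Event)) {
	l.listeners = append(l.listeners, fn)
}

// Append records a new event and pushes it to every live projection.
func (l *EventLog) Append(e Event) {
	l.events = append(l.events, e)
	for _, fn := range l.listeners {
		fn(e)
	}
}

// Balances is one read-side projection (a "view") in the shape readers want.
type Balances map[string]int

// Project folds a single event into the view.
func (b Balances) Project(e Event) { b[e.Account] += e.Amount }

// Replay rebuilds the projection from an empty state by folding the whole log.
func (l *EventLog) Replay() Balances {
	b := Balances{}
	for _, e := range l.events {
		b.Project(e)
	}
	return b
}

func main() {
	events := &EventLog{}
	live := Balances{}
	events.Subscribe(live.Project)

	events.Append(Event{Account: "alice", Amount: 100})
	events.Append(Event{Account: "bob", Amount: 50})
	events.Append(Event{Account: "alice", Amount: -30})

	rebuilt := events.Replay()
	fmt.Println(live["alice"], rebuilt["alice"]) // 70 70
}
```

Because the view is a pure fold over the log, a second project that needs a different read schema can add its own projection without touching the write side.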

This is why CockroachDB and TiDB have CDC. It's also partly why GraphQL happened: so that downstream systems can have the data in the schema they need and know when the data changes.

So with the proposed approach, you basically get that. Each project taps into the write DB (which could be anything, like S3 or PostgreSQL) and produces the views it specifically needs, plus change feeds of those views that flow up into its middle tier and beyond.

By-products of this:

- It lowers the amount of complexity in the middle tier, because the stateful CDC subsystem manages all this state.
  - The middle-tier code is now stateless and so can be scaled out effortlessly.
- It is language-independent, because it's just SQL and some protobuf reflection.
- Restarts are much faster.
  - The materialised views are durable and effectively replace caches.
- It is serverless, in that it allows modules to be built that are not compiled but added at runtime and then reflected on.
  - This is possible because the CDC subsystem does all the work, and the developer's Protobuf in the module describes the data and services.
- You control and migrate using database SQL.
  - By storing all write, read and change data in the subsystem, which is itself a SQL DB, you can write standard data migrations and so ease the burden of keeping it all up to date.
  - The middle-tier refactoring effort is vastly reduced when the DB schema changes.
- It automatically enables scale:
  - You can do master/slave. The master is the write-only source DB. The slaves are the materialised views and change feeds.
  - You can have a hot-spare master in case your master falls over.
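The master/slave and faster-restart points both rest on read copies that tail the write log from a remembered position. The following is a generic log-shipping sketch under that assumption, not how Materialize or any particular DB implements replication; `Master`, `Replica`, `CatchUp` and `Offset` are hypothetical names. Persisting `Offset` together with `Data` is what lets a restarted replica resume instead of rebuilding.

```go
package main

import "fmt"

// Entry is one write on the master, addressed by its position in the log.
type Entry struct {
	Key, Value string
}

// Master is the write-only source: it only ever appends to its log.
type Master struct {
	log []Entry
}

func (m *Master) Write(key, value string) {
	m.log = append(m.log, Entry{Key: key, Value: value})
}

// ReadFrom returns every entry at or after offset.
func (m *Master) ReadFrom(offset int) []Entry { return m.log[offset:] }

// Replica is a read-only copy that tails the master's log. Offset is its
// checkpoint: if it is stored durably alongside Data, a restarted replica
// picks up where it left off rather than replaying the full history.
type Replica struct {
	Data   map[string]string
	Offset int
}

// CatchUp applies any entries the replica has not seen yet and returns how many.
func (r *Replica) CatchUp(m *Master) int {
	entries := m.ReadFrom(r.Offset)
	for _, e := range entries {
		r.Data[e.Key] = e.Value
	}
	r.Offset += len(entries)
	return len(entries)
}

func main() {
	m := &Master{}
	r := &Replica{Data: map[string]string{}}

	m.Write("user:1", "alice")
	m.Write("user:2", "bob")
	fmt.Println(r.CatchUp(m)) // 2

	m.Write("user:1", "alice2")
	// Only the one new entry is applied; the checkpoint skips old history.
	fmt.Println(r.CatchUp(m), r.Data["user:1"]) // 1 alice2
}
```

A hot spare would simply be another `Replica` kept caught up and promoted to accept writes if the master fails; that promotion logic is left out here.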

(redpanda-data/connect issue #750)