redpanda-data / connect

Fancy stream processing made operationally mundane
https://docs.redpanda.com/redpanda-connect/about/
8.1k stars 824 forks source link

Materialised Views and a change stream off those materialised views. #750

Open gedw99 opened 3 years ago

gedw99 commented 3 years ago

I am using benthos and its awesome but i realised that a DB subsystem under benthos is what i really need.

This subsystem is a quasi DB that supports Materialised Views and a change stream off those materialised views. This does what i need https://github.com/MaterializeInc/materialize Its rust.

It works with sources and sinks, just like Benthos.

You can use different sources. https://materialize.com/docs/sql/create-source/

The sink of Materialize exposes a Postgresql DB, so you can use it as a Postgresql DB for queries of the readonly data. The really useful thing is that you also get a change feed off those Materialised views though.. https://materialize.com/docs/overview/api-components/#sinks

There might be some interest with integrating this into Benthos ?

Jeffail commented 3 years ago

Hey @gedw99, needs a bit of investigation, but it looks as though there's lots of different ways of integrating. What would your specific use case be?

gedw99 commented 3 years ago

@Jeffail yes its a big body of work.

It's also sort of going against the grain of how Benthos approaches things, but can also be complementary i think too.

Ok i try to justify this :)

Typical Use cases:

  1. You have a DB and you have lots of projects using that DB, and you want to decouple the Write side from the Read side. Classic CQRS in order to not create schema evolution contention.

  2. You don't have lots of projects but just want to create maintainable project.

  3. There is always this fight at the Data level of Writes needing one schema and reads needing another. OLTP / OLAP, etc etc.

  4. Event sourcing. The write side can just be events. The read side is just the VIews generated and the change stream on them.

This is why cockroach and tidb have CDC. It's also partly why GraphQL happened. SO that downstream systems can have the data in the schema they need and know when the data changes.

So with the proposed approach, you basically get that. Each Project taps into the Write DB ( could be anything like S3 or postgresql. ) and produces the Views and change feeds of those views that specific projects needs, and a change feed up into their middle tier and beyond.

by products of this: