sirixdb / sirix

SirixDB is an embeddable, bitemporal, append-only database system and event store, storing immutable lightweight snapshots. It keeps the full history of each resource. Every commit stores a space-efficient snapshot through structural sharing. It is log-structured and never overwrites data. SirixDB uses a novel page-level versioning approach.
https://sirix.io
BSD 3-Clause "New" or "Revised" License
1.09k stars 239 forks

Support for realtime web #537

Open JohannesLichtenberger opened 1 year ago

JohannesLichtenberger commented 1 year ago

Meaning that, as a first step, we should stream the data changes for a resource to interested clients (we could just stream the JSON we already generate, which tracks the changes).

In a second step we can support queries which only query one resource. Naively, we'd probably have to execute the query/queries asynchronously once a trx has been flushed to disk and check that only result nodes in the current transaction intent log are streamed to interested clients. Maybe we can store the compiled query, at least up until indexes are added/dropped.

Better ideas are of course welcome. A simpler, much more efficient approach would be to check for updates on certain paths in the resource.
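
The path-based idea can be sketched roughly as follows. This is only an illustration: it assumes each change can be reported as a slash-separated path string, and `PathSubscriptions` is a hypothetical helper, not an existing SirixDB class (it does not use SirixDB's path summary).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper (not part of SirixDB): remembers which path prefixes each
// subscriber watches and reports who should be notified for a changed path.
final class PathSubscriptions {
    private final Map<String, List<String>> prefixesBySubscriber = new ConcurrentHashMap<>();

    void subscribe(String subscriberId, String pathPrefix) {
        prefixesBySubscriber
            .computeIfAbsent(subscriberId, id -> new CopyOnWriteArrayList<>())
            .add(normalize(pathPrefix));
    }

    // Returns the sorted IDs of all subscribers watching the changed path or one of its ancestors.
    List<String> interestedIn(String changedPath) {
        String path = normalize(changedPath);
        List<String> result = new ArrayList<>();
        prefixesBySubscriber.forEach((id, prefixes) -> {
            if (prefixes.stream().anyMatch(prefix -> covers(prefix, path))) {
                result.add(id);
            }
        });
        Collections.sort(result);
        return result;
    }

    // A prefix covers a path if it is the root, the path itself, or an ancestor of the path.
    private static boolean covers(String prefix, String path) {
        return prefix.equals("/") || path.equals(prefix) || path.startsWith(prefix + "/");
    }

    // Drops a trailing slash (except for the root path) so "/a/" and "/a" compare equal.
    private static String normalize(String path) {
        return path.length() > 1 && path.endsWith("/") ? path.substring(0, path.length() - 1) : path;
    }
}
```

With this, a commit would only need to look up the paths it touched instead of re-running every registered query.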

JohannesLichtenberger commented 1 year ago

How does RethinkDB implement this?

mosheduminer commented 1 year ago

I wonder if looking at https://github.com/pubkey/event-reduce would be helpful.

JohannesLichtenberger commented 1 year ago

Thanks Moshe, do you understand the basic algorithm they are using? Maybe it's too late currently, but I don't get how they determine if a new record from a change event is a query result or not.

mosheduminer commented 1 year ago

I don't know, I haven't really looked at the implementation. So I only know what it says in the readme.

ElenaSkep commented 2 weeks ago

Hello! I am interested in working on this issue. Would it be OK? I was thinking of using WebSockets to stream the JSON. Also, could you specify which path I should focus on first?

JohannesLichtenberger commented 2 weeks ago

You can start by streaming updates to subscribed clients: https://github.com/sirixdb/sirix/blob/982a346f6dbceeade7cf581199c47c910804f14d/bundles/sirix-core/src/main/java/io/sirix/access/trx/node/AbstractNodeTrxImpl.java#L297

Maybe we can add a post commit hook to listen for changes.
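
A post-commit hook could look roughly like the sketch below. All names here (`CommitEvent`, `PostCommitHooks`) are hypothetical and not taken from the SirixDB code base; the assumption is that the transaction calls `fire` once the commit has been written.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Hypothetical event: which resource changed, the new revision, and the JSON diff.
record CommitEvent(String database, String resource, int revision, String jsonDiff) {}

// Hypothetical registry of listeners that run after a commit.
final class PostCommitHooks {
    private final List<Consumer<CommitEvent>> hooks = new CopyOnWriteArrayList<>();

    // Registers a hook and returns a handle that removes it again.
    Runnable register(Consumer<CommitEvent> hook) {
        hooks.add(hook);
        return () -> hooks.remove(hook);
    }

    // Assumed to be called by the transaction after the commit has been written.
    void fire(CommitEvent event) {
        for (Consumer<CommitEvent> hook : hooks) {
            try {
                hook.accept(event);
            } catch (RuntimeException e) {
                // A failing listener must not affect the commit or the other listeners.
                System.err.println("post-commit hook failed: " + e);
            }
        }
    }
}
```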

JohannesLichtenberger commented 2 weeks ago

So, I think it would be great if clients could subscribe to a database/resource update stream, so that they receive what has been changed as a JSON change stream (we already write these changes to JSON files on disk).

So we need a simple pub/sub mechanism that checks for read-access rights and writes the changes to the WebSocket.
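
A minimal in-process sketch of such a pub/sub mechanism with a read-access check is shown below. `ReadAccessChecker` and `ChangeBroker` are hypothetical names, the sink stands in for whatever writes to the WebSocket, and how access rights are actually resolved in sirix-rest-api is not modelled here.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

// Hypothetical abstraction over the real authorization check.
interface ReadAccessChecker {
    boolean canRead(String user, String database, String resource);
}

// Hypothetical in-process broker: one topic per database/resource pair.
final class ChangeBroker {
    private record Topic(String database, String resource) {}
    private record Subscriber(String user, Consumer<String> sink) {}

    private final Map<Topic, Set<Subscriber>> subscribersByTopic = new ConcurrentHashMap<>();
    private final ReadAccessChecker accessChecker;

    ChangeBroker(ReadAccessChecker accessChecker) {
        this.accessChecker = accessChecker;
    }

    // Returns false (and registers nothing) if the user may not read the resource.
    boolean subscribe(String user, String database, String resource, Consumer<String> sink) {
        if (!accessChecker.canRead(user, database, resource)) {
            return false;
        }
        subscribersByTopic
            .computeIfAbsent(new Topic(database, resource), t -> ConcurrentHashMap.newKeySet())
            .add(new Subscriber(user, sink));
        return true;
    }

    // Delivers the JSON diff to every subscriber of the topic; returns the number of deliveries.
    int publish(String database, String resource, String jsonDiff) {
        int delivered = 0;
        for (Subscriber s : subscribersByTopic.getOrDefault(new Topic(database, resource), Set.of())) {
            // Re-check on every publish, since rights may have been revoked after subscribing.
            if (accessChecker.canRead(s.user(), database, resource)) {
                s.sink().accept(jsonDiff);
                delivered++;
            }
        }
        return delivered;
    }
}
```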

ElenaSkep commented 1 week ago

Hello again! @FayKounara and I have made some progress. We have created a pub/sub mechanism using Apache Kafka, but it would be helpful if you could specify who is considered an interested client. Also, we are using WebSockets and Nginx, if that's OK.

JohannesLichtenberger commented 1 week ago

@ElenaSkep oh wow. However, I think for our use case we should keep it simple and use a non-distributed pub/sub mechanism (maybe there's already a solution using Vert.x). Interested clients would for instance be a browser (IMHO it would be great to have a web-based GUI which has views to either query or show the diffs between revisions). However, in the general case a Kafka-based backend would be nice (but I think I'd implement a new storage solution in the io-package for that (analogous to the FileChannel or MMStorage)).

JohannesLichtenberger commented 1 week ago

So, I think a web client (for instance the TypeScript based client) would subscribe via a WebSocket to a database / resource in the database and subsequently it would receive all changes in the current JSON format. It may be used in the future by a new web frontend to display the differences.

JohannesLichtenberger commented 1 week ago

You'd probably use a simple thread safe blocking queue to handle the subscribers...
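
One way to read the blocking-queue suggestion is a bounded queue per subscriber that a background thread drains, so a slow client cannot stall the committing thread. `QueuedSubscriber` is a hypothetical name, and the sink is assumed to be whatever writes to the client's WebSocket.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical per-subscriber buffer: the committer offers diffs, a drainer thread delivers them.
final class QueuedSubscriber implements AutoCloseable {
    private final BlockingQueue<String> pending;
    private final Thread drainer;
    private volatile boolean running = true;

    QueuedSubscriber(int capacity, Consumer<String> sink) {
        this.pending = new LinkedBlockingQueue<>(capacity);
        this.drainer = new Thread(() -> {
            try {
                // Keep going until close() was called and the queue has been emptied.
                while (running || !pending.isEmpty()) {
                    String message = pending.poll(100, TimeUnit.MILLISECONDS);
                    if (message != null) {
                        sink.accept(message);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "change-drainer");
        this.drainer.setDaemon(true);
        this.drainer.start();
    }

    // Never blocks: returns false if the subscriber is closed or its queue is full.
    boolean offer(String jsonDiff) {
        return running && pending.offer(jsonDiff);
    }

    // Stops accepting new messages and waits for the already queued ones to be delivered.
    @Override
    public void close() {
        running = false;
        try {
            drainer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

What to do when `offer` returns false (drop the message, disconnect the client, or have it resync from a revision) is a design decision this sketch leaves open.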

JohannesLichtenberger commented 1 week ago

So, in general it should be part of the sirix-rest-api bundle.

FayKounara commented 1 week ago

Hello! So now, whenever there is a change in a database/resource, we keep it in a topic (provided by Apache Kafka). What if we try to push this topic to a WebSocket? Is this something that would be valuable?

JohannesLichtenberger commented 1 week ago

To be fair, I think it's too much overhead. I think it would be more valuable to provide another storage type for Kafka, to store the pages in Kafka instead of, or asynchronously to, storing them in a local file.

ElenaSkep commented 1 week ago

Sorry we got a little confused. Currently we have changed the serializeUpdateDiffs method and when there is a change in a db it saves it to a topic. So when someone runs Sirix the user will see the result of the query but also all the changes that have happened in this specific resource/database. Should we proceed and do something for this or look at something else?

JohannesLichtenberger commented 1 week ago

So, do you use the JSON format? I think instead of using Kafka it would be nice to have a new route in sirix-rest-api, where you can subscribe and receive changes via a WebSocket. I'd rather envision that Kafka could store the whole resource as an alternative storage backend, what do you think? Sorry for the confusion, but I'm not sure whether a Kafka change stream would also make sense.

In any case it would be nice to have both, a Vert.x based solution and maybe also the Kafka based.

FayKounara commented 1 week ago

Yes, we use the JSON format. Okay, we will look into what you suggested. Thank you for clarifying it!

ElenaSkep commented 6 days ago

OK, so we will make a pull request for what you asked, with the WebSockets directly listening for changes in the db. Since we have created an implementation with Apache Kafka as well, should we open another issue for this enhancement?

JohannesLichtenberger commented 6 days ago

Let's see. You can of course make two PRs, but we should never create a dependency on Kafka, as Sirix can also simply be used as an embedded library in other JVM-language-based projects.

JohannesLichtenberger commented 6 days ago

And BTW: Thanks for all the work. Hope you'll also contribute in the future...