Implementing Event storage/search with a timeseries database or a lucene indexed database

ruma / homeserver

A Matrix homeserver written in Rust.

https://www.ruma.io/


Implementing Event storage/search with a timeseries database or a lucene indexed database #110

Open farodin91 opened 7 years ago

farodin91 commented 7 years ago

Possible Databases

Elasticsearch (lucene)
Influxdb (timeseries)
Cassandra (Clustered Database)
TiKV (key-value)

I would like to hear your ideas. As a start, we could encapsulate event handling a bit more.

sphinxc0re commented 7 years ago

Would part of the event lookup then move into ruma/ruma-events?

farodin91 commented 7 years ago

Currently, I don't plan to move it into ruma-events, because that repo is used to define structures that can be used by both clients and servers.

sphinxc0re commented 7 years ago

Also, I learned from working with InfluxDB that time-series DBMSs work best when they are fed data at a rate of one data set every 5 seconds to 5 minutes.

sphinxc0re commented 7 years ago

I don't think this is the case with these events, so I find this a little overkill.

mujx commented 7 years ago

What would be the benefit of this?

Seems like over-engineering to me, at least at this point. Also, adding an extra dependency will create problems with deployment. Synapse works fine without it.

farodin91 commented 7 years ago

I think that using Elasticsearch could massively reduce the complexity of sync. It could also increase performance.

sphinxc0re commented 7 years ago

What about making it possible to choose, on startup or through the config file, whether event processing should go through a backend such as Elasticsearch or Redis?
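A minimal sketch of what a config-selectable event store could look like, assuming a storage trait sits between the homeserver and the database. Every name here (`StoredEvent`, `EventStore`, `MemoryStore`, `Backend`, `build_store`) is hypothetical and not taken from Ruma; the in-memory store only stands in for a real backend, and the Elasticsearch and Redis variants are deliberately left unimplemented.

```rust
use std::collections::BTreeMap;
use std::str::FromStr;

/// Hypothetical, simplified event record (not the ruma-events type).
#[derive(Clone, Debug, PartialEq)]
pub struct StoredEvent {
    pub room_id: String,
    pub origin_server_ts: u64,
    pub body: String,
}

/// Hypothetical abstraction that encapsulates event storage and search.
pub trait EventStore {
    fn insert(&mut self, event: StoredEvent);
    /// Events in `room_id` with a timestamp strictly greater than `since_ts`.
    fn events_since(&self, room_id: &str, since_ts: u64) -> Vec<StoredEvent>;
    /// Naive substring search over event bodies.
    fn search(&self, term: &str) -> Vec<StoredEvent>;
}

/// In-memory stand-in for a real backend.
#[derive(Default)]
pub struct MemoryStore {
    by_room: BTreeMap<String, Vec<StoredEvent>>,
}

impl EventStore for MemoryStore {
    fn insert(&mut self, event: StoredEvent) {
        self.by_room
            .entry(event.room_id.clone())
            .or_default()
            .push(event);
    }

    fn events_since(&self, room_id: &str, since_ts: u64) -> Vec<StoredEvent> {
        self.by_room
            .get(room_id)
            .map(|events| {
                events
                    .iter()
                    .filter(|e| e.origin_server_ts > since_ts)
                    .cloned()
                    .collect()
            })
            .unwrap_or_default()
    }

    fn search(&self, term: &str) -> Vec<StoredEvent> {
        self.by_room
            .values()
            .flatten()
            .filter(|e| e.body.contains(term))
            .cloned()
            .collect()
    }
}

/// Backend name as it might appear in a config file (assumed key values).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Backend {
    Memory,
    Elasticsearch,
    Redis,
}

impl FromStr for Backend {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s.to_ascii_lowercase().as_str() {
            "memory" => Ok(Backend::Memory),
            "elasticsearch" => Ok(Backend::Elasticsearch),
            "redis" => Ok(Backend::Redis),
            other => Err(format!("unknown event store backend: {}", other)),
        }
    }
}

/// Builds the store chosen at startup; only the in-memory one exists here.
pub fn build_store(backend: Backend) -> Result<Box<dyn EventStore>, String> {
    match backend {
        Backend::Memory => Ok(Box::new(MemoryStore::default())),
        Backend::Elasticsearch | Backend::Redis => Err(format!(
            "{:?} backend is not implemented in this sketch",
            backend
        )),
    }
}
```

With this shape, the rest of the server would only see `Box<dyn EventStore>`, so a deployment that wants no extra dependency could keep a default backend while larger installations opt into another one.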

jimmycuadra commented 7 years ago

When I was originally trying to decide on the primary data store for Ruma, I was strongly considering RethinkDB, as its concept of a client subscribing to updates on a query seemed like a great fit for Matrix's /sync endpoint. Since then, the company behind RethinkDB has gone out of business, which is a real shame, but there is an effort by the community to keep the project going. I'm definitely supportive of the idea of using a data store that better fits the use case for Ruma. I would prioritize homeserver performance over operational/deployment complexity. We can worry about how to make deployment easy for layman users when we start writing docs about deployment. I'm more concerned with Ruma being able to support a homeserver with a huge number of users than I am about making it easy for a layman to deploy it in the simplest case.
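To illustrate the "subscribe to updates on a query" idea mentioned above, here is a rough in-process sketch using standard library channels. It is not RethinkDB's API and not how Ruma implements /sync; `RoomEvent`, `EventFeed`, `subscribe`, and `publish` are hypothetical names, and the "query" is reduced to a single room ID filter.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical, simplified event (not the ruma-events type).
#[derive(Clone, Debug, PartialEq)]
pub struct RoomEvent {
    pub room_id: String,
    pub body: String,
}

/// One standing "query": all future events for a single room.
struct Subscription {
    room_id: String,
    sender: Sender<RoomEvent>,
}

/// Stores events and pushes each new one to matching subscribers.
#[derive(Default)]
pub struct EventFeed {
    events: Vec<RoomEvent>,
    subscriptions: Vec<Subscription>,
}

impl EventFeed {
    /// Registers interest in a room and returns the receiving end.
    pub fn subscribe(&mut self, room_id: &str) -> Receiver<RoomEvent> {
        let (sender, receiver) = channel();
        self.subscriptions.push(Subscription {
            room_id: room_id.to_string(),
            sender,
        });
        receiver
    }

    /// Stores the event and forwards it to every matching subscriber.
    /// A subscription is dropped once its receiver has gone away.
    pub fn publish(&mut self, event: RoomEvent) {
        self.subscriptions.retain(|sub| {
            sub.room_id != event.room_id || sub.sender.send(event.clone()).is_ok()
        });
        self.events.push(event);
    }

    /// All events published so far, in insertion order.
    pub fn stored(&self) -> &[RoomEvent] {
        &self.events
    }
}
```

A /sync handler built on something like this would wait on the receiver instead of polling the store, which is the property that made the changefeed model attractive.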

skade commented 7 years ago

How would Elasticsearch reduce the complexity of sync? None of the mentioned products are particularly good at syncing.