Status: Open. gabriel-vasile opened this issue 3 years ago.
Can you write up a proposal? Something along the lines of:
I was thinking about integrating something like Elasticsearch. Writing a search engine from scratch is not a PR; it's a full-time job for a team.
In the case of an external search provider, I think the following is needed:
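Regarding the interaction between Tinode and the search provider, there are two approaches to indexing:
1. Tinode sends each new message to the search provider when it is stored.
2. The search provider reads messages directly from the database.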
With approach 1, there is the problem of existing messages. It must be possible to index all existing messages in case the index is lost, was just initialized, or for any other reason. With approach 2, with Elastic at least, indexing is easily solved with a pipeline. Elastic supports MySQL, MongoDB, and RethinkDB as data sources. Users need to provide a pipeline, and Elastic will periodically query the database for new messages and index them. I'm not sure other search providers have this feature.
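A minimal Go sketch of the backfill that approach 1 would need. It assumes a hypothetical `Source` that can list stored messages after a given sequence ID and a hypothetical `Indexer` that accepts batches; neither is an existing Tinode API. Walking in fixed-size batches and returning the last ID indexed lets an interrupted reindex resume instead of starting over. If the store numbers messages per topic, a real version would run this loop once per topic.

```go
package main

import "fmt"

// Message is a minimal, hypothetical stand-in for a stored chat message.
type Message struct {
	SeqID   int
	Content string
}

// Source returns up to limit messages with SeqID > after, in ascending
// SeqID order. In a real server this would be a database query.
type Source interface {
	MessagesAfter(after, limit int) []Message
}

// Indexer accepts a batch of messages for indexing.
type Indexer interface {
	IndexBatch(msgs []Message) error
}

// Backfill walks all existing messages in fixed-size batches and indexes them.
// It returns the last SeqID indexed so an interrupted run can resume from there.
func Backfill(src Source, idx Indexer, after, batchSize int) (int, error) {
	for {
		batch := src.MessagesAfter(after, batchSize)
		if len(batch) == 0 {
			return after, nil
		}
		if err := idx.IndexBatch(batch); err != nil {
			return after, err
		}
		after = batch[len(batch)-1].SeqID
	}
}

// sliceSource is an in-memory Source; its messages must be sorted by SeqID.
type sliceSource []Message

func (s sliceSource) MessagesAfter(after, limit int) []Message {
	var out []Message
	for _, m := range s {
		if m.SeqID > after {
			out = append(out, m)
			if len(out) == limit {
				break
			}
		}
	}
	return out
}

// countingIndexer records how many batches it got and which IDs it indexed.
type countingIndexer struct {
	batches int
	indexed []int
}

func (c *countingIndexer) IndexBatch(msgs []Message) error {
	c.batches++
	for _, m := range msgs {
		c.indexed = append(c.indexed, m.SeqID)
	}
	return nil
}

func main() {
	src := sliceSource{{1, "hi"}, {2, "hello"}, {3, "search me"}, {4, "bye"}, {5, "ok"}}
	idx := &countingIndexer{}
	last, err := Backfill(src, idx, 0, 2)
	fmt.Println(last, err, idx.batches, idx.indexed)
}
```

With five messages and a batch size of 2, the loop sends three batches and returns 5 as the resume point.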
I think we should first decide if we are going to support more than one search engine, and which ones in particular. For my use case, supporting just Elastic is fine, and it would make implementing this feature much easier.
I would separate the concerns of starting a new service from scratch versus adding message search to an existing service.
I do see value in having Elastic or any other provider go to the DB directly. It also has drawbacks. For example, if we implement any sort of encryption at rest (a feature some people want), then direct intake from the DB won't work.
> I think we should first decide if we are going to support more than one search engine
I think there should be a choice. It does not need to be implemented immediately; a single provider is a good start. But there is value in an abstraction layer. Tinode is frequently used in organizations with established infrastructure. If they already use Solr or Algolia, adopting Tinode would be a harder decision if it supports only Elastic.
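One possible shape for that abstraction layer, sketched in Go: a small hypothetical `Provider` interface plus a name-based registry, so the backend (Elastic, Solr, Algolia, ...) could be chosen in the server config. The names `Provider`, `Register`, and `Open`, and the toy in-memory backend, are illustrative only. A real backend would wrap that provider's own client.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"sync"
)

// Provider is a hypothetical abstraction over a search backend.
// The server would talk only to this interface.
type Provider interface {
	Index(topic string, seq int, text string) error
	Search(topic, query string) ([]int, error)
}

var (
	mu        sync.Mutex
	factories = map[string]func(config string) (Provider, error){}
)

// Register makes a backend available under a name used in the server config.
func Register(name string, factory func(config string) (Provider, error)) {
	mu.Lock()
	defer mu.Unlock()
	if _, dup := factories[name]; dup {
		panic("search: provider registered twice: " + name)
	}
	factories[name] = factory
}

// Open instantiates the backend selected in the configuration.
func Open(name, config string) (Provider, error) {
	mu.Lock()
	factory, ok := factories[name]
	mu.Unlock()
	if !ok {
		return nil, fmt.Errorf("search: unknown provider %q", name)
	}
	return factory(config)
}

// memProvider is a toy, non-concurrent backend doing case-insensitive
// substring matching. It only exists to exercise the interface.
type memProvider struct {
	docs map[string]map[int]string // topic -> seq -> lowercased text
}

func (m *memProvider) Index(topic string, seq int, text string) error {
	if m.docs[topic] == nil {
		m.docs[topic] = map[int]string{}
	}
	m.docs[topic][seq] = strings.ToLower(text)
	return nil
}

func (m *memProvider) Search(topic, query string) ([]int, error) {
	q := strings.ToLower(query)
	var hits []int
	for seq, text := range m.docs[topic] {
		if strings.Contains(text, q) {
			hits = append(hits, seq)
		}
	}
	sort.Ints(hits) // map iteration order is random; return hits in seq order
	return hits, nil
}

func init() {
	Register("memory", func(string) (Provider, error) {
		return &memProvider{docs: map[string]map[int]string{}}, nil
	})
}

func main() {
	p, err := Open("memory", "")
	if err != nil {
		panic(err)
	}
	p.Index("grpX", 1, "Meeting moved to Friday")
	p.Index("grpX", 2, "See you friday!")
	p.Index("grpX", 3, "Lunch?")
	hits, _ := p.Search("grpX", "FRIDAY")
	fmt.Println(hits)
}
```

An Elastic or Solr backend would then be one more `Register` call, and starting with a single provider would not lock the design in.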
I guess by 'encryption at rest' you mean end-to-end encryption and not just server-side encryption. If that's the case, then there is no choice but to let the clients do the search. Sorry, but I think I'll have to drop working on this, as I'm not really familiar with any of the client SDKs or their languages.
This is a useful feature. No need to close the issue even if you don't want to work on it.
I meant what I said: encryption at rest.
> I meant what I said: encryption at rest.
What you said is not clear enough. You can have end-to-end encryption (the clients hold the encryption/decryption keys) or server-side encryption (the server holds the key). In both cases the data is encrypted "at rest". But with server-side encryption the server has access to the plain, unencrypted data and can search through it; with end-to-end encryption it can't.
What about using the databases' built-in full-text search? With end-to-end encryption, the search must be done on the client side.
> What about using the databases' built-in full-text search?
RethinkDB does not have it at all. MongoDB has no support for CJK: it can't split text into words. Full-text search in all three databases is mostly useless for heavily inflected languages.
So, it can be done for English with MySQL and maybe with MongoDB, but it will suck.
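To make the CJK point concrete: splitting on whitespace, which works passably for English, turns a Chinese sentence into a single token, so a search for a word inside it finds nothing. A common workaround, used for example by Elasticsearch's `cjk` analyzer, is to index overlapping character bigrams. The Go sketch below is a toy illustration of that idea, not a real analyzer. It does nothing for inflected languages, which need stemming or lemmatization instead.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// whitespaceTokens is the naive approach: lowercase and split on spaces.
func whitespaceTokens(s string) []string {
	return strings.Fields(strings.ToLower(s))
}

// bigrams drops whitespace and punctuation, then emits overlapping
// two-rune tokens. No dictionary is needed to find word boundaries.
func bigrams(s string) []string {
	var runes []rune
	for _, r := range s {
		if unicode.IsLetter(r) || unicode.IsNumber(r) {
			runes = append(runes, r)
		}
	}
	if len(runes) == 1 {
		return []string{string(runes)}
	}
	var out []string
	for i := 0; i+1 < len(runes); i++ {
		out = append(out, string(runes[i:i+2]))
	}
	return out
}

// matches reports whether every query token appears among the document tokens.
func matches(docTokens, queryTokens []string) bool {
	set := map[string]bool{}
	for _, t := range docTokens {
		set[t] = true
	}
	for _, t := range queryTokens {
		if !set[t] {
			return false
		}
	}
	return len(queryTokens) > 0
}

func main() {
	doc := "我喜欢吃苹果" // "I like to eat apples"
	query := "苹果"     // "apple"
	fmt.Println(whitespaceTokens(doc))                                   // [我喜欢吃苹果]
	fmt.Println(matches(whitespaceTokens(doc), whitespaceTokens(query))) // false
	fmt.Println(matches(bigrams(doc), bigrams(query)))                   // true
}
```

The trade-off is a larger index and some false positives, since bigrams can match across real word boundaries.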
Elastic, Sphinx, or Solr is not a bad idea.
Are there any planned release dates for the full-text search and encryption-at-rest features? They are shown here in the "planned" section.
No. @ice-myles, are you willing to help?
I guess there is no ETA for this feature and no plans to ever implement it, but I'm willing to contribute the server code for it. Please tell me how you think this should be done, with some references to the code, and I'll open a PR.