Open quantranhong1999 opened 9 months ago
Hmm... I'm wondering if it's really necessary to go that far. I would think the moment you potentially have a lot of pressure is the first time you run the task. Afterwards, when a good part of your messages is already archived, the pressure should not be so high?
I agree. But it is still a potential improvement IMO, so I am recording the idea before I forget it. Open for discussion ^^. Not a priority though, for sure.
Why
Today's behavior: the task iterates over all of a user's INBOX messages using Cassandra, which puts a lot of pressure on Cassandra.
Using OpenSearch to query by date could avoid iterating over all the messages, which could bring faster response times (in most cases?).
Following Benoit's concern: OpenSearch may not be good at searching a big INBOX, since the search goes through a lot of shards, and OpenSearch is not a source of truth.
My proposal:
Solution 1: Mix of OpenSearch and Cassandra
Rely on OpenSearch for small and average INBOX sizes (e.g. < 100k messages); for big INBOXes, use Cassandra so as not to blow up OpenSearch. If OpenSearch is down or the query times out, fall back to Cassandra -> resilient to OpenSearch failure.
Solution 2: OpenSearch first
Rely on OpenSearch first for all INBOXes, and only fall back to Cassandra upon OpenSearch failure. TODO: benchmark in unit test/preprod to see whether OpenSearch can handle all the pressure.
I feel that solution 1 could be a safer solution while still bringing improvement in task speed.
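To make the routing in Solution 1 concrete, here is a minimal sketch of the decision logic only. It is not taken from the code base: `find_old_messages`, `search_index_query`, `cassandra_scan` and `BIG_INBOX_THRESHOLD` are hypothetical names, the two callables stand in for the real OpenSearch and Cassandra lookups, and the failure types caught are an assumption about how an OpenSearch outage or timeout would surface.

```python
from typing import Callable, List

# Hypothetical threshold, taken from the "< 100k messages" example above.
BIG_INBOX_THRESHOLD = 100_000


def find_old_messages(
    inbox_size: int,
    search_index_query: Callable[[], List[str]],
    cassandra_scan: Callable[[], List[str]],
    threshold: int = BIG_INBOX_THRESHOLD,
) -> List[str]:
    """Pick the backend as described in Solution 1 and return message ids."""
    # Big INBOX: skip OpenSearch entirely so it is not put under pressure.
    if inbox_size >= threshold:
        return cassandra_scan()
    try:
        # Small or average INBOX: let the search index answer the date query.
        return search_index_query()
    except (TimeoutError, ConnectionError):
        # Assumed failure modes: OpenSearch down or query timeout.
        # Fall back to Cassandra, which remains the source of truth.
        return cassandra_scan()
```

Solution 2 would be the same function with the size check removed, so the trade-off between the two is mostly where the threshold sits; that is what the benchmark would need to tell us.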
DoD