Open dianabarsan opened 4 years ago
@ecsalomon thought you may be interested in, or have opinions on, this issue
Please do not delete data from couch! We need something in place to UPSERT data on Postgres instead of using views, which are probably not sustainable much longer.
Seriously, unless we have a solution on the PG side to keep all of this data, this would make data science completely untenable.
We were not planning to delete data. We were planning to move docs from one database that syncs to postgres (medic) to another database that syncs to postgres (users meta), and make cht-core queries lighter.
To make this even clearer: this change should not affect analytics in any way (hence the moving). It should just affect cht-core performance. This data will still exist on couch, just in a different place.
Whew, ok! :)
We've researched cold storage recently by using a separate database and calling _purge on the medic database.
This proved quite disruptive as:
The test was conducted on Couch 2, but there is no indication that _purge performance has improved in Couch 3.
We should assess _purge and view indexing performance in newer versions of CouchDB.
If there are no improvements, this could turn out to be a costly "cron" that we would run when we're confident the server is not expecting load soon - for example nightly during the weekend.
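As a rough illustration of the "run only when the server is quiet" idea, a scheduler could gate the purge job behind a time-window check. The sketch below is only an assumption about what such a window might look like: the function name in_quiet_window, the choice of Saturday and Sunday, and the 00:00 to 04:00 hours are all hypothetical and would need to be tuned per deployment.

```python
from datetime import datetime


def in_quiet_window(moment, quiet_days=(5, 6), start_hour=0, end_hour=4):
    """Return True when `moment` falls inside the assumed low-traffic window.

    Hypothetical defaults: Saturday and Sunday, between 00:00 and 04:00 server time.
    """
    # datetime.weekday() numbers Monday as 0, so Saturday is 5 and Sunday is 6.
    return moment.weekday() in quiet_days and start_hour <= moment.hour < end_hour


if __name__ == "__main__":
    # The costly purge + reindex job would only be started when this is True.
    print(in_quiet_window(datetime.now()))
```

The window check is deliberately separate from the job itself, so the schedule can change without touching the purge logic.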
Describe the performance issue
Because tasks are saved to disk and task documents are replicated, an ever-growing number of tasks for every user will make _changes replication requests linearly slower. Even if tasks are purged and don't cause performance issues on devices, they're still part of the server-side view queries and _changes requests. As time goes on and configurations change, we could end up with many tasks that have been in a terminal state for a long time, serve no purpose in being replicated, and only weigh down the replication process.
Describe the improvement you'd like
Some time (maybe 6 months or one year) after a specific task has been updated to a terminal state, we could move it to "cold storage" (for example medic-users-meta or a new database) and permanently delete it (no tombstone) from the medic db.
Describe alternatives you've considered
We could move only tasks that are Cancelled into "cold storage", not all that are in a terminal state. There could be no "cold storage" at all.
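The proposed move could be sketched roughly as follows. This is a minimal, offline illustration and not the cht-core implementation. It assumes that task documents carry type, state and a stateHistory list whose last entry holds a millisecond timestamp, and that the terminal states are Cancelled, Completed and Failed. The helper names is_cold and plan_cold_storage_move are hypothetical. The only CouchDB detail relied on is that POST /{db}/_purge accepts a JSON object mapping document ids to lists of revisions.

```python
from datetime import datetime, timedelta, timezone

# Assumption: these are the terminal task states; adjust to the real task schema.
TERMINAL_STATES = {"Cancelled", "Completed", "Failed"}


def is_cold(doc, now, max_age=timedelta(days=180)):
    """Return True if doc is a task that reached a terminal state more than max_age ago."""
    if doc.get("type") != "task" or doc.get("state") not in TERMINAL_STATES:
        return False
    history = doc.get("stateHistory") or []
    if not history:
        return False
    # Assumption: the last stateHistory entry records when the current state
    # was set, as a millisecond epoch timestamp.
    last_change = datetime.fromtimestamp(history[-1]["timestamp"] / 1000, tz=timezone.utc)
    return now - last_change > max_age


def plan_cold_storage_move(docs, now, max_age=timedelta(days=180)):
    """Return (docs to write to cold storage, request body for _purge on the source db)."""
    to_copy = []
    purge_body = {}
    for doc in docs:
        if not is_cold(doc, now, max_age):
            continue
        # Drop _rev so the copy can be created fresh in the target database.
        to_copy.append({k: v for k, v in doc.items() if k != "_rev"})
        # _purge expects a map of document id -> list of revisions to purge.
        purge_body[doc["_id"]] = [doc["_rev"]]
    return to_copy, purge_body
```

In a real job, to_copy would be written to the cold-storage database first (for example via _bulk_docs), and purge_body would be sent to _purge on the medic database only after that write succeeds, so that no task is lost if the job fails midway. Restricting TERMINAL_STATES to Cancelled alone would give the narrower alternative mentioned above.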
Additional context
It's hard to ascertain how task dynamics evolve over time, since few projects have upgraded to 3.8+ and even those have done so fairly recently. We should evaluate this in a few months' (or one year's) time and use real-world statistics to decide.