Open m5r opened 1 year ago
(@m5r please correct this if it is wrong!)
A new cht-conf action, dry-run-purge-config, has been added. When you execute this action, it will call the new API endpoint with your current purge config and print the results. The results will indicate:
As noted in the initial cht-core PR, we tried to solve this by running the purging code without the database mutations (a dry run). However, we ran into the same limits as actual purging: slow queries made a dry run take hours to complete. Here is a copy of our test results:
I got some disappointing news about our purging dry run solution 😞
I started a dry run of a purge this morning on a clone of Muso-Mali, on a beefy machine with similar specs: Xeon E5-2686 v4 @ 2.30GHz, 256 GB of RAM, ~650 GB of data stored on a 1.5 TB disk. I'm using a fork of CHT 3.13.0 with the purging dry run API living on the temporary branch 3.13.0-FR-dry-run-purging.
It's the beginning of the night over here and the dry run is still going. It took nearly 5 hours to simulate purging contacts, processing ~10k records with each batched request. Our assumption was that queries were cheap and that mutating the data was the expensive part that makes purging so slow, but it turns out the queries are expensive: we're seeing roughly the same performance as actual purging despite using CouchDB views.
CPU usage averages 35% with spikes to 80%. Any loss of connection between cht-conf and the API during the dry run results in wasted CPU usage: the API keeps running the dry run, but cht-conf can't reconnect to it to wait for the results.
With all this, it's safe to say we cannot move forward with this solution and we should go back to the design step for this feature.
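For readers unfamiliar with the approach that was tested, the idea of a dry run (running the purge logic but skipping the database mutations) can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the 3.13.0-FR-dry-run-purging branch: the `Doc` and `ContactEntry` shapes, the simplified `PurgeFn` signature, and the `dryRunPurge` helper are all hypothetical. It also works on in-memory data, so it does not reproduce the query cost that turned out to be the bottleneck.

```typescript
// Hypothetical shapes for illustration only; not the cht-core data model.
interface Doc { _id: string; reported_date: number; }
interface ContactEntry { contact: Doc; reports: Doc[]; }

// Simplified, hypothetical purge function: returns the ids it would purge.
type PurgeFn = (contact: Doc, reports: Doc[]) => string[];

interface DryRunResult { wouldPurge: number; batches: number; }

// Walks the contacts in batches, applies the purge function, and only counts
// the ids it returns. Nothing is written or deleted.
function dryRunPurge(entries: ContactEntry[], purgeFn: PurgeFn, batchSize: number): DryRunResult {
  if (batchSize <= 0) {
    throw new Error("batchSize must be positive");
  }
  let wouldPurge = 0;
  let batches = 0;
  for (let start = 0; start < entries.length; start += batchSize) {
    const batch = entries.slice(start, start + batchSize);
    batches += 1;
    for (const { contact, reports } of batch) {
      // Ignore ids that do not belong to this contact's reports.
      const known = new Set(reports.map((r) => r._id));
      const ids = purgeFn(contact, reports).filter((id) => known.has(id));
      wouldPurge += new Set(ids).size; // count each id once
    }
  }
  return { wouldPurge, batches };
}

// Usage: count the reports older than an arbitrary cutoff, in batches of 2.
const cutoff = 1000;
const purgeOlderThanCutoff: PurgeFn = (_contact, reports) =>
  reports.filter((r) => r.reported_date < cutoff).map((r) => r._id);

const sample: ContactEntry[] = [
  {
    contact: { _id: "c1", reported_date: 0 },
    reports: [{ _id: "r1", reported_date: 500 }, { _id: "r2", reported_date: 1500 }],
  },
  { contact: { _id: "c2", reported_date: 0 }, reports: [{ _id: "r3", reported_date: 100 }] },
  { contact: { _id: "c3", reported_date: 0 }, reports: [] },
];

const result = dryRunPurge(sample, purgeOlderThanCutoff, 2);
console.log(result);
```

In a real deployment each batch would come from a database query, which is where the test above spent most of its time.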
Describe the issue
App developers can easily visualize and quantify the impact of a change to the purge config.
Additional context
Related allies OKR