Open dianabarsan opened 1 week ago
I've tried this on a local database with 100k docs, and these are the purge times I measured:
CouchDB version | Method | Time |
---|---|---|
3.3.3 | _changes | 5.3 minutes |
3.4.2 | _changes | 11 minutes |
3.4.2 | _all_docs | 18 minutes |
3.4.2 | _changes with increased changes_doc_ids_optimization_threshold | 5.5 minutes |
So it turns out that using _all_docs instead of _changes requests is even worse than using the changes feed with the performance hit. The times depend on the dataset and on how many doc ids get passed as payload to these requests, but I'm afraid the increased time when using _all_docs is serious enough to disqualify it as a viable option.
So our only alternative is to update the changes_doc_ids_optimization_threshold config to some significantly larger value. We limit the maximum number of docs we handle in a single purge request to ~20,000, so for safety I bumped it to 30,000, which keeps the current performance.
This means that no code changes are required, except for adding changes_doc_ids_optimization_threshold as a couch config value.
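For illustration, here is a minimal sketch of how that value could be set through CouchDB's node config HTTP API instead of editing the ini file. It assumes the setting lives in the `couchdb` config section (worth verifying against the docs for your version); the base URL and the `build_threshold_request` helper are hypothetical, and admin credentials are left out. The request is only built here, not sent.

```python
import json
import urllib.request


def build_threshold_request(base_url: str, threshold: int, node: str = "_local") -> urllib.request.Request:
    """Build (but do not send) a PUT that sets changes_doc_ids_optimization_threshold.

    Assumption: the key is in the "couchdb" config section.
    """
    url = (
        f"{base_url.rstrip('/')}/_node/{node}/_config/"
        "couchdb/changes_doc_ids_optimization_threshold"
    )
    # The config API takes the new value as a JSON string, e.g. "30000".
    body = json.dumps(str(threshold)).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical local instance; 30000 is the value mentioned above.
request = build_threshold_request("http://localhost:5984", 30000)
print(request.get_method(), request.full_url)
```

Sending it would be a matter of passing `request` to `urllib.request.urlopen` with admin authentication added.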
Describe the performance issue
CouchDB 3.4 introduces an "optimization" where the changes feed with doc_ids retrieves only the targeted docs when the payload is under 1000 doc_ids, and goes over the whole changes feed when it's over 1000. Previously, there was no limit. This makes purging and other mechanisms that rely on querying changes with doc ids very slow.
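As a rough sketch of the request shape involved: a changes request filtered by doc ids is a POST to `_changes?filter=_doc_ids` with the ids in the body. The snippet below builds such a request and checks which path a payload would take under the behaviour described above. The threshold of 1000 is taken from this issue, the exact boundary handling is an assumption, and the helper names and database URL are hypothetical.

```python
import json
import urllib.request

# Threshold as described in this issue; the real value comes from the
# changes_doc_ids_optimization_threshold config.
ASSUMED_THRESHOLD = 1000


def build_changes_request(db_url: str, doc_ids: list[str]) -> urllib.request.Request:
    """Build (but do not send) a _changes request filtered by doc ids."""
    url = f"{db_url.rstrip('/')}/_changes?filter=_doc_ids"
    body = json.dumps({"doc_ids": doc_ids}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def uses_targeted_lookup(doc_ids: list[str], threshold: int = ASSUMED_THRESHOLD) -> bool:
    """True if the payload is small enough for targeted doc lookups.

    Assumption: a payload exactly at the threshold still counts as targeted.
    """
    return len(doc_ids) <= threshold


small_batch = [f"doc-{i}" for i in range(500)]
purge_batch = [f"doc-{i}" for i in range(20000)]
print(uses_targeted_lookup(small_batch), uses_targeted_lookup(purge_batch))
```

A purge batch of ~20,000 ids falls well over the threshold, which is the case where the whole changes feed gets scanned.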
Describe the improvement you'd like
Update purging so it hits other endpoints, or work out a way to optimize it while still using the changes feed.
Measurements
We should get similar purging times on Couch 3.3 and Couch 3.4.
Additional context
https://github.com/medic/cht-core/issues/9303#issuecomment-2473165284