medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

Use Mango to reduce the number of indexes needed client side #5592

Open SCdF opened 5 years ago

SCdF commented 5 years ago

We currently query for data records in PouchDB / CouchDB via MapReduce queries.

Each query defined with MapReduce creates its own index, which needs to be kept up to date as data changes.

Mango splits the creation of the index from the use of it in a query. So, if we can convert many of our MapReduce queries to Mango, we can reduce how many indexes need to be generated.
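
As a rough illustration of that split, here is a minimal sketch in which one Mango index is created once and then reused by two different queries. It assumes `db` is a PouchDB instance with the pouchdb-find plugin loaded; the function name and the choice of selectors are illustrative and not taken from the cht-core code.

```javascript
// Sketch only: `db` is assumed to be a PouchDB instance with pouchdb-find loaded.
// `queryWithSharedIndex` and the two selectors are hypothetical examples.
async function queryWithSharedIndex(db) {
  // Index creation is a separate step from querying.
  await db.createIndex({ index: { fields: ['type'] } });

  // Two different queries that can both be answered from the same `type` index.
  const forms = await db.find({ selector: { type: 'form' } });
  const dataRecords = await db.find({ selector: { type: 'data_record' } });

  return { forms: forms.docs, dataRecords: dataRecords.docs };
}
```

With MapReduce, each of these lookups would typically be its own view, and each view would maintain its own index.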

An example of looking at this for startup is here: https://github.com/medic/medic/pull/5264 (along with some other changes)

We need to be careful and measure performance in both offline and online situations: query speed, individual index generation time, and network costs in the online scenario. All of that has to be balanced against the fact that Mango allows for fewer indexes. It's complicated!

One core difference is that Mango queries are always equivalent to include_docs: true in relation to how CouchDB pulls data off disk into memory. For example, a MapReduce view queried without include_docs will run much faster than the equivalent Mango query when running in CouchDB (PouchDB should be equivalent).
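
To make the difference concrete, here is a hedged sketch that fetches the same list of ids both ways. It assumes `db` is a PouchDB instance with pouchdb-find loaded and that the `medic-client/forms` view has no reducer; `listFormIds` is a hypothetical helper name. The comment about `fields` follows from the point above and has not been measured here.

```javascript
// Sketch only: `db` is assumed to be a PouchDB instance with pouchdb-find loaded.
async function listFormIds(db) {
  // MapReduce: without include_docs, rows carry only id/key/value from the index.
  const viewResult = await db.query('medic-client/forms', { include_docs: false });
  const idsFromView = viewResult.rows.map((row) => row.id);

  // Mango: `fields` trims what is returned, but (per the point above) the
  // documents are still read in order to evaluate the query.
  const mangoResult = await db.find({ selector: { type: 'form' }, fields: ['_id'] });
  const idsFromMango = mangoResult.docs.map((doc) => doc._id);

  return { idsFromView, idsFromMango };
}
```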

The core goal is to reduce the count of indexes generated locally, while making sure we do not accidentally degrade performance elsewhere.

kennsippell commented 5 years ago

The feasibility and impact of converting a set of indexes to Mango were evaluated. In the first round of evaluations, the six MapReduce views which are warmed during bootstrap were evaluated.

Feasibility

| MapReduce View | Feasibility |
| --- | --- |
| medic-user/read | We use this query with a reducer. Converting this to Mango has blocking bandwidth impacts for online users. This would be a good index to lazy load (#5859). |
| medic-client/contacts_by_type | The sorting logic in this index cannot be reproduced in Mango. Users would need to fetch all documents and perform an in-memory sort. This has blocking bandwidth impacts for online users and poor (unmeasured) characteristics for offline users. |
| medic-client/data_records_by_type | We use this query with a reducer. Converting this to Mango has blocking bandwidth impacts for online users. This would be a good index to lazy load (#5859). |
| medic-client/reports_by_validity | I believe it is a bug that we build this index during startup (#5866). |
| medic-client/forms | Feasible |
| medic-client/docs_by_id_lineage | This MapReduce view allows for selecting a document based on the content of another document. This is not possible via Mango queries. |
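
The bandwidth concern for the views that use a reducer can be sketched as follows. Mango has no reduce step, so an aggregate such as a per-type count has to be computed by the client after every matching document has been transferred. The helper name and sample documents below are made up for illustration.

```javascript
// Sketch only: a client-side stand-in for what a count-style reducer would
// compute inside the database. `countByType` is a hypothetical helper.
function countByType(docs) {
  const counts = {};
  for (const doc of docs) {
    counts[doc.type] = (counts[doc.type] || 0) + 1;
  }
  return counts;
}

// Illustrative input: with Mango, all of these documents would be downloaded
// first, whereas a reduced view query returns only the counts.
const sampleDocs = [
  { _id: 'a', type: 'form' },
  { _id: 'b', type: 'data_record' },
  { _id: 'c', type: 'form' },
];
console.log(countByType(sampleDocs)); // { form: 2, data_record: 1 }
```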

Performance Metrics for medic-client/forms

Execution Times - Measured via 100x tight loop

| Index | Device | MapReduce (ms) | Mango (ms) | Delta |
| --- | --- | --- | --- | --- |
| 'type' field only | Tecno F1 | 12,403 | 14,158 | +1,755 (+14.1%) |
| 'type' field only | Desktop | 872 | 944 | +72 (+8.3%) |
| 'type' + '_attachments.xml' | Desktop | 872 | 959 | +83 (+9.5%) |
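
For reference, the Delta column is the Mango figure minus the MapReduce figure, with the percentage taken relative to MapReduce. A small sketch of that arithmetic (the `formatDelta` helper is hypothetical, not part of the original scripts):

```javascript
// Sketch only: reproduces the Delta column from the two measured totals.
function formatDelta(mapReduce, mango) {
  const diff = mango - mapReduce;
  const pct = (diff / mapReduce) * 100;
  const sign = diff >= 0 ? '+' : '-';
  const absDiff = Math.abs(diff).toLocaleString('en-US');
  return `${sign}${absDiff} (${sign}${Math.abs(pct).toFixed(1)}%)`;
}

console.log(formatDelta(12403, 14158)); // +1,755 (+14.1%)
console.log(formatDelta(872, 944));     // +72 (+8.3%)
```

For the third row, the same arithmetic on 872 and 959 gives +87 (+10.0%), so the listed +83 (+9.5%) may reflect a transcription slip in one of the figures.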

Scripts:

// MapReduce: run the forms view 100 times in sequence and log the total time (ms).
(() => {
  const start = performance.now();
  let chain = Promise.resolve();
  for (let i = 0; i < 100; i++) {
    chain = chain.then(() => PouchDB('medic-user-ac1').query('medic-client/forms', { include_docs: true }));
  }
  chain.then(() => console.log('MapReduce Execution Time', performance.now() - start));
})();

// Mango: run the equivalent selector 100 times in sequence and log the total time (ms).
(() => {
  const start = performance.now();
  let chain = Promise.resolve();
  for (let i = 0; i < 100; i++) {
    chain = chain.then(() => PouchDB('medic-user-ac1').find({ selector: { type: 'form', '_attachments.xml': { $exists: true } } }));
  }
  chain.then(() => console.log('Mango Execution Time', performance.now() - start));
})();

Build PouchDB Index - Based on sample of 3 measures

| Device | MapReduce (ms) | Mango (ms) | Delta |
| --- | --- | --- | --- |
| Tecno F1 | 13,895 | 18,022 | +4,127 (+29.7%) |
| Desktop | 1,124 | 2,096 | +972 (+86%) |

Scripts:

// Mango: time the creation of the index, then delete it so the next run starts clean.
(() => {
  const start = performance.now();
  PouchDB('medic-user-ac1').createIndex({ index: { fields: ['type'] } })
    .then(idx => {
      console.log('Index', idx, performance.now() - start);
      return PouchDB('medic-user-ac1').deleteIndex({ ddoc: idx.id, name: idx.name });
    })
    .then(console.log);
})();

// MapReduce: a query with limit: 0 forces the view index to build without returning rows.
// Afterwards, delete the view's IndexedDB database so the next run starts clean.
(() => {
  const start = performance.now();
  PouchDB('medic-user-ac1').query('medic-client/forms', { limit: 0 })
    .then(() => {
      console.log('MapReduce', performance.now() - start);
      window.indexedDB.deleteDatabase('_pouch_medic-user-ac1-mrview-bc4e9efc3baf76a2da15c82a700c0908');
    });
})();

Other Metrics

| Metric | MapReduce | Mango | Delta |
| --- | --- | --- | --- |
| Index disk use (PouchDB) | 712 | 324 | -388 (-54%) |
| Index heap use (PouchDB) | 0 | 0 | 0 |
| Bandwidth for online users (CouchDB) | 11,469 bytes | 10,058 bytes | -1,411 (-12%) |
| Inbox.js script size with pouchdb-find | 2,962,350 bytes | 3,013,512 bytes | +51.1 kB (+1.7%) |

Measure IndexedDB Disk Use (Chrome 70 only): await window.navigator.storage.estimate()
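
A minimal sketch of how that call could be used to get a before/after figure around an index build. It assumes `storage` behaves like `navigator.storage` (an `estimate()` method resolving to an object with a `usage` value in bytes); `measureStorageDelta` and the `buildIndex` callback are hypothetical names, and the browser reports an estimate rather than an exact size.

```javascript
// Sketch only: `storage` is assumed to look like navigator.storage, and
// `buildIndex` is a hypothetical callback that triggers the index build.
async function measureStorageDelta(storage, buildIndex) {
  const before = await storage.estimate();
  await buildIndex();
  const after = await storage.estimate();
  // `usage` is an estimate in bytes, so treat the difference as approximate.
  return after.usage - before.usage;
}
```

In a browser console this might be called as `measureStorageDelta(navigator.storage, () => PouchDB('medic-user-ac1').createIndex({ index: { fields: ['type'] } }))`.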

Bandwidth Scripts:

curl 'http://admin:pass@localhost:5984/medic/_design/medic-client/_view/forms?include_docs=true' -w '%{size_download}'
curl 'http://admin:pass@127.0.0.1:5984/medic/_find' -X POST -H 'Content-Type: application/json' --data '{"selector": {"type": "form", "_attachments.xml":{"$exists":true}}}' -w '%{size_download}'

Conclusions

Based on these findings, Mango is not particularly well suited to helping with the webapp's bootstrapping. It is likely better suited to use outside the webapp's performance hot paths. One potentially fruitful option is to use it in API/Sentinel. Within the webapp, the filtered search indexes should be investigated, as well as less performance-critical code paths such as editing user settings (doc_by_type).

To help scope down 3.7, and because IDBNext is likely to affect the performance characteristics of Mango and indexing, it was recommended that this investigation continue after the IDBNext work.

SCdF commented 5 years ago

Chatted to Kenn. Things to try in the future:

garethbowen commented 5 years ago

Deferring to 3.9.0

garethbowen commented 4 years ago

Let's wait for IDBNext to land and then run this again.