medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

fix(#9601): prototype duplicate prevention #9609

Open ChinHairSaintClair opened 3 weeks ago

Description

A prototype to prevent duplicate sibling contacts from being created in the hierarchy, as discussed with @jkuester and @mrjones-plip in a "technical working session". We expect a lot of feedback, changes, and further discussion before it's ready for approval.

To achieve this, we hook into duplicate detection strategies through "configuration", amend the promise fired on submit in the contacts-edit.component.ts file to run our additional check, and finally output the possible duplicate items to a duplicate_info section added to the enketo.component.html file.

Configuration:

_phdcChanges: { // Additional namespace
  // Specify your own contact_types here
  hierarchyDuplicatePrevention: Partial<{[key in 'person' | 'health_center']: Strategy;}>;
  // The mapped type restricts the keys to the allowed contact types (health_center, clinic, person, etc.); the Partial utility makes each of them optional.
};

The keys should match the contact_types listed in your base_settings.json file.

Currently two strategies are available, Levenshtein and NormalizedLevenshtein, with the ability to customize properties based on implementation needs.
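
For readers unfamiliar with the two metrics, here is a minimal sketch of what they compute. It is not the code from this PR: the function names are hypothetical, and dividing by the longer string's length is an assumed normalisation (one common choice) that may differ from what the NormalizedLevenshtein strategy actually does.

```typescript
// Classic dynamic-programming Levenshtein distance: the minimum number of
// single-character insertions, deletions, or substitutions turning `a` into `b`.
function levenshtein(a: string, b: string): number {
  // prev[j] is the distance between the first i-1 chars of `a` and the first j chars of `b`.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumption: normalise by the longer string's length, giving a score in [0, 1]
// where 0 means identical. The PR's strategy may normalise differently.
function normalizedLevenshtein(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 0 : levenshtein(a, b) / longest;
}
```

A raw distance grows with string length, so a fixed threshold treats short and long names differently; a normalised score makes one threshold comparable across names of different lengths.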

Example implementation:

window._phdcChanges = {
  hierarchyDuplicatePrevention: {
    health_center: {
      ...Levenshtein,
      props: [
        {form_prop_path: '/data/health_center/name', db_doc_ref: 'name'},
        {form_prop_path: '/data/health_center/external_id', db_doc_ref: 'external_id'}
      ],
      queryParams: {
        valuePaths: ['/data/health_center/is_user_flagged_duplicate', '/data/health_center/duplicate/action'],
        // eslint-disable-next-line eqeqeq
        query: (duplicate, action) => duplicate === 'yes' && action != null
      }
    }
  }
}

Here, props are the definitions used to evaluate how similar the current record is to its siblings. If no props value is provided, "name" will be used by default.

form_prop_path is the XML path to the value of interest on the form currently being created or edited. db_doc_ref is the property name on the sibling database document that the resolved form_prop_path value should be compared to. E.g.: suppose the form has a structure of <data><clinic><name>Test</name></clinic></data>. The form_prop_path would be /data/clinic/name, and the db_doc_ref would be name, since the "clinic" sibling documents have a name property.
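
To make the pairing concrete, here is a minimal sketch of how a props entry could be resolved into a pair of values to compare. It is an illustration only: the form is modelled as a nested plain object rather than real Enketo XML, and FormNode, resolvePath, and pairValues are hypothetical names that are not part of this PR.

```typescript
// Assumption: the form model is represented as nested objects with string leaves,
// standing in for <data><clinic><name>Test</name></clinic></data>.
type FormNode = { [key: string]: FormNode | string };
type Prop = { form_prop_path: string; db_doc_ref: string };

// Walk a path such as '/data/clinic/name' one segment at a time.
function resolvePath(form: FormNode, path: string): string | undefined {
  let node: FormNode | string = form;
  for (const segment of path.split('/').filter(Boolean)) {
    if (typeof node === 'string') return undefined; // path goes deeper than the data
    const next: FormNode | string | undefined = node[segment];
    if (next === undefined) return undefined;       // segment not present
    node = next;
  }
  return typeof node === 'string' ? node : undefined;
}

// For each prop, pair the form value with the matching sibling document property.
function pairValues(form: FormNode, sibling: Record<string, unknown>, props: Prop[]) {
  return props.map(({ form_prop_path, db_doc_ref }) => ({
    formValue: resolvePath(form, form_prop_path),
    siblingValue: sibling[db_doc_ref],
  }));
}

const exampleForm: FormNode = { data: { clinic: { name: 'Test' } } };
const examplePairs = pairValues(
  exampleForm,
  { _id: 'abc', name: 'Tset' },
  [{ form_prop_path: '/data/clinic/name', db_doc_ref: 'name' }]
);
```

Each resulting pair is what a strategy would then score for similarity.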

Finally, we have a use case where we need to conditionally fire the duplicate check based on whether our CHW has confirmed a record to be a duplicate after our backend has flagged it as such. We use queryParams.valuePaths to specify the XML path of the question, or questions, that allow us to mark a record as a duplicate, submit it, and then merge or delete it downstream based on the specified action.
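
As a rough sketch of how valuePaths and query fit together, the helper below resolves each path and passes the results to query in the same order, as the (duplicate, action) signature in the example implementation suggests. evaluateQuery and the resolveValue callback are hypothetical names, and whether a true result triggers or bypasses the duplicate check is left to the caller, since that is not spelled out here.

```typescript
type QueryParams = {
  valuePaths: string[];
  // Receives one resolved value per entry in valuePaths, in the same order.
  query: (...values: Array<string | undefined>) => boolean;
};

// Hypothetical helper: resolve every path, then hand the values to the predicate.
function evaluateQuery(
  queryParams: QueryParams,
  resolveValue: (path: string) => string | undefined
): boolean {
  const values = queryParams.valuePaths.map((path) => resolveValue(path));
  return queryParams.query(...values);
}

const exampleParams: QueryParams = {
  valuePaths: [
    '/data/health_center/is_user_flagged_duplicate',
    '/data/health_center/duplicate/action',
  ],
  // eslint-disable-next-line eqeqeq
  query: (duplicate, action) => duplicate === 'yes' && action != null,
};

// Assumption: form answers are looked up from a simple path-to-value map.
const answers = new Map<string, string>([
  ['/data/health_center/is_user_flagged_duplicate', 'yes'],
  ['/data/health_center/duplicate/action', 'merge'],
]);
const confirmed = evaluateQuery(exampleParams, (path) => answers.get(path));
```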

Misc:

We use the CHT-provided medic-client/contacts_by_parent view to query for siblings on form init. The request is processed in the background, giving larger queries the opportunity to make progress before the result is awaited on form submit.
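
The init/submit split can be sketched as follows. This is an assumed shape rather than the component code in this PR: SiblingLoader, Contact, and the injected fetchSiblings function are hypothetical, with fetchSiblings standing in for the medic-client/contacts_by_parent query.

```typescript
type Contact = { _id: string; name?: string };

class SiblingLoader {
  private readonly pending: Promise<Contact[]>;

  // Called on form init: start the query immediately, but do not await it.
  constructor(fetchSiblings: () => Promise<Contact[]>) {
    this.pending = fetchSiblings();
    // Mark a possible rejection as handled so it does not surface as an
    // unhandled rejection while the user is still filling in the form.
    // Callers of getSiblings() still see the original rejection.
    this.pending.catch(() => undefined);
  }

  // Called on form submit: by now the query has usually finished, so awaiting
  // the stored promise only waits for whatever work remains.
  getSiblings(): Promise<Contact[]> {
    return this.pending;
  }
}

// Usage with a stub in place of the real view query.
let fetchCalls = 0;
const loader = new SiblingLoader(() => {
  fetchCalls++;
  return Promise.resolve([{ _id: 'sibling-1', name: 'Test' }]);
});
```

Storing the promise rather than the result means the query runs once, and multiple submit attempts all share the same outcome.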

@kennsippell, since we've touched on the duplicate topic before, it would be great to get your thoughts on this as well.

Issue

https://github.com/medic/cht-core/issues/9601

Code review checklist

License

The software is provided under AGPL-3.0. Contributions to this project are accepted under the same license.