medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

"Maintenance Mode" for user accounts #8860

Open kennsippell opened 8 months ago

kennsippell commented 8 months ago

Is your feature request related to a problem? Please describe.

Loss of Health Data - Today, there is a risk of data loss any time a user manager: 1) moves a user's area to a different spot in the hierarchy, 2) disables a user account, or 3) replaces a user with another. You can see data loss happening in live projects in issues like https://github.com/medic/config-pih/issues/719, where hundreds of denied replications occurred over a month-long period.

Burdensome Human Coordination - In our documentation for move-contacts, we require that "users must be encouraged to clear cache and resync!" to avoid this sort of data loss. Users need to do this before the move-contacts command is executed, and coordinating these activities with users and devs is very time-consuming.

Also, running multiple move-contacts commands can take down a server, as in https://github.com/medic/config-muso/issues/932 where the server was down for 12 days. This makes coordination even more difficult. For Uganda eCHIS, where the entire nation is on one instance, how do you ensure that everybody who is moving contacts is talking to everybody else?

The programmatic steps required to do user management safely are becoming increasingly difficult at scale. Without better tooling, project teams do not have time to coordinate these activities and have no option but to accept the risk of data loss.

Describe the solution you'd like

We are creating automation to improve user management scenarios with cht-user-management. A noteworthy example on the roadmap is a UI and cloud-based execution of move-contacts commands, which aims to execute them safely. https://github.com/medic/cht-user-management/issues/12

This issue tracks a request to create some sort of "maintenance mode" for user accounts which will allow automation to perform operations on them without data loss.

Something like:

  1. Automation can set a flag on a user to "put into maintenance mode"
  2. Next time the user syncs their data, the user is automatically logged out after the sync completes successfully
  3. All data is cleared from the user's device
  4. User cannot log in, should see an error like "Your account is in maintenance mode"
  5. The user's account is flagged so automation knows the user has synced (account maintenance is now safe).
  6. In the example above, this is when move-contacts could be safely executed.
  7. Automation removes the flag keeping the account in maintenance mode
  8. User can now log in. Maybe with their original credentials, maybe with a resent magic link, etc.
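
The steps above amount to a small state machine on the user account. A minimal sketch of that idea follows; the state and event names are hypothetical and are not an existing CHT API, and whether login should already be blocked while the flag is set but the user has not yet synced is left open here.

```typescript
// Hypothetical names throughout; none of this exists in cht-core today.
type AccountState = "active" | "maintenance_requested" | "maintenance_ready";
type AccountEvent = "flag_set" | "sync_completed" | "flag_removed";

// Steps 1, 2-5 and 7 of the list, expressed as state transitions.
function transition(state: AccountState, event: AccountEvent): AccountState {
  if (state === "active" && event === "flag_set") {
    return "maintenance_requested"; // step 1: automation sets the flag
  }
  if (state === "maintenance_requested" && event === "sync_completed") {
    return "maintenance_ready"; // steps 2-5: synced, logged out, flagged as synced
  }
  if (state === "maintenance_ready" && event === "flag_removed") {
    return "active"; // step 7: automation removes the flag
  }
  return state; // events that do not apply in the current state are ignored
}

// Step 6: operations such as move-contacts are only safe once the user has synced.
function safeForMaintenance(state: AccountState): boolean {
  return state === "maintenance_ready";
}

// Step 4: login is refused while the account is locked for maintenance.
function loginBlocked(state: AccountState): boolean {
  return state === "maintenance_ready";
}
```

The point of modelling it this way is that automation never has to guess: it acts only when `safeForMaintenance` is true, and an out-of-order event leaves the account where it was.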

jkuester commented 8 months ago

Thanks to @mrjones-plip's prompting, I took a closer look at the feasibility of leveraging the existing create_user_for_contacts transition as a sort of "maintenance mode" for a user (leveraging the user replace functionality). The TLDR is that it seems very possible to leverage this functionality to solve many of the most challenging aspects of the workflow described above!

The fundamental principle of this approach is that, to an end user, there is not much difference between putting their user in "maintenance mode" (so they cannot log in) and later taking them back out, versus simply disabling their original user and creating a new one for them (besides the obvious drawback of not being able to reuse credentials). The tricky part of both scenarios is making sure we don't lose data when initially logging the user out, and this is where the create_user_for_contacts functionality comes in.

Just now I ran the following exercise (and the same should work on any >=4.1.0 CHT instance):

Once the status on the server's copy of contact_a changes from PENDING to COMPLETE, you can be confident that all of the user's data was synced and the user is now logged out. This is the point where you could safely perform move-contacts operations that would affect the user. Once those operations are complete, you can provide the CHW with the credentials for the new user. When they log in as the new user, they will do a fresh sync of data from the server.
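
For automation, waiting on that status change comes down to a decision made after each read of contact_a from the server. A minimal sketch of that decision step, assuming the caller already has some way to read the status value (the function and type names are hypothetical, and only the PENDING and COMPLETE values mentioned here are recognised):

```typescript
// Hypothetical helper; how the status is read from contact_a is left to the caller.
type PollStep = "proceed" | "retry" | "give_up" | "abort";

// Decide what to do after one read of the replace status.
// `attempt` is zero-based; `maxAttempts` caps how long automation waits.
function nextStep(
  status: string | undefined,
  attempt: number,
  maxAttempts: number,
): PollStep {
  if (status === "COMPLETE") {
    return "proceed"; // data synced and user logged out: safe to move contacts
  }
  if (status !== "PENDING") {
    return "abort"; // missing or unrecognised status: do not move anything
  }
  // Still PENDING: the user has not synced yet.
  return attempt + 1 < maxAttempts ? "retry" : "give_up";
}
```

A polling loop would call `nextStep` after each read, sleep between "retry" results, and run move-contacts only on "proceed". Treating anything other than PENDING or COMPLETE as "abort" is a deliberately conservative choice, since moving contacts too early is exactly what causes the data loss.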

Caveats:


  1. ✅ Automation can set a flag on a user to "put into maintenance mode" _(set user_for_contact.replace...status = 'PENDING')_
  2. ✅ Next time the user syncs their data, the user is automatically logged out after the sync completes successfully
  3. ☑ All data is cleared from the user's device (data is technically not cleared, but a new user would trigger a fresh sync)
  4. ☑ User cannot log in, should see an error like "Your account is in maintenance mode" (Currently no nice messages, but user would be automatically logged out.)
  5. ✅ The user's account is flagged so automation knows the user has synced (account maintenance is now safe). _(Can watch for user_for_contact.replace...status = 'COMPLETE')_
  6. ✅ In the example above, this is when move-contacts could be safely executed.
  7. ☑ Automation removes the flag keeping the account in maintenance mode (Just switching to new user)
  8. ☑ User can now log in. Maybe with their original credentials, maybe with a resent magic link, etc.

Obviously none of this is an ideal workflow and it does not address any of the problems at the heart of move-contacts being so painful (looking to https://github.com/medic/cht-core/issues/6543 to maybe offer a glimmer of hope in that regard). But, it is functionality that already exists today in the CHT that I think could be leveraged to build a viable "maintenance mode" workflow.

@mrjones-plip please add any additional comments/questions that I have missed!

mrjones-plip commented 8 months ago

@jkuester - thanks so much for the deep dive into whether my harebrained idea might work! I have nothing more to add.

@kennsippell - let me know if you'd like some help prototyping any of this!

kennsippell commented 8 months ago

Thanks guys. I'll check out this very interesting proposal.