Open kennsippell opened 9 months ago
Thanks to @mrjones-plip's prompting, I took a closer look at the feasibility of leveraging the existing `create_user_for_contacts` transition as a sort of "maintenance mode" for a user (leveraging the user replace functionality). The TLDR is that it seems very possible to leverage this functionality to solve many of the most challenging aspects of the workflow described above!
The fundamental principle of this approach is that, to an end user, there is not much difference between putting their user in "maintenance mode" so they cannot log in and then taking them back out again, versus disabling their original user and creating a new one for them (besides the obvious drawback of not being able to re-use credentials). The tricky part of both scenarios is making sure we don't lose data when initially logging the user out (and this is where the `create_user_for_contacts` functionality comes in).
Just now I ran the following exercise (and the same should work on any `>=4.1.0` CHT instance):

1. Enable the `create_user_for_contacts` transition with the documented configuration for `app_url`, `token_login`, and `transitions` (no need to configure any `replace_forms`).
2. Create a user (`chw_a`) associated with a contact (`contact_a`).
3. Have `chw_a`'s device go offline so it is no longer syncing to the server.
4. On `chw_a`'s device, add new contacts/reports that do not exist on the server.
5. On the server, update `contact_a`'s contact doc to have:

        "user_for_contact": {
          "replace": {
            "chw_a": {
              "status": "PENDING",
              "replacement_contact_id": "contact_a"
            }
          }
        },

6. Bring `chw_a` back online and sync.
7. The app on the device changes the status from `PENDING` to `READY`, syncs that change to the contact, and logs out the user on the device.
8. Sentinel resets the password for `chw_a` (to a random value). This invalidates all existing sessions for that user and prevents any more data from being synced. Then Sentinel will create a new user (`chw_b`) associated with whatever contact was set for `replacement_contact_id` (in this case it would still just be `contact_a`).

Once the status on the server's copy of `contact_a` changes from `PENDING` to `COMPLETE`, you can be confident that all of the user's data was synced and the user is now logged out. This would be the point where you could safely perform `move-contact` operations that would affect the user. Once those operations are complete, you can provide the CHW with the credentials for the new user. When they log in to the new user, they will do a fresh sync of data from the server.
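
The contact-doc changes in the exercise above can be sketched as a few small helpers. This is only a minimal Python sketch operating on plain dicts, assuming the `user_for_contact.replace` shape shown in the exercise. The function names `request_replacement`, `replacement_status`, and `safe_to_move` are hypothetical, and reading/writing the doc against the server is deliberately left out.

```python
import copy
from typing import Optional


def request_replacement(contact_doc: dict, username: str,
                        replacement_contact_id: str) -> dict:
    """Return a copy of contact_doc with a PENDING replace entry for username."""
    updated = copy.deepcopy(contact_doc)
    replace = updated.setdefault("user_for_contact", {}).setdefault("replace", {})
    replace[username] = {
        "status": "PENDING",
        "replacement_contact_id": replacement_contact_id,
    }
    return updated


def replacement_status(contact_doc: dict, username: str) -> Optional[str]:
    """Return the replace status recorded for username, or None if absent."""
    entry = contact_doc.get("user_for_contact", {}).get("replace", {}).get(username)
    return entry.get("status") if entry else None


def safe_to_move(contact_doc: dict, username: str) -> bool:
    """True once the (server's copy of the) contact doc reports COMPLETE."""
    return replacement_status(contact_doc, username) == "COMPLETE"
```

For example, `request_replacement(doc, "chw_a", "contact_a")` produces the doc from step 5, and `safe_to_move` would be checked against the server's copy before running any `move-contact` operation.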
Caveats:

- When the new user is created, a message is sent to the contact set as `replacement_contact_id` containing the new token login link. Really you want to send this link _once all your `move-contact` operations are complete_ (so the user cannot inadvertently log in too early). If you have no SMS Gateway configured, then the message will not actually be delivered. Another workaround would be to set a custom (dummy) value for the phone number on the `replacement_contact_id` contact so the message would not be delivered to the CHW.
- If you would rather keep the `chw_a` user and rehabilitate that user once all the `move-contact` operations are done, there are a few challenges. There is no way (at this point) to prevent the creation of the new user (`chw_b`), so extra users would be added one way or another. Also, if the CHW tries to log back into `chw_a` on their original device (without uninstalling the app or clearing the data), the data on the device from `chw_a` will still exist and a fresh sync will not be done. So, the safest approach is just switching to a new user.

- _(`user_for_contact.replace...status = 'PENDING'`)_
- _(`user_for_contact.replace...status = 'COMPLETE'`)_

Obviously none of this is an ideal workflow and it does not address any of the problems at the heart of `move-contacts` being so painful (looking to https://github.com/medic/cht-core/issues/6543 to maybe offer a glimmer of hope in that regard). But it is functionality that already exists today in the CHT that I think could be leveraged to build a viable "maintenance mode" workflow.
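
The ordering this workflow depends on (wait for `COMPLETE`, then move contacts, then hand over the new login) could be automated roughly as follows. This is a hedged Python sketch, not an existing CHT or cht-user-management API: `run_maintenance`, `get_status`, `move_contacts`, and `share_credentials` are hypothetical names, and the callables are injected so no particular HTTP client or CLI invocation is assumed.

```python
import time
from typing import Callable, Optional


def run_maintenance(
    get_status: Callable[[], Optional[str]],
    move_contacts: Callable[[], None],
    share_credentials: Callable[[], None],
    poll_seconds: float = 60.0,
    max_polls: int = 1440,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Wait for COMPLETE, then move contacts, then release the new login.

    Returns False without moving anything if COMPLETE is never observed,
    e.g. because the CHW's device has not come back online yet.
    """
    for attempt in range(max_polls):
        if get_status() == "COMPLETE":
            move_contacts()      # user's data is synced and the user is logged out
            share_credentials()  # only after the moves have finished
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)  # PENDING/READY: the device has not finished yet
    return False
```

Keeping `share_credentials` as the last step reflects the first caveat: the CHW should not be able to log in to the new user before the moves are done.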
@mrjones-plip please add any additional comments/questions that I have missed!
@jkuester - thanks so much for the deep dive on if my harebrained idea might work! I have nothing more to add.
@kennsippell - let me know if you'd like some help prototyping any of this!
Thanks guys. I'll check out this very interesting proposal.
Is your feature request related to a problem? Please describe.

Loss of Health Data - Today, there is a risk of data loss any time a user manager: 1) moves a user's area to a different spot in the hierarchy, 2) disables a user account, or 3) replaces a user with another. You can see data loss happening for live projects in issues like https://github.com/medic/config-pih/issues/719 where hundreds of denied replications are happening for this month-long period.
Burdensome Human Coordination - In our documentation for move-contacts, we require that "users must be encouraged to clear cache and resync!" to avoid this sort of data loss. Users need to do this before the move-contacts command is executed, and coordinating these sorts of activities with users/devs is very time consuming.
Also - when you run multiple move-contacts commands, you can take down a server like in https://github.com/medic/config-muso/issues/932 where the server was down for 12 days. This makes coordination even more difficult. For Uganda eCHIS where the entire nation is on one instance, how do you ensure that everybody who is moving contacts is talking to everybody else?
These programmatic steps required to do user management safely are becoming increasingly difficult with scale. Without the availability of better tooling, project teams do not have time to coordinate these activities and have no option but to accept the risk of data loss.
Describe the solution you'd like

We are creating automation to improve user management scenarios with cht-user-management. A noteworthy example on the roadmap is a UI and cloud-based execution of move-contacts commands, which aims to execute them safely. https://github.com/medic/cht-user-management/issues/12
This issue tracks a request to create some sort of "maintenance mode" for user accounts which will allow automation to perform operations on them without data loss.
Something like: