medic / cht-user-management

GNU Affero General Public License v3.0

Cloud-Based execution of move-contacts commands #12

Closed: kennsippell closed this issue 1 month ago

kennsippell commented 7 months ago

Currently this tool outputs a cht command line that can be executed to move contacts. Let's take the next step and execute that command in the cloud.

kennsippell commented 6 months ago

Discussion here about also moving households, not just CHP areas: https://github.com/moh-kenya/config-echis-2.0/issues/1662. It also covers using forms in the CHT to trigger the move (rather than a UI like cht-user-management).

kennsippell commented 4 months ago

Notes from design conversation today with @paulpascal.

What states will jobs go through:

  1. Pending
  2. (Not MVP) Flag user for maintenance
  3. (Not MVP) Wait for the user to sync
  4. (Not MVP) Check sentinel backlog and couch2pg backlog to ensure it is safe
  5. Execute the move-contacts + upload-docs
  6. (Not MVP) Remove user from maintenance mode
  7. Success
  8. Fail
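The job lifecycle above can be sketched as a small state machine with an explicit transition table. This is a minimal illustration, not code from cht-user-management: the state names and the `transition` helper are hypothetical. Only the MVP path (Pending → Execute → Success/Fail) is wired up; the non-MVP states are declared so they can be slotted in later.

```typescript
// Hypothetical job states mirroring the list above.
type JobState =
  | "pending"
  | "flagged_for_maintenance" // not MVP
  | "waiting_for_sync"        // not MVP
  | "checking_backlogs"       // not MVP (sentinel + couch2pg backlog)
  | "executing"               // move-contacts + upload-docs
  | "removing_maintenance"    // not MVP
  | "success"
  | "fail";

// Allowed transitions for the MVP. The non-MVP states are declared but
// unreachable; success and fail are terminal.
const MVP_TRANSITIONS: Record<JobState, JobState[]> = {
  pending: ["executing", "fail"],
  flagged_for_maintenance: [],
  waiting_for_sync: [],
  checking_backlogs: [],
  executing: ["success", "fail"],
  removing_maintenance: [],
  success: [],
  fail: [],
};

// Returns the new state, or throws if the move is not allowed.
function transition(from: JobState, to: JobState): JobState {
  if (!MVP_TRANSITIONS[from].includes(to)) {
    throw new Error(`Illegal transition ${from} -> ${to}`);
  }
  return to;
}
```

Keeping the table explicit makes adding the non-MVP steps a data change (e.g. `pending: ["flagged_for_maintenance", ...]`) rather than new branching logic.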

What is needed from a Cloud-Based Queue:

  1. Hostable in Docker (e.g. RabbitMQ)? Or can it be a managed cloud service like SQS?
  2. Ideally it has a UI to view status, cancel jobs, view logs, etc., so we don't have to build that
  3. We need to wait for users to sync... so we need to be able to pop a job from the queue many times but decide not to process it for some days
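Requirement 3 (pop a job many times but defer it for days) and design question 1 (what if a user never syncs?) can be sketched with an in-memory queue where each job carries a "not before" timestamp. This is only an illustration of the behaviour we'd want from whatever queue we pick: `DeferringQueue`, `MoveJob`, `isReady`, `retryDelayMs` and `maxWaitMs` are all hypothetical names, and a real broker would provide the delay natively (e.g. a delayed or re-enqueued message).

```typescript
// Hypothetical job shape; field names are illustrative assumptions.
interface MoveJob {
  id: string;
  contactIds: string[];
  parentId: string;
  createdAt: number; // ms since epoch
  notBefore: number; // earliest time the job may be processed
}

type PollResult =
  | { kind: "idle" }
  | { kind: "deferred"; job: MoveJob }
  | { kind: "ready"; job: MoveJob }
  | { kind: "expired"; job: MoveJob };

class DeferringQueue {
  private jobs: MoveJob[] = [];

  constructor(
    private readonly retryDelayMs: number,
    private readonly maxWaitMs: number,
    // e.g. "has the user synced since the job was created?" (assumed check)
    private readonly isReady: (job: MoveJob) => boolean,
  ) {}

  push(job: MoveJob): void {
    this.jobs.push(job);
  }

  // Pops the first job whose notBefore has passed. If it is not ready and
  // has not exceeded maxWaitMs, it is re-queued with a later notBefore.
  poll(now: number): PollResult {
    const idx = this.jobs.findIndex((j) => j.notBefore <= now);
    if (idx === -1) return { kind: "idle" };
    const [job] = this.jobs.splice(idx, 1);

    if (this.isReady(job)) return { kind: "ready", job };
    if (now - job.createdAt >= this.maxWaitMs) return { kind: "expired", job };

    const deferred = { ...job, notBefore: now + this.retryDelayMs };
    this.jobs.push(deferred);
    return { kind: "deferred", job: deferred };
  }

  get size(): number {
    return this.jobs.length;
  }
}
```

The `expired` result deliberately leaves the "never syncs" policy to the caller: it could fail the job or move the contacts anyway, which is exactly the open question.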

Design questions:

  1. What should happen if a user never syncs? Should we eventually move them? What is a reasonable limit?
  2. What queuing technology should we use for this?
  3. How can a user cancel a job? Learn about the state of the job? Investigate a failed job?
  4. What parts of this should be built into CHT-Core's sentinel?

Next Steps:

  1. Evaluation of cloud-based queues against requirements above.
    1. Pick one.
    2. It’d be great to just write this down and kinda get some opinions. Maybe post it on #development or #cht-user-management
  2. Decide how jobs will get pushed into the queue.
    • Options:
      • User management tool pushes into the queue
      • UI in the CHT
    • The biggest factor here is WHO should move contacts and reorganize hierarchies. Should they be online users? Who is it in #moh-togo/#moh-uganda/#moh_kenya?
  3. Make worker thread(s) that pull from the queue and kick off the cht-conf move-contacts code (via a child process, via import, whatever works)
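Step 3 could look roughly like the worker below, which shells out to cht-conf. This is a sketch under assumptions: the argument list approximates the command line the tool prints today (`move-contacts upload-docs` with `--contacts` and `--parent`) and should be checked against the installed cht-conf version; `MoveContactsJob`, `next`, `run` and `onResult` are hypothetical. The process runner is injected so production code can wrap Node's `child_process.spawn` while tests use a stub.

```typescript
// Hypothetical job payload; field names are illustrative.
interface MoveContactsJob {
  instanceUrl: string; // CHT instance URL (credentials handled elsewhere)
  contactIds: string[];
  parentId: string;
}

// Assumed cht-conf invocation; verify flags against your cht-conf version.
function buildChtConfArgs(job: MoveContactsJob): string[] {
  return [
    `--url=${job.instanceUrl}`,
    "move-contacts",
    "upload-docs",
    "--",
    `--contacts=${job.contactIds.join(",")}`,
    `--parent=${job.parentId}`,
  ];
}

// Runs a command and resolves to its exit code.
type Runner = (command: string, args: string[]) => Promise<number>;

// Pulls jobs until the queue is empty; a non-zero exit code or a thrown
// error marks the job as failed.
async function runWorker(
  next: () => Promise<MoveContactsJob | undefined>,
  run: Runner,
  onResult: (job: MoveContactsJob, ok: boolean) => void,
): Promise<void> {
  while (true) {
    const job = await next();
    if (job === undefined) return;
    let ok = false;
    try {
      ok = (await run("cht", buildChtConfArgs(job))) === 0;
    } catch {
      ok = false;
    }
    onResult(job, ok);
  }
}
```

In production, `run` could wrap `child_process.spawn` and resolve with the exit code from the `close` event; importing cht-conf directly would avoid the extra process but couples the worker to its internals.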

-- End of MVP --

  1. Expose the queue’s UI somehow?
  2. Or build a UI to show status, cancel, etc.
  3. Moving contacts safely - Don’t take the server down
  4. Moving contacts safely - Don’t lose personal data

mrjones-plip commented 3 months ago

Any thoughts on doing this through a long-lived task via CHT Core API and Sentinel? While it would mean needing to upgrade Core wherever you want to use the feature, there's already a system in place for long-running tasks and queues, e.g. bulk upload. The user management tool should also be able to query the status of the job.

kennsippell commented 3 months ago

@mrjones-plip Who can we talk with to learn about how this is implemented in sentinel and how these long running tasks/queues work today? I'm only familiar with our homebrew couch-based queuing system used for outbound push. Is this the same system?

mrjones-plip commented 3 months ago

@latin-panda, @m5r and @njuguna-n did the original work on the CSV bulk upload, according to the PR I found!

cc @jkuester

kennsippell commented 3 months ago

Thanks. From a quick scan, this PR doesn't appear to touch sentinel at all. Perhaps bulk uploads aren't processed by sentinel?

mrjones-plip commented 3 months ago

Oh! Well, that would be good to know if it wasn't in Sentinel - sorry if I've led you astray.

I dug up some test steps (private slack link) that I originally used for performance testing of bulk upload. Until some of the engineers chime in on this ticket, maybe this will better explain how it works? From what I can tell, there's a parseCsv() function which writes to the medic_log database. The medic_log entry is queried via AJAX to update the job progress on screen as the rows of the CSV are processed. When it's done, a CSV of errors is available for download.

latin-panda commented 3 months ago

Yes, as @mrjones-plip explained, the bulk upload tool creates a bunch of promises and waits until all are resolved. This runs in the API server process (not a scheduler, and not sentinel) and writes the upload status to medic_log (how many users are pending, failed, or successful). At the moment, I don't have more experience with Sentinel than what's documented. Perhaps, if this feature is expected to be heavy, it would need some sort of transition as an entry point; Sentinel would then listen for db changes and queue those transitions to apply the changes.
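For comparison, the in-process pattern described above (start all promises, wait for them to settle, keep a pending/failed/successful tally) can be sketched as follows. This is an illustration, not CHT Core's actual code: `processAll`, `handle` and `report` are hypothetical, and `report` stands in for writing the status to a log document such as the medic_log entry. All state lives in one server process until every promise settles.

```typescript
interface UploadStatus {
  pending: number;
  successful: number;
  failed: number;
}

// Runs every row concurrently and reports a status snapshot at the start and
// after each row settles. Errors are counted, never propagated.
async function processAll<T>(
  rows: T[],
  handle: (row: T) => Promise<void>,
  report: (status: UploadStatus) => void,
): Promise<UploadStatus> {
  const status: UploadStatus = { pending: rows.length, successful: 0, failed: 0 };
  report({ ...status });
  await Promise.all(
    rows.map(async (row) => {
      try {
        await handle(row);
        status.successful++;
      } catch {
        status.failed++;
      }
      status.pending--;
      report({ ...status });
    }),
  );
  return status;
}
```

This works well when the whole batch finishes within a single request's lifetime, but nothing survives a restart, so it doesn't naturally extend to jobs that wait days for a user to sync.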

kennsippell commented 3 months ago

OK good to know! Thanks to both of you for the background info.

If we consider the long-term plan to move contacts without data loss via something like https://github.com/medic/cht-core/issues/8860, then this will become a very asynchronous operation (spanning multiple days). Therefore, I don't think the pattern used for bulk upload is right for this problem.

I think it's quite a bit more complex to build in sentinel. It would probably look a bit like how outbound push is written now, but I personally think we shouldn't be investing in bespoke queuing technologies that clutter our already strained CouchDB. If a Core Dev has time to take this on, I do think it would be a more reusable approach across projects... but I also think this is too much to ask of @paulpascal, who has time to make progress on this now and is doing great with a public, reusable queue.

Thanks for the suggestion and please do let me know if you think we're missing anything, if our plan to move contacts via multi-day async queuing isn't correct, or if anything else here isn't in the best interest of users.

mrjones-plip commented 3 months ago

Thanks @kennsippell - that sounds like a fair assessment! I appreciate the consideration on what it would look like inside CHT Core vs outside.