medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

Add RapidPro as an SMS Gateway #6532

Closed MaxDiz closed 3 years ago

MaxDiz commented 4 years ago

The I-Tech project is moving forward with scale-up in Zimbabwe. The care protocol relies heavily on two-way texting between patients and facility-level providers. Using medic-gateway running on Android to send and receive SMS via the medic-api APIs has several issues, including being dropped from the Play Store, the app going to sleep, low throughput, and high latency. In addition, the CHT's current SMS aggregator integration with Africa's Talking does not cover Zimbabwe.

For this project, we need to choose an SMS aggregator service provider. Infobip looks like the best option and has coverage in other countries that may be relevant for future projects. As an added benefit, it looks like RapidPro already supports Infobip (needs to be verified) if we want to consider using their service in the future. Infobip's demo applications include resources for integrating with their APIs.

cc: @benkags @derickl

MaxDiz commented 4 years ago

@derickl can you confirm that Infobip is the SMS aggregator service provider that we want to use for Zimbabwe?

derickl commented 4 years ago

Yes, Infobip is the aggregator of choice. cc @benkags

garethbowen commented 4 years ago

@derickl @benkags I have a few followup questions about the implementation of this...

  1. Our SMS integration is currently limited to a single outgoing SMS aggregator. Can you confirm that, for this project, Infobip will be able to send SMS to all users irrespective of carrier?
  2. Similarly, the SMS aggregator can currently only forward SMS to one CHT instance. If you need multiple instances (one per branch, a separate instance for training, etc.) you will need a separate phone number and Infobip config for each. Will that work for you?
  3. Will you be transitioning to a short code? If so, will you keep the gateway running for people still using the old phone number, or shut it down?
  4. Will you use RapidPro or similar? It may be better to have CHT -> RapidPro -> Infobip, rather than CHT -> Infobip...

benkags commented 4 years ago

@garethbowen I do not have an answer to the first question, but I have reached out to Infobip and will post here as soon as I have a reply.

  2. For the two-way texting (2WT) system in Zimbabwe, one instance would suffice.
  3. Yes. Clients respond when prompted by the system, so it is easy to make that transition even with an ongoing deployment.
  4. RapidPro has not been considered for the Zimbabwe context, but you are right that it may be worth bringing up in the scale-up discussions if and when they happen. @SMurithi

benkags commented 4 years ago

Infobip got back to me and I had a call with a representative (Jeff). A follow-up call with their technical team was suggested to get more information on the integration, including whether a single integration supports multiple carriers out of the box, and to cover any other technical concerns we may have.

Here is some related useful information from that conversation:

@garethbowen I ~was going to respond~ have responded to Jeff with question 1 above, requested details on how to go about the integration, and copied you. FYI, Jeff indicated that we may need to purchase a virtual long number so that the technical teams on their end can facilitate an integration, and I indicated that we would be looking for a generic integration. This may be something to clear up with a technical person. Are there any specific queries other than 1 above that you would like to put forward to their technical team, so that we can hopefully get a call or a more constructive direction?

garethbowen commented 4 years ago

Thanks @benkags !

The generic implementation would have a configurable API key issued by Infobip after the phone number (short or long) is set up. This is very similar to the AT integration so I expect it will work well.

> RapidPro has not been considered for the Zimbabwe context but you are right it may be worth bringing it up in those scale up discussions if and when they happen

If RapidPro can work with Infobip then we can integrate with no additional Product development, so in many ways this would be preferable from our point of view. It also gives much more flexibility for messaging so it may be more future-proof. I think this would be worth investigating before developing a custom integration directly from the CHT.

> Are there any specific queries other than 1 above that you would like to put forward to their technical team and hopefully get us a call or a more constructive direction?

No, that's all I can think of right now.

benkags commented 4 years ago

https://medic.slack.com/archives/CG28Q2Y9L/p1600180458026700

garethbowen commented 4 years ago

@benkags Can you confirm if Medic is going to be hosting this project?

benkags commented 4 years ago

It is not clear at this point. @SMurithi correct me if I am wrong but as I understand it, the partner is yet to give us actionable information.

SMurithi commented 4 years ago

Correct, @benkags. MM will continue to host on behalf of the partner. I will advise if anything changes down the road.

MaxDiz commented 3 years ago

From the project and eng team, it sounds like we can use the RapidPro integration with Infobip, but it requires some code modifications to deploy with the CHT. @garethbowen has details. Please consult with him before picking up.

MaxDiz commented 3 years ago

The planned SMS workflows for scale-up have the following cadence:

@garethbowen do you need any additional detail to inform the implementation pathway?

MaxDiz commented 3 years ago

Pulling in from slack conversation...

For implementation our choices are to:

  1. ~Write a bespoke SMS aggregator integration to Infobip (just like the Africa's Talking one)~
  2. Write a bespoke SMS aggregator integration to RapidPro
  3. Use RapidPro and Outbound Push and integrate RapidPro to Infobip (probably no Core dev required)

Option 2 seems like the best option since it would allow the CHT to treat RapidPro like a simple relay service so we can immediately support every SMS aggregator that RapidPro supports.

Option 3 is doable as a prototype (and hack), but it requires logic in RapidPro that we may be better off productizing in the CHT to avoid difficulties in production and deployment.

abbyad commented 3 years ago

After discussing with @garethbowen it seemed as though option 3 would be preferred since we can quickly spin it up, and using flows gives more flexibility for handling multiple gateways.

I am putting notes here after exploring that further, but the summary version is that webapp terminating messages can easily and reliably be handled with existing features, whereas webapp originating messages have some limitations that can delay when messages get sent, and also cause duplicates to be sent.

Webapp Terminating

Messages that are sent to the CHT can be handled by RapidPro with a simple flow that starts with a trigger for "messages not handled anywhere else".

The flow only needs to contain a webhook and error handling.

The headers and body must be set:

@(json(object(
  "messages", array(
    object(
      "id", run.uuid,
      "from", replace(urns.tel,"tel:+", "+"),
      "content", results.message.value,
      "sms_sent", epoch(run.created_on),
      "sms_received", epoch(now())
    )
  ),
  "updates", array()
)))
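As a rough illustration of the JSON this webhook body is meant to produce, here is a minimal Python sketch. It assumes that `messages` and `updates` are sibling keys of the top-level object and that the timestamps are whole epoch seconds; the thread does not state the expected units. The function name and all sample values are hypothetical.

```python
import json
from datetime import datetime, timezone


def build_incoming_payload(run_uuid, tel_urn, content, run_created_on, now):
    """Mirror the webhook body: one incoming message and no status updates."""
    return {
        "messages": [
            {
                "id": run_uuid,
                # Same effect as replace(urns.tel, "tel:+", "+") in the flow
                "from": tel_urn.replace("tel:+", "+"),
                "content": content,
                # Assumption: whole epoch seconds; units are not given in the thread
                "sms_sent": int(run_created_on.timestamp()),
                "sms_received": int(now.timestamp()),
            }
        ],
        "updates": [],
    }


# Hypothetical sample values, for illustration only
payload = build_incoming_payload(
    run_uuid="hypothetical-run-uuid",
    tel_urn="tel:+15550100",
    content="Hello",
    run_created_on=datetime(2021, 4, 7, 19, 1, 47, tzinfo=timezone.utc),
    now=datetime(2021, 4, 7, 19, 1, 50, tzinfo=timezone.utc),
)
print(json.dumps(payload, indent=2))
```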

Error handling could include retries, logging, and messages to the sender to let them know that their message was not processed.

Webapp Originating

Reusing the choices from the comment above, here are the ways that messages could be sent from the CHT to people:

  1. ~Write a bespoke SMS aggregator integration to Infobip~
  2. CHT-RapidPro messaging integration: the CHT would call the broadcast API to send messages, and poll it to check the status. Pros: no need to build or manage a RapidPro flow. Cons: a more involved feature in the CHT, yet less configurable, and it doesn't handle multiple RapidPro channels.
  3. CHT outbound push to RapidPro: a flow in RapidPro can be triggered when the state of a CHT message goes to pending. Pros: handles multiple gateways and processing in RapidPro flows. Cons: difficult to reliably trigger the flow, and duplicates are possible without more work.

The outbound push was prototyped to trigger a flow when a message's state became pending. The flow would then call the SMS endpoint to get a list of all messages that need to be sent, process them in small batches to send them, and report the status back to the CHT. This made it easier to catch and retry any messages that previously failed to send, but it makes duplicates theoretically possible if the flow is triggered multiple times before it completes and the runs get the same set of messages. This may have changed with recent improvements to the SMS API.
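To make the duplicate risk concrete, here is a minimal Python sketch of one way overlapping runs could avoid re-sending the same messages: each run claims message ids in shared state before batching them. This is not how the prototype worked. The `claim_batches` helper, the shared `already_claimed` set, and the message shape are all hypothetical.

```python
def claim_batches(pending, already_claimed, batch_size=10):
    """Yield batches of messages that no earlier run has claimed.

    `already_claimed` is a set of message ids shared between runs. It is a
    hypothetical stand-in for whatever shared state a real flow would use.
    """
    fresh = []
    for message in pending:
        if message["id"] in already_claimed:
            continue  # another run is already sending this message
        already_claimed.add(message["id"])
        fresh.append(message)
    # Split the claimed messages into small batches for sending
    for start in range(0, len(fresh), batch_size):
        yield fresh[start:start + batch_size]


# Two runs triggered against the same list of pending messages
pending = [{"id": f"msg-{n}", "to": "+15550100"} for n in range(5)]
claimed = set()
first_run = list(claim_batches(pending, claimed, batch_size=2))
second_run = list(claim_batches(pending, claimed, batch_size=2))
print([len(batch) for batch in first_run], second_run)
```

Without the shared set, both runs would receive all five messages, which is the duplicate scenario described above.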

Here is what the outbound push prototype config looked like:

  "outbound": {
    "textit-gateway": {
      "relevant_to": "doc.type === 'data_record' && doc.tasks && doc.tasks[0] && doc.tasks[0].state && doc.tasks[0].state === 'pending'",
      "destination": {
        "base_url": "https://textit.in",
        "auth": {
          "type": "header",
          "name": "Authorization",
          "value_key": "textit.in"
        },
        "path": "/api/v2/flow_starts.json"
      },
      "mapping": {
        "flow": {
          "expr": "'abcdef1234567890'"
        },
        "urns": {
          "expr": "[ 'tel:' + doc.tasks[0].messages[0].to ]",
          "optional": false
        }
      }
    }
  }

Note that the above outbound push config has limitations in that it would only trigger for the first message in tasks. Also, SMS schedules, which are in scheduled_tasks, are not being considered.
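The broader check that this limitation calls for can be sketched in Python as follows. This only illustrates the logic: it is not an expression that can be pasted into `relevant_to`, and the assumption that entries in `scheduled_tasks` carry a `state` field like those in `tasks` is not confirmed by the thread.

```python
def has_pending_message(doc):
    """Return True if any task or scheduled task in the doc is pending."""
    if doc.get("type") != "data_record":
        return False
    # Look at every entry in both lists, not only tasks[0]
    tasks = (doc.get("tasks") or []) + (doc.get("scheduled_tasks") or [])
    return any(task.get("state") == "pending" for task in tasks)


# Hypothetical documents, for illustration only
only_second_pending = {
    "type": "data_record",
    "tasks": [{"state": "sent"}, {"state": "pending"}],
}
scheduled_pending = {
    "type": "data_record",
    "scheduled_tasks": [{"state": "pending"}],
}
nothing_pending = {"type": "data_record", "tasks": [{"state": "sent"}]}

print(has_pending_message(only_second_pending))
print(has_pending_message(scheduled_pending))
print(has_pending_message(nothing_pending))
```

The first two documents would be missed by a check that only inspects `doc.tasks[0]`.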

This option of triggering a flow is still advantageous since it permits more flexible channel setups, but we need the following two improvements:

garethbowen commented 3 years ago

After discussion with @abbyad we've settled on option 2: writing an SMS aggregator in API to send and receive messages via RapidPro. This work should be started soon but should not block 3.11.0.

kennsippell commented 3 years ago

@binokaryg Can you clarify whether this issue is still a priority for I-Tech Zimbabwe? If I-Tech Zimbabwe is migrating to use outbound push RapidPro integration, then this feature does not add value for I-Tech Zimbabwe. Can you confirm that I-Tech Zimbabwe is indeed migrating to outbound push RapidPro integration?

@kitsao Can you clarify that MSF Goma does not have plans to migrate from outbound push integration with RapidPro to use CHT-SMS capabilities?

binokaryg commented 3 years ago

We have used outbound push to RapidPro in ITECH Aurum and are planning to reuse the same method in the ITECH Zimbabwe scale-up. Proper integration with RapidPro, with delivery status, would still be helpful in the future.

abbyad commented 3 years ago

@binokaryg, where is the logic for the messaging flows? Which scenario is it:

  1. Using SMS workflows in the CHT, with outbound push to get the actual SMS to the network
  2. Using CHT to trigger a RapidPro flow that contains additional messaging/content logic
  3. Something else

Also, it would be helpful to see the relevant configs -- could you post a link or snippet of the outbound push?

abbyad commented 3 years ago

After digging into the config for the project mentioned, it appears that scenario 1 is used at times, where RapidPro is being used for outgoing messages from the CHT:

"relevant_to": "doc.type === 'data_record' && doc.tasks && doc.tasks[0] && doc.tasks[0].state && doc.tasks[0].state === 'pending'",

These outgoing messages could be sent late, not at all, or multiple times. The status of these messages will also be unknown to the person who sent them (e.g. the status is not known in the CHT app UI). Given that, I think we should still prioritize this issue highly and work with app developers of existing and upcoming deployments to make sure it meets their needs, so that they are able to use it when it is released.

dianabarsan commented 3 years ago

This is ready for AT on 6532-rapid-pro-sms-gateway. Documentation PR here: https://github.com/medic/cht-docs/pull/462

newtewt commented 3 years ago

I have tested this using their simulator. The example defined in the docs PR acts basically as a forwarding mechanism from SMS to CHT-Core and back. I was able to send messages and forms with the standard config, and I received the responses on my test phone.

I set up a flow and was able to generate a filled out form as well and get the responses.

Requests without a token, with an invalid token, and with results that aren't configured are all handled, but I think a small change could make this better.

When we save nothing because there is an issue, our response code is 200. The flows in RapidPro will treat that as successful, so the failure state of a flow will not be triggered. I think the response code needs to be 400 in this case.

The logs show the missing value, and the request still returns a 200:

Apr 07 15:01:47 dev-gamma-b dev-gamma-b-medic-api-logs: (dev-gamma-b-58dd8f66f7-fmlbr) | [2021-04-07 19:01:47] 2021-04-07 19:01:47 WARN: Message missing required field "id": {} 
Apr 07 15:01:47 dev-gamma-b dev-gamma-b-medic-api-logs: (dev-gamma-b-58dd8f66f7-fmlbr) | [2021-04-07 19:01:47] RES 527276ef-1d8a-4e38-b9f5-a3d539f03a8a 34.236.102.117 - POST /api/v1/sms/radpidpro/incoming-messages HTTP/1.1 200 11 4.611 ms

dianabarsan commented 3 years ago

Thanks for the feedback. What error message would you prefer we send back on error?

newtewt commented 3 years ago

That's a good question. I don't know what would be useful in our use cases. With a failing flow, we could prompt back with a message saying "X is invalid, fix X". So at least an indication of why we didn't save anything, so that whoever configures the flow can tell the sender that they need to provide the correct value.

dianabarsan commented 3 years ago

I've changed the code to check whether no messages were created ({ saved: 0 }) and return a 400 with a message when this happens. Given that more endpoints (africas-talking and gateway /sms) use the exact same function to create messages, I'm reluctant to change that function so late in the dev cycle to make it return actual validation errors.
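The behaviour described here could be sketched roughly as below, in Python for illustration only (the actual change lives in the API's own code). The function name, the error wording, and the shape of the success body are assumptions; only the `saved` count and the 200/400 status codes come from this thread.

```python
def incoming_messages_response(saved):
    """Pick a status code and body from the number of messages saved."""
    if saved == 0:
        # Nothing was created, so signal failure and let the flow's
        # failure branch run. The error wording here is hypothetical.
        return 400, {"error": "No messages were saved"}
    return 200, {"saved": saved}


print(incoming_messages_response(0))
print(incoming_messages_response(1))
```

Because the check looks only at the count, the shared message-creation function used by the other endpoints stays untouched.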

newtewt commented 3 years ago

I think the 400 is good enough at this point. If it turns out not to be, an additional feature request can be logged detailing what a failure state needs.

I think this is ready to merge. Sending and receiving is working well. Failure states are hitting the flows correctly.

dianabarsan commented 3 years ago

Thanks @newtewt .

Another part of the RapidPro workflow is outgoing messages getting their correct states from RapidPro (this involves querying all messages in the medic database that are in a non-final state), and also backing off from querying when RapidPro starts returning 429s (rate-limited).

Is that working as expected as well?

newtewt commented 3 years ago

I'm getting the responses from cht-core through RapidPro on my phone. I saw the message go through the different states: received by gateway, then delivered.

EX: Thank you Contact for registering Patient. Their ID is 12345. If they are pregnant, please enroll in ANC with the P form.

How can I tell that I'm getting 429 vs just an issue with something? Logs?

dianabarsan commented 3 years ago

Yes, in the logs you should see a failed request to rapidpro. The error code should be 429 (and a message like "Request was throttled. Expected available in "something" seconds.") and you should see the error being logged once, and then 1 minute later the next "iteration" of polling should start.
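A minimal Python sketch of this back-off pattern follows, under stated assumptions: `poll_statuses` and `fetch_status` are hypothetical names, the fake fetcher stands in for real requests to RapidPro, and the state values are placeholders. A scheduler (not shown) would call `poll_statuses` again on the next iteration with the remaining ids.

```python
def poll_statuses(message_ids, fetch_status):
    """Run one polling iteration and return (updates, remaining_ids)."""
    updates = {}
    for index, message_id in enumerate(message_ids):
        status_code, state = fetch_status(message_id)
        if status_code == 429:
            # Rate-limited: stop querying and leave the rest, including
            # this message, for the next polling iteration
            return updates, message_ids[index:]
        updates[message_id] = state
    return updates, []


def make_fake_fetch(limit):
    """Hypothetical fetcher that starts returning 429 after `limit` calls."""
    calls = {"count": 0}

    def fetch(message_id):
        calls["count"] += 1
        if calls["count"] > limit:
            return 429, None
        return 200, "delivered"

    return fetch


updates, remaining = poll_statuses(["a", "b", "c", "d"], make_fake_fetch(limit=2))
print(updates, remaining)
```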

newtewt commented 3 years ago

I think the rate limiting is working as well. I hit the limit and registered a new person, and eventually, once the limit was lifted, I got a response about my patient being registered.

dianabarsan commented 3 years ago

Thanks for the update @newtewt !

dianabarsan commented 3 years ago

Merged to master.