Closed abbyad closed 4 years ago
Note on eng feasibility:
sentinel/schedules
, whereas the logic for the transition that would need to call that logic is in a shared lib. So some refactoring would need to be done to make that cleanI just wanted to confirm that we have at least 4 funded implementations that would benefit from or require this:
With this set of projects in place, we have time and resources to get this right, yet at the same time the covid response is time sensitive. If there were a quicker way to bring the turnaround down to a minute or two, then we could buy ourselves more time (months at least) for a more instantaneous chat experience.
^^ @ranjuts - I don't think this was included in the last prioritization round. May want to bump up the priority list
We have been using the scheduler set to 1 minute in a test instance for the past couple of weeks, and it works well for outbound push. @SCdF what are the secondary effects of doing this? Since this relates to schedules outside of outbound push, what are the potential downsides or tradeoffs to consider?
@abbyad this means that other schedules will run more frequently. This in theory shouldn't matter much, except that there may be more load on the server.
Thanks, @SCdF, what are the other schedules and how might we can we get a sense of the additional load? Would you recommend just making outbound push as immediate as possible, and leaving the other schedules as they are?
I think if we get outbound sending immediately I think setting the scheduler back to 5 minutes is reasonable? The only reason it will fail to send immediately is if:
I agree. The ultimate solution is to run this as a schedule and a transition. This way the first attempt is made immediately and if that fails it'll be retried every 5 minutes. I don't think this is particularly challenging, it just means repackaging the code so it can be called by both.
Cool, that makes a lot of sense. Will it be configurable whether or not to use the transition at all? We're generally going to want that for use cases that involve back and forth messaging, but I'd imagine we'd only want the scheduler for use cases that involve sharing of records, referrals etc.
I'd imagine we'd only want the scheduler for use cases that involve sharing of records, referrals etc.
What is the benefit of waiting up to 5 minutes to share records and referrals rather than attempting to share immediately?
I had in mind @SCdF's comment from earlier in the thread:
this means that other schedules will run more frequently. This in theory shouldn't matter much, except that there may be more load on the server.
If we're not worried about performance or cost implications of more load on the server, then I wouldn't have any reason to care.
I think Stefan's comment was regarding Marc's workaround about setting the schedule period to 1 minute. Doing this causes every scheduled task to run every minute to check for any potential tasks in the database which can have performance impacts.
Using a transition means we don't have to configure the schedule to run as frequently as we will attempt to send the report immediately. This is much better for performance because invoking outbound push on a single report is much cheaper than invoking all schedules for all possible tasks.
I have a good sense of what runs as transitions, but curious what we have running as schedules. Is there a list, like we have for transitions?
This is the source list: https://github.com/medic/cht-core/blob/master/sentinel/src/schedule/index.js#L8
Ready for AT in 6306-immediate-outbound
- do at the same time as #6223
AT instructions:
@brad1905 - Additional note on AT (for textit/rapidpro integration):
2 . Check if "mark_for_outbound": true
is in transitions , otherwise add that
{
"transitions": {
"mark_for_outbound": true
}
}
Add outbound
configuration property
eg :
"outbound": {
"textit-patient": {
"relevant_to": "doc.type === 'person' && doc.phone",
"destination": {
"base_url": "https://textit.in",
"auth": {
"type": "header",
"name": "Authorization",
"value_key": "textit.in"
},
"path": "/api/v2/flow_starts.json"
},
"mapping": {
"flow": {
"expr": "'[texit-flow-number-for-the-actual-flow]'"
},
"urns": {
"expr": "[ 'tel:' + doc.phone ]",
"optional": true
},
"extra.patient._id": "doc._id",
"extra.patient._rev": "doc._rev",
"extra.patient.name": "doc.name",
"extra.patient.date_of_birth": "doc.date_of_birth",
"extra.patient.medic_id": "doc.patient_id"
}
}
Add medic-credentials
to CouchDb -
(value
= api token to be provided...)
Create a person with a phone number (this is the trigger as "relevant_to": "doc.type === 'person' && doc.phone",
)
Two ways to verify :
a) whatever the flow is meant to do in textit
b) check in docker logs :
docker exec -it medic-os /bin/bash
less /srv/storage/medic-sentinel/logs/medic-sentinel.log
(inside container)
If successful logs will have:
[2020-05-05 02:08:46] 2020-05-05 02:08:46 ESC[32mINFOESC[39m: Pushed 9b46c75d-d51e-4e7f-9530-af2718336d07 to textit-patient
Otherwise, there might be logs with some info on what went wrong such as:
[2020-05-05 01:40:00] 2020-05-05 01:40:00 ESC[31mERRORESC[39m: Failed to push 69
763819-beff-516a-bf7a-7a178f298fb4 to textit-patient: { Error: CouchDB config ke
y 'medic-credentials/textit.in' has not been populated. See the Outbound documen
tation.
@SCdF The fix for this issue has eliminated the delay but has caused another problem. Pushes are scheduled immediately instead of waiting 5 minutes, but 2 messages are sent instead of 1. See 2 pushes for one identical id 1 second apart in attached log file screenshots.
So my guess is moving this code to happen in the transition has exposed a flaw / issue / oddity with Sentinel, which is that depending on what transitions you've configured it may end up executing over the same transition multiple times. For example, if you're pushing a patient you'll get one for the first write and then another after the patient_id
is written.
So with the following config:
"transitions": {
"mark_for_outbound": true,
"generate_patient_id_on_people": true
},
"outbound": {
"stefan": {
"relevant_to": "doc.type === 'person'",
"destination": {
"base_url": "http://localhost:3333",
"path": "/api/v1/patient/records"
},
"mapping": {
"id": "doc._id",
"rev": "doc._rev",
"name": "doc.name",
"shortcode": "doc.patient_id"
}
}
},
Will generate the following output to a mock server:
➜ _scratch node sih-mock.js
SIH mock up on 3333
2020-05-05T07:20:04.823Z records {
host: 'localhost:3333',
accept: 'application/json',
'content-type': 'application/json',
'content-length': '139',
connection: 'close'
} {
id: 'ec783bad-2b26-46c4-bea2-eff87bc523ba',
rev: '1-a251490cc743d23aa0e14f04b9cc8e39',
name: 'Multiple firings test',
shortcode: '22366'
}
2020-05-05T07:20:04.929Z records {
host: 'localhost:3333',
accept: 'application/json',
'content-type': 'application/json',
'content-length': '139',
connection: 'close'
} {
id: 'ec783bad-2b26-46c4-bea2-eff87bc523ba',
rev: '2-8c76be92939bf9b34cd230b103e61db8',
name: 'Multiple firings test',
shortcode: '22366'
The flow here is:
generate_patient_id_on_people
runs, creates a shortcode, attaches it, and tells sentinel that it's written to the doc being worked onmark_for_outbound
runs, and sends an outbound requestgenerate_patient_id_on_people
runs, but it can detect that it ran last time and it knows exactly what it does (adds a patient id) so it knows it doesn't have to run againmark_for_outbound
runs, and runs again.The problem is that there isn't really any decent way for outbound to know if it should run again that works for all cases. Maybe the second revision is actually because the name changed? You may now start constructing elaborate de-duping solutions in your mind palace, but they can all fall over now or in the future depending on how things are configured, and is a much more complicated topic than 3.9 has time budget for.
Having outbound support sending the same document multiple times was a mostly arbitrary decision I made back in SIH days. The most obvious solution here is to just reverse that: outbound documents only send once, on their first revision, and any subsequent edits are not supported.
@abbyad how do you feel about that last paragraph (you may ignore everything else at your convenience!). It is probably worth doing a runaround users to make sure no one is relying on that (it is doubtful), though I am wondering if you already know enough to make that exercise unnecessary.
Is there a way to identify from the outbound condition whether outbound has previously executed? If not, then I think running once it's better for now than running outbound blindly with any arbitrary edit of the doc.
This means we can't run outbound if a patient's phone number is edited, to verify it again. Running once it's better than duplicates, and I don't know of any project using this phone verification example, but worth nothing if that becomes a requirement in the future.
Is there a way to identify from the outbound condition whether outbound has previously executed?
Is "the outbound condition" referring to the relevant_to
function? If so, then no, because that function only has access to the doc, and the outbound execution log is against the doc's info doc.
Having said that, even if you did have access to it this wouldn't solve for:
This means we can't run outbound if a patient's phone number is edited, to verify it again.
Because knowing it has or hasn't run before doesn't let you know if you've already sent it for that particular phone number. Solving for this kind of thing requires a more complicated solution that is outside of the scope of 3.9.0 (unless we realise that we need to support both, in which case we might need to do something more complicated!), and there are several options we'd need to consider to best support this.
Agreed, any more complex evaluation seems beyond the scope of this feature for 3.9.
@brad1905 back to AT! In addition to other AT notes:
Actually hold off on that: I'm not sure, pending a response from @derickl, that we can move forward with this plan
@SCdF
Just had a chat with @kitsao and we can work around the limitation of sending once.
Great - moving back to AT.
OK I've fixed the merge conflicts and the latest build should now be testable, cc. @brad1905
OK here are the complete AT instructions:
-info
doc in the same way that the tasks currently do when they successfully completeAT has passed for desktop browser and Android app and this is ready to be merged
What feature do you want to improve? Outbound push
Describe the improvement you'd like Sentinel currently schedules outbound push every five minutes. For some workflows, like messaging using RapidPro, it would be ideal if those pushes happened as soon as possible so that the creation of a report or contact could trigger a messaging workflow immediately for CHWs or patients.
Describe alternatives you've considered It is possible to manually reduce the schedule interval from 5 minutes to 2 minutes, but it is unclear what implications that has if a push doesn't complete before the schedule is run again. There could be scenarios when even 2 minutes is a long time to wait for the outbound push to complete, eg phone number verification.
Additional context Setting the interval to 1 minute caused errors and the outbound push didn't work. Also, having the push interval to be less than 2 minutes is possibly more important for testing and demo purposes than in real life. We don't have an actual use case or deployment needing it to be more immediate yet.