medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

SMSSync testing and upgrade #2109

Closed estellecomment closed 8 years ago

estellecomment commented 8 years ago

SMSSync is unreliable: the tech leads have to nudge or restart it regularly, and end up carrying the gateway phones around and babysitting them. Not good.

This will eventually be solved by #1823 (messaging reliability) (Edited: DB), but I'm filing separately to keep track of Tech Lead pain specifically :)

Our fork of SMSSync is behind upstream, because SMSSync added some new features that our API doesn't support (e.g. retrying only the failed messages instead of the whole batch).

Meanwhile, I hear that @mandric has put together some tips and tricks to help with the babysitting process? Documenting them all in the same place would be nice, if it's not already done.

bishwasBhatta commented 8 years ago

Here's a comprehensive doc on setting up and troubleshooting common issues with SMSsync: https://github.com/medic/medic-docs/blob/master/md/install/smssync.md

estellecomment commented 8 years ago

Priority: 5/5. 70% of support issues are caused by SMSSync. Examples:

estellecomment commented 8 years ago

Another issue: messages that have already been sent are kept in memory. That's basically a memory leak. Eventually you need to delete all the messages manually, once a week for instance. There's a feature called "Auto delete messages" that should do this, but it doesn't work.
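To make the cleanup described above concrete, here is a minimal sketch of what pruning already-sent messages could look like. It is not SMSSync's code: the `StoredMessage` type, its fields, and the `prune_sent` helper are all hypothetical names made up for illustration, and the retention window is an assumed parameter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StoredMessage:
    # Hypothetical record shape; SMSSync's real storage format is not shown in this thread.
    id: str
    status: str                 # e.g. "pending" or "sent"
    sent_at: Optional[float]    # seconds since epoch, None if not sent yet


def prune_sent(messages: List[StoredMessage], now: float,
               max_age_seconds: float) -> List[StoredMessage]:
    """Return the messages to keep: everything except sent messages
    older than max_age_seconds. Unsent messages are never dropped."""
    kept = []
    for m in messages:
        expired = (
            m.status == "sent"
            and m.sent_at is not None
            and now - m.sent_at > max_age_seconds
        )
        if not expired:
            kept.append(m)
    return kept
```

Running something like this on a schedule would replace the weekly manual delete, on the assumption that sent messages are no longer needed on the phone once they are older than the retention window.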

mandric commented 8 years ago

There is also a draft on httpmon, which can help monitor a gateway's connection to make sure it is stable. But yeah, this has been a pain point for a long time and is part of the "messaging reliability" overhaul we plan to do.

ghost commented 8 years ago

The scope of the messaging reliability issue (#1823), as I currently understand it, is limited to (1) fixing defects in the medic-transport module, and (2) removing barriers to us upgrading SMSsync to the latest version. IMO it'd be good to do a synthetic test with the latest version of SMSsync, independent of our stack entirely, and see if there are still major defects. A couple of organizations I know of have avoided SMSsync entirely after a technical analysis and ended up building their own tiny gateway app.

TL;DR: if we believe SMSsync's latest version is now stable, IMO we should prove it. If problems remain, we should fix them before going too far down an SMSsync-specific refactoring path (on the theory that they represent the most technical risk).

mandric commented 8 years ago

I'd prefer putting work into an open source project like SMSsync rather than maintaining our own gateway. A lot of work was put into it over the last 1.5 years that we haven't taken advantage of yet, much of which we helped design and implement. If it's not stable we should help make it stable.

ghost commented 8 years ago

I think we need to understand the severity of the problems at a technical level before we make a decision. I worry about sinking time into troubleshooting leaks and/or races if we don't know the severity/extent up front. Sunk cost fallacy, et cetera.

ghost commented 8 years ago

We're essentially coming from the same point of view AFAICT; my proposal is that we try to reproduce and understand the problems (or solutions) in the latest version of SMSsync up front, outside of our codebase if possible, before we decide on an approach.

mandric commented 8 years ago

Either way we will need to support some kind of results API in medic-transport. Might as well be the one we designed and that SMSsync is using... since it's fairly decent whether running in SMSsync or through some other code? And if we do that, there is no reason not to say "we support SMSsync" as a gateway. If it then functions poorly with SMSsync, and there are technical challenges we can't afford to solve, then we start a new project that still uses the same API. Neither the HTTP API work nor the SMSsync work is thrown away. If someone wants to use the SMSsync gateway because it has some other feature they like, and they are willing to trade off some stability or some other feature for it, that's fine. It also enriches the open source ecosystem.

alxndrsn commented 8 years ago

"messages are sent twice (that would be fixed by updating SMSSync)"

@estellecomment do you know when this was fixed in SMSSync? Is there a link to a commit or issue referencing the problem, or have we tested it?

estellecomment commented 8 years ago

No, not precisely. What was fixed is to retry only the failed messages, instead of retrying the whole batch, which avoids resending the successful ones.
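As a rough illustration of the difference, here is a sketch of retrying only the failed messages rather than the whole batch. It is not SMSSync's implementation: `send_with_retries`, the `send` callback, and the `max_attempts` parameter are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List


def send_with_retries(messages: Dict[str, str],
                      send: Callable[[str, str], bool],
                      max_attempts: int = 3) -> List[str]:
    """Send each message (id -> body). After every pass, only the messages
    whose send() returned False are retried, so a message that already
    succeeded is never sent again. Returns the ids still unsent."""
    pending = dict(messages)
    for _ in range(max_attempts):
        if not pending:
            break
        failed = {}
        for msg_id, body in pending.items():
            if not send(msg_id, body):
                failed[msg_id] = body
        # The next pass only covers what failed in this one.
        pending = failed
    return list(pending)
```

Retrying the whole batch would instead loop over `messages` on every attempt, which is what resends the successful ones.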

alxndrsn commented 8 years ago

It's unlikely that upgrading to the latest SMSSync (3.0.x) will be useful: https://github.com/ushahidi/SMSSync/issues/371 However, patching to 2.8.3 (released June 2015) may be worthwhile.

estellecomment commented 8 years ago

SMSSync sends duplicates when it has connection problems: https://github.com/medic/medic-projects/issues/125#issuecomment-203338856

alxndrsn commented 8 years ago

@estellecomment does SMSSync send duplicate SMS messages, or does it send duplicate messages terminating at medic-webapp?

henokgetachew commented 8 years ago

@alxndrsn we have experienced both.
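For the second kind of duplicate (the same message arriving twice at the receiving end), one common mitigation is to make the receiver idempotent. The sketch below assumes the gateway attaches a stable unique id to every message, which this thread does not confirm; `InboundDeduplicator` and `accept` are hypothetical names, and the in-memory set stands in for whatever persistent store a real service would use.

```python
class InboundDeduplicator:
    """Remember message ids that were already processed and reject repeats."""

    def __init__(self) -> None:
        # Assumption: ids are unique per message and identical across retries.
        self._seen = set()

    def accept(self, message_id: str) -> bool:
        """Return True the first time an id is seen, False on any repeat."""
        if message_id in self._seen:
            return False
        self._seen.add(message_id)
        return True
```

This does nothing for duplicate SMS messages sent out to phones; that side has to be handled in the gateway's own retry logic.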

estellecomment commented 8 years ago

@ngamita is also interested in this

alxndrsn commented 8 years ago

Integration work for API has been done at https://github.com/medic/medic-api/tree/medic-gateway-support

Blocked/requesting assistance as I don't understand the database formats for messages.

alxndrsn commented 8 years ago

@mandric I think you've reviewed this in https://github.com/medic/medic-api/pull/69

estellecomment commented 7 years ago

This was solved by implementing medic-gateway. No specific AT.