estellecomment closed 8 years ago
Here's a comprehensive doc on setting up and troubleshooting common issues with SMSsync: https://github.com/medic/medic-docs/blob/master/md/install/smssync.md
Priority: 5/5. 70% of support issues are caused by SMSSync. Examples:
Another issue: messages that have already been sent are kept in memory. That's basically a memory leak. Eventually you need to delete all the messages manually, once a week for instance. There's a feature called "Auto delete messages" that should do this, but it doesn't work.
There is also a draft on httpmon which can help monitor a connection to make sure a gateway connection is stable. But yeah this has been a pain point for a long time and is part of the "messaging reliability" overhaul we plan to do.
The scope of the messaging reliability issue (#1823), as I'm currently understanding it, is limited to (1) fixing defects in the medic-transport module, and (2) removing barriers to us upgrading SMSsync to the latest version. IMO it'd be good to do a synthetic test with the latest version of SMSsync, independent of our stack entirely, and see if there are still major defects. A couple organizations I know of have avoided SMSsync entirely after a technical analysis and ended up building their own tiny gateway app.
TL;DR: if we believe SMSsync's latest version is now stable, IMO we should prove it. If problems remain, we should fix them before going too far down an SMSsync-specific refactoring path (on the theory that they represent the most technical risk).
I'd prefer putting work into an open source project like SMSsync rather than maintain our own gateway. There was a lot of work put into it over the last 1.5 years which we haven't taken advantage of yet, much of which we contributed to designing and implementing. If it's not stable we should help make it stable.
I think we need to understand the severity of the problems at a technical level before we make a decision. I worry about sinking time into troubleshooting leaks and/or races if we don't know the severity/extent up front. Sunk cost fallacy, et cetera.
We're essentially coming from the same point of view AFAICT; my proposal is that we try to reproduce and understand the problems (or solutions) in the latest version of SMSsync up front, outside of our codebase if possible, before we decide on an approach.
Either way we will need to support some kind of results API in medic-transport. Might as well be the one we designed for and that SMSsync is using, since it's fairly decent whether running in SMSsync or through some other code. And if we do that, there is no reason not to say "we support SMSsync" as a gateway. If it then functions poorly with SMSsync, and there are technical challenges we can't afford to solve, we start a new project that still uses the same API. Neither the HTTP API work nor the SMSsync work is thrown away. If someone wants to use the SMSsync gateway because it has some other feature they like, and is willing to trade away some stability or some other feature for it, that's fine. It also enriches the open source ecosystem.
messages are sent twice (that would be fixed by updating SMSSync)
@estellecomment do you know when this was fixed in smssync? e.g. a link to a commit or issue referencing the problem, or have we tested it?
No, not precisely. What was fixed is to retry only the failed messages, instead of retrying the whole batch, which avoids resending the successful ones.
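To make the difference concrete, here is a minimal sketch of "retry only the failed messages". It is not SMSSync's actual code: the function name `send_with_retry`, the `send` callback, and the `max_attempts` parameter are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict, List


def send_with_retry(
    messages: Dict[str, str],
    send: Callable[[str, str], bool],
    max_attempts: int = 3,
) -> List[str]:
    """Send each message, retrying only the ones that failed.

    `messages` maps a message id to its body. `send` is a hypothetical
    callback that returns True on success and False on failure.
    Returns the ids that still failed after `max_attempts` rounds.
    """
    pending = dict(messages)
    for _ in range(max_attempts):
        if not pending:
            break
        # Keep only the messages whose send failed in this round, so
        # messages that already succeeded are never sent a second time.
        pending = {
            msg_id: body
            for msg_id, body in pending.items()
            if not send(msg_id, body)
        }
    return list(pending)
```

Retrying the whole batch would instead call `send` again for every message whenever any one of them failed, which is how successfully sent messages end up duplicated.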
It's unlikely that upgrading to the latest SMSSync (3.0.x) will be useful (see https://github.com/ushahidi/SMSSync/issues/371). However, patching to 2.8.3 (released June 2015) may be worthwhile.
SMSSync sends duplicates when it has connection problems: https://github.com/medic/medic-projects/issues/125#issuecomment-203338856
@estellecomment does SMSSync send duplicate SMS messages, or does it send duplicate messages terminating at medic-webapp?
@alxndrsn we have experienced both.
@ngamita is also interested in this
Integration work for API has been done at https://github.com/medic/medic-api/tree/medic-gateway-support
Blocked/requesting assistance, as I don't understand the database formats for messages.
@mandric I think you've reviewed this in https://github.com/medic/medic-api/pull/69
This was solved by implementing medic-gateway. No specific AT.
SMSSync is unreliable: the tech leads have to nudge or restart it regularly, and end up carrying the gateway phones around and babysitting them. Not good.
This will eventually be solved by #1823 (messaging reliability) (Edited: DB), but I'm filing separately to keep track of Tech Lead pain specifically :)
Our fork of SMSSync is behind, because SMSSync added some new features that our API wouldn't support (e.g. retrying the whole batch vs. only the failed messages).
Meanwhile, I hear that @mandric has put together some tips and tricks to help with the babysitting process? Documenting them all in one place would be nice, if it's not already done.