medic / cht-core

The CHT Core Framework makes it faster to build responsive, offline-first digital health apps that equip health workers to provide better care in their communities. It is a central resource of the Community Health Toolkit.
https://communityhealthtoolkit.org
GNU Affero General Public License v3.0

Research way for developers to test TLS requiring android app against specific cht-core branches #6553

Closed mrjones-plip closed 3 years ago

mrjones-plip commented 4 years ago

Describe the issue
Currently, developers following our developer guide are instructed to reverse proxy their dev instance behind ngrok's TLS servers. This allows the Android app to connect with a valid TLS cert while still giving the developer the flexibility to run their own local code, which they can easily iterate on.

When we were updating the dev guide in #6551, it came up that ngrok's free tier can rate limit you under heavy use. As well, serveo, obviously, no longer works. We wondered if setting up our own reverse proxy was easy enough that we should research it. This would have the added benefit that when testing production instances with real customer data, we would not have to grant third parties like ngrok, and soon pagekite, unencrypted access to this data.

There are two classes of developers that would use this that we should consider when doing research:

Describe the improvement you'd like
Research the cost in time to initially set up, and the ongoing hosting fees of, a DIY reverse proxy for terminating TLS. This might include:

Describe alternatives you've considered
Alternatives are:

garethbowen commented 4 years ago

Another alternative is to pay for pro level service which should get around the throttling.

Hareet commented 4 years ago

Using a more SRE approach: a feature-build Docker image (something like docker-letsencrypt) coupled with AWS Docker hosting and AWS load balancers, which can terminate TLS.

Two scenarios SRE recommends:

  1. Everyone can use linuxserver's docker-letsencrypt with some port config to talk to the rest of the dev setup (or deploy it inside the docker network medic-net if you are using medic-os). You can take the nginx.conf from medic-os and likely pass it through.

    If you need a dynamic dns provider, you can use the free provider duckdns.org where the URL will be yoursubdomain.duckdns.org and the SUBDOMAINS can be www,ftp,cloud with http validation, or wildcard with dns validation.

They also provide a container that will update your IP every 5 minutes with their DNS service.

Here are the steps I imagine: turn up node on 5988, launch these containers with nginx forwarding to medic-api, open your router port for 443 (restrict by IP here), launch your tests against the URL (successful TLS termination), and the traffic should route back to your local environment.
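
As a rough sketch of what scenario 1's TLS-terminating proxy could look like (not the actual medic-os config; the cert paths follow linuxserver's letsencrypt container layout, and the upstream host/port are assumptions):

```nginx
server {
  listen 443 ssl;
  server_name yoursubdomain.duckdns.org;

  # certs issued by the letsencrypt container
  ssl_certificate     /config/keys/letsencrypt/fullchain.pem;
  ssl_certificate_key /config/keys/letsencrypt/privkey.pem;

  location / {
    # forward decrypted traffic to the local medic-api on 5988
    proxy_pass http://host.docker.internal:5988;
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto https;
  }
}
```

If medic-api runs inside the same docker network (e.g. medic-net), the proxy_pass target would be the container name instead of host.docker.internal.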

  2. A quick way for internal devs with subdomains:
    • A script that updates DNS and a security group on AWS: https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/Route53.html creates the subdomain mrjones.dev.medicmobile.org, terminates TLS at the load balancer, and points traffic back to your local computer, restricted by your source IP. It's an Application Load Balancer, so it can forward TCP to 5988. This would replicate our hosted prod setup.

Both of these should be fairly easy, but I could be missing some functionality needed in the dev environment 🤷
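
The Route53 piece of the second scenario might be sketched like this (a sketch using the AWS SDK for JavaScript; the hosted zone ID, subdomain, and IP are placeholders, not real values):

```javascript
// Sketch: build a Route53 UPSERT that points <user>.dev.medicmobile.org
// at the developer's current public IP. Zone ID and addresses are placeholders.
function buildChangeBatch(subdomain, ip) {
  return {
    HostedZoneId: 'ZEXAMPLE123', // placeholder hosted zone for dev.medicmobile.org
    ChangeBatch: {
      Changes: [{
        Action: 'UPSERT',
        ResourceRecordSet: {
          Name: `${subdomain}.dev.medicmobile.org`,
          Type: 'A',
          TTL: 60, // short TTL so IP changes propagate quickly
          ResourceRecords: [{ Value: ip }],
        },
      }],
    },
  };
}

// The actual call (not run here) would be something like:
//   const AWS = require('aws-sdk');
//   new AWS.Route53().changeResourceRecordSets(buildChangeBatch('mrjones', '203.0.113.7')).promise();

console.log(JSON.stringify(buildChangeBatch('mrjones', '203.0.113.7')));
```

A companion step would update the security group to allow 443 only from the developer's source IP, per the comment above.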

mrjones-plip commented 4 years ago

> Another alternative is to pay for pro level service which should get around the throttling.

Good point! We might want to quantify what this needs to be. The jump up from free to Basic or Pro only goes from 40 connections/min -> 60 connections/min. I suspect we'd need to go to 120 at Business.

@dianabarsan - you have stats that would validate my hunch?

mrjones-plip commented 4 years ago

@Hareet - Thanks for the input! I was definitely lacking a lot of detail from the SRE perspective, so much appreciated.

The rigidity of dynamic DNS might be limiting though. What if I'm at a different location (thus a different IP)? What if I want to test over my phone's hotspot/MiFi/what-ever-the-kids-call-it-these-days while at the same location? "Pushing" the SSH tunnel (eg ssh -R 5988:localhost:4500 ec2.host.example.com) is REALLY handy. And I can quickly kill the SSH tunnel on one VM and resume on my laptop on the road (assuming preemptive, thoughtful developer public SSH key dissemination). This also saves you from fiddling with your so-so consumer-grade router to punch a hole through NAT (and then redoing this when you want to change IP/locations). But wait, oh, just wait! I'm not actually doing dev work on cht-core just yet, so take my feedback with a big ol' grain of salt. And maybe I'm misreading your steps - feel free to call me out!
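
One detail worth noting for the ssh -R approach: by default, sshd binds remote forwards to loopback only, so the tunnel host needs a one-line config change before the forwarded port is reachable from the internet (a config fragment, assuming a stock OpenSSH server):

```
# /etc/ssh/sshd_config on the tunnel host
# Allow clients to choose the bind address for -R forwards;
# without this, remote-forwarded ports answer on localhost only.
GatewayPorts clientspecified
```

With clientspecified, the developer asks for a public bind explicitly, e.g. ssh -R 0.0.0.0:5988:localhost:5988 ec2.host.example.com (hostname and ports illustrative).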

Hareet commented 4 years ago

> What if I'm at a different location (thus different IP)?

They provide a container that updates the DNS service to your IP every few minutes. You would just kick that off alongside the other containers they provide.

> What if I want to test over my Phone's hotspot/MiFi/what-ever-the-kids-call-it-these-days while at the same location?

Hm, I'm not sure if it's worth covering every scenario. Perhaps a combination of the methods would do it all?

> This also saves you from fiddling with your so so consumer grade router to punch a hole through NAT (and then redoing this when you want to change IP/locations).

I think if we create a server that allows SSH tunneling for internal devs, we will still have to create something that works for self-hosted partners to cover our main use cases. I'd rather the devs deal with this upfront and script it for self-hosted partners and everyone else to use (even if it misses a few scenarios) than try to manage a one-off server.

My 2 cents. We can provide credentials for whatever resources you all need.

dianabarsan commented 4 years ago

I'd say we definitely need more than 120 connections. At least 500. With ngrok, this is the message I got when I was replicating a user with ~100 docs.

Too many connections! The tunnel session <a session> has violated the rate-limit policy of 20 connections 
per minute by initiating 234 connections in the last 60 seconds.

For a user with legitimate data, where reports have attachments and the configuration has multiple forms (each with attachments), every attachment request counts towards this limit, and they very easily add up (I think the 20 connections are already consumed simply by downloading the resources doc's attachments). Granted, I'm on the "free" plan, but bumping the rate to 60 connections/minute hardly makes a difference.

This is what the browser reports when replicating a larger user - 6500 docs with legitimate data (screenshot omitted). Granted, this replication was against localhost, so I'd expect tunneled requests to take longer. 6500 requests, at a rate of 60/min, means ~108 minutes to replicate this user.
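
A quick sanity check of that estimate (assuming one connection per request):

```javascript
// Time to replicate a 6500-doc user through a tunnel capped at 60 connections/minute.
const requests = 6500;
const connectionsPerMinute = 60;
const minutes = requests / connectionsPerMinute;
console.log(Math.round(minutes)); // ~108 minutes, versus a couple of minutes against localhost
```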

With pagekite, I don't have connection-rate-limiting issues, but I do sometimes have problems with large POST requests timing out (a large POST from client to server dying while transferring data), so it's not the perfect tool either.

mrjones-plip commented 4 years ago

@Hareet - cool cool - thanks very much for the feedback. I didn't mean to suggest the "push traffic back to your FQDN" approaches wouldn't work (I can see how they will!); I more wanted to show that the SSH tunnel is more flexible (but maybe less reliable?!?). Anyway, all good info to gather.

@dianabarsan - woah!! Really great to know this data point. Even the for-pay versions of ngrok are WAY under-powered then. This is a case where any of the unmetered DIY solutions we're considering will massively outpace a for-pay solution.

mrjones-plip commented 4 years ago

Oh yeah - how important is it for folks to see the per-request real-time info? For example, ngrok gives you this when you fire it up and make requests:

(screenshot of ngrok's live per-request console output omitted)

I was thinking it wasn't that important b/c you likely have the API running via node server.js in a terminal and you can see the same thing, but I don't want to assume anything!

garethbowen commented 4 years ago

I agree it's not important to see the request logging. It's sometimes nice to help debug where an issue lies (eg: you forgot to start API), but that's usually fairly obvious.

mrjones-plip commented 3 years ago

Huh - I forgot about this ticket! I must have had this on the brain though, as I later went and wrote the DIY SSH tunnel guide, and I'm now doing a skunk-works project based on the same idea: host tunnels for devs based off GH SSH keys and GH usernames.
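
The GH-keys idea can work because GitHub serves every account's public SSH keys over HTTPS. A hypothetical sketch of the authorization step (the username and paths are illustrative):

```shell
#!/bin/sh
# Sketch: authorize a dev on the tunnel host straight from their GitHub
# username, using GitHub's public key endpoint.
gh_keys_url() {
  printf 'https://github.com/%s.keys\n' "$1"
}

# On the tunnel host (not run here), append the dev's keys:
#   curl -fsS "$(gh_keys_url mrjones-plip)" >> ~/.ssh/authorized_keys
gh_keys_url mrjones-plip
```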

Maybe moving toward closing this ticket based off this?! (cc @newtewt too)

ronna commented 3 years ago

@mrjones-plip I came across localtunnel and I am not sure if it's an alternative solution: https://theboroer.github.io/localtunnel-www/ https://github.com/localtunnel/localtunnel

dianabarsan commented 3 years ago

I've used localtunnel sporadically. I was actually planning on installing the localtunnel server somewhere to use permanently.

mrjones-plip commented 3 years ago

@ronna - Oh - localtunnel looks really neat! Thanks for the tip.

@dianabarsan - report back if you do set one up - it'd be good to learn from your experience.

One thing that seems odd is that the server setup does not discuss TLS certs. Maybe it also assumes you have a wildcard cert?!

Update: A problem these providers have is abuse, b/c malware/phishing sites can hide behind them. It would be good to find one that supports client authentication, which localtunnel currently lacks (though it looks like it could likely be added).

mrjones-plip commented 3 years ago

With the advent of:

I'm going to close this ticket. If anyone thinks this is a bad idea, please re-open!