We need the following additional automation:
- `nodes.<k8s_cluster>` security group

For the record, we've always been using TCP (we never enabled HTTP on careers or snippets due to bug #156, which was never fixed), so we don't have evidence that HTTP does not work for careers. For snippets, I wasn't around during the http experiment (timezones suck) so I can't comment, but I know that we've mostly run snippets over TCP since Toronto and we're continuously experiencing timeouts. So I'm not sure what this issue buys us in terms of timeouts.
I discussed the http->https forward service with @jgmize and I agree it's a solution. I'd prefer the flexibility of managing redirects in the app rather than via an external service, but that also works. I think managing the ELBs outside k8s complicates things, especially for non-SREs.
I'll complete the non-ELB-related tasks for #140 and #141 so you're unblocked to do your magic. 🍺
I feel like we need a quick regroup on this.
@glogiotatidis do you have a preference for K8s managed ELBs?
re-read my comment and changed "I think managing the ELBs outside k8s complites things, especially for non-SREs." to "I think managing the ELBs outside k8s complicates things, especially for non-SREs."
I find k8s easier to deal with than tf, so I would say yes, given that we're able to accomplish what we want with k8s annotations; and to the limited extent I understand k8s and ELBs, we can.
I did some mindmapping on this, here's what I think are the pros/cons of each solution:
I would love to have a fix for #153 (red Xs in @metadave's awesome mind map) that didn't require management of ELBs outside of k8s itself. Another option that @metadave and I discussed but haven't tested yet is to patch each of the master nodes to be unschedulable, as suggested in https://github.com/kubernetes/kops/issues/639#issuecomment-287015882. DaemonSets should not be affected by this in versions 1.6 and below, but that may change in 1.7 so we would need to keep this in mind for future upgrades-- hopefully the k8s issue would be resolved in that same release though.
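If we go that route, the patch itself is small. A sketch with the Python Kubernetes client (the `kubernetes.io/role=master` label selector is how kops labels masters; treat it as an assumption):

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Mark every master unschedulable so regular pods only land on workers.
# In k8s <= 1.6, DaemonSets still schedule onto unschedulable nodes.
for node in v1.list_node(label_selector="kubernetes.io/role=master").items:
    v1.patch_node(node.metadata.name, {"spec": {"unschedulable": True}})
```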
I personally would prefer to deal with http->https redirects outside of the apps, as it simplifies the application code and should give a minor performance improvement at the app level. This can be done with sidecar containers on k8s managed ELBs, or as independent services with tf managed ELBs.
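To make the redirector's job concrete: it does nothing but answer every plain-http request with a 301 to the https equivalent. In practice it would be nginx, but the behavior fits in a few lines; a Python sketch just to illustrate:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every request with a 301 to the https:// equivalent URL."""

    def do_GET(self):
        host = self.headers.get("Host", "")
        self.send_response(301)
        self.send_header("Location", f"https://{host}{self.path}")
        self.end_headers()

    do_HEAD = do_GET  # healthchecks may use HEAD; same redirect applies

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), RedirectHandler).serve_forever()
```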
We're suspicious of the ELB TCP healthchecks causing Gunicorn issues.
In order to switch to HTTP healthchecks, I needed to set ALLOWED_HOSTS=* for both careers and snippets, as the healthchecks are by IP and there is no way to set the host header. Also, ideally we should implement a /healthz for each app.
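For reference, a /healthz endpoint for a Django app can be as small as the sketch below; the route placement is hypothetical, and returning an unconditional 200 is an assumption about how deep a check we want:

```python
# urls.py (hypothetical placement; each app would wire this into its own URLconf)
from django.http import HttpResponse
from django.urls import path

def healthz(request):
    # Unconditionally return 200: the ELB healthcheck only needs to know
    # that a Gunicorn worker is alive and answering HTTP requests.
    return HttpResponse("OK", content_type="text/plain")

urlpatterns = [
    path("healthz", healthz),
]
```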
ELB Security group automation here
I planned to create /healthz for both anyway, so 👍. But how is /healthz going to help with ALLOWED_HOSTS?
Also I found https://dryan.com/articles/elb-django-allowed-hosts/
/healthz won't help with ALLOWED_HOSTS. That was meant as a side comment, not a solution-- my apologies for the lack of clarity in my original comment, and I've edited it to add an "Also, " in front.
@glogiotatidis would you mind replying with links to the issues tracking /healthz in snippets and careers here? Also, I like the suggestion to append '.compute-1.amazonaws.com' to the list of ALLOWED_HOSTS instead of using '*'; let's give that a shot soon.
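For what it's worth, the settings change would look something like this sketch (the public hostname is a placeholder, and '.compute-1.amazonaws.com' assumes us-east-1 style EC2 public DNS names, which is worth verifying against what the healthcheck actually sends as Host):

```python
# settings.py
ALLOWED_HOSTS = [
    "snippets.mozilla.com",       # placeholder for the app's real public hostname
    ".compute-1.amazonaws.com",   # leading dot matches any EC2 public DNS name,
                                  # letting ELB healthchecks pass without "*"
]
```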
Also, let me reiterate that managing ELBs directly with TF is a temporary workaround, not a long term solution. Let me also clarify that while I have a personal preference on the http->https redirects, I have no strong objections to other approaches.
http->https redirection PR here
we still need to decom the K8s-managed snippets/careers ELBs before this gets closed out
old k8s-managed snippets & careers ELBs decommed in #175.
This is an umbrella issue covering recent AWS ELB discussions for Kubernetes-managed applications.
In a discussion with @jgmize and @metadave, we've decided to switch all K8s app ELB listener load balancer protocols to use `TCP` (and `SSL`) instead of `http` and `https`. We experimented with switching load balancer protocols to http/https in Toronto, which caused timeout issues with Gunicorn.

Our proposed solution, in two parts, is as follows:
As a first pass, we'll use a `NodePort` service that listens for http and https requests and directs traffic via selector to the appropriate app deployment. These NodePorts have already been created and applied for snippets and careers via this PR. Snippets and careers ELBs have been updated manually via the AWS console.

The second pass will be Terraform-managed ELBs, which removes the K8s `LoadBalancer` service for each application. This allows us full control of ELB creation without the use of alpha/beta K8s annotations. For any application that requires an external http to https redirect, we can use a new K8s service running nginx that does a simple http->https redirect. Each application would then have two services: the application service itself and the nginx redirector. The nginx redirector can use a horizontal pod autoscaler if we have dynamic http load.
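For illustration, the shape of such a NodePort service, created here with the Python Kubernetes client (service name, labels, and port numbers are assumptions, not the values from the actual PR):

```python
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() inside the cluster
v1 = client.CoreV1Api()

# A NodePort service exposes the app on fixed ports on every node,
# so an externally-managed ELB can target the nodes directly.
service = client.V1Service(
    metadata=client.V1ObjectMeta(name="snippets-nodeport"),  # hypothetical name
    spec=client.V1ServiceSpec(
        type="NodePort",
        selector={"app": "snippets"},  # assumed deployment label
        ports=[
            client.V1ServicePort(name="http", port=80, target_port=8000, node_port=30080),
            client.V1ServicePort(name="https", port=443, target_port=8000, node_port=30443),
        ],
    ),
)
v1.create_namespaced_service(namespace="snippets", body=service)
```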
Terraform ELB provisioning
I'll create a directory structure similar to the Snippets multi-region Terraform setup, with one minor tweak: Terraform state from all regions will be stored in a single shared bucket to help prevent S3 bucket clutter. In this directory, we'll have a main `elb` Terraform module, which serves as a sort of template that's used when creating load balancers. Any new K8s application that requires a load balancer would only need to populate the required variables and run Terraform to apply. For any additional ELB customization, the `elb` module can be duplicated (and renamed) as needed.

Referenced issues
We've decided not to update the Deis ELB to use http/https.
Careers and Snippets ELBs have been manually changed via the AWS console to use `TCP -> 80` and `SSL (Secure TCP) -> 443` (a sketch of the equivalent listener configuration follows the TODO list below).

TODO
- Remove the snippets `LoadBalancer` service
- Remove the careers `LoadBalancer` service
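For reference, the manual console change maps onto the classic ELB API as follows. This boto3 sketch uses placeholder names, ports, and certificate ARN, and is only meant to show the listener protocol mapping the Terraform `elb` module would need to reproduce:

```python
import boto3

elb = boto3.client("elb", region_name="us-east-1")  # assumed region

# Classic ELB listeners matching the manual change:
# plain TCP passthrough on 80, SSL (secure TCP) on 443.
elb.create_load_balancer_listeners(
    LoadBalancerName="snippets-elb",  # placeholder name
    Listeners=[
        {
            "Protocol": "TCP",
            "LoadBalancerPort": 80,
            "InstanceProtocol": "TCP",
            "InstancePort": 30080,  # assumed NodePort
        },
        {
            "Protocol": "SSL",
            "LoadBalancerPort": 443,
            "InstanceProtocol": "TCP",
            "InstancePort": 30443,  # assumed NodePort
            "SSLCertificateId": "arn:aws:iam::123456789012:server-certificate/placeholder",
        },
    ],
)
```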
cc @jgmize @glogiotatidis