[Spike] Investigate Jupiter on Kubernetes

murny commented 2 years ago

Kubernetes - https://kubernetes.io

Spike Goal

Replace previously manual VM management completely with declarative service definitions and automated orchestration.

Spike questions

Documentation and investigation into what Kubernetes is and its concepts. How can we train the rest of the developers to know enough about Kubernetes to be able to deploy and troubleshoot problems?
What does Jupiter need to do in order to run in Kubernetes? Does it need a new Dockerfile? What would a working Kubernetes manifest look like for our stack?
What does Jupiter on Kubernetes look like on Azure? What do deployments look like from the cloud?

More info: https://gist.github.com/mbarnett/711bb6e19907350661db573ff2227b01

murny commented 2 years ago

Spike questions and answers

Documentation and investigation into what Kubernetes is and its concepts. How can we train the rest of the developers to know enough about Kubernetes to be able to deploy and troubleshoot problems?

There are tons of tutorials online. A great place to start is the official docs and tutorials at https://kubernetes.io (e.g. https://kubernetes.io/docs/home/).

Since we are also using Azure, Microsoft has a ton of tutorials and short courses you can take for free, like this one: https://docs.microsoft.com/en-us/learn/modules/aks-deploy-container-app/

What does Jupiter need to do in order to run in Kubernetes?

Jupiter itself doesn't need to change much. It would be a good idea to provide health checks for Rails and Sidekiq, as this will greatly improve how Kubernetes operates (for example, you don't want your containers to start receiving requests before they have finished initializing). Otherwise the application can stay largely as it is.
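As a rough sketch of what those health checks could look like on the Kubernetes side, the snippet below writes a minimal Deployment manifest with readiness and liveness probes. The `/healthz` path, the `jupiter:latest` image name, the labels, and the timing values are all assumptions for illustration; Jupiter would need to expose an endpoint like this itself.

```shell
# Write a minimal Deployment manifest with health probes (illustrative only).
# Assumptions: the /healthz path, image name, labels, and timings are hypothetical.
cat > jupiter-app-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupiter-app
  namespace: jupiter
spec:
  replicas: 2
  selector:
    matchLabels:
      app: jupiter-app
  template:
    metadata:
      labels:
        app: jupiter-app
    spec:
      containers:
        - name: jupiter-app
          image: jupiter:latest
          ports:
            - containerPort: 3000
          # The pod only receives traffic once this check passes
          readinessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 5
          # The container is restarted if this check keeps failing
          livenessProbe:
            httpGet:
              path: /healthz
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
EOF
```

The file could then be applied with `kubectl apply -f jupiter-app-deployment.yaml`. Sidekiq does not serve HTTP by default, so a worker probe would more likely be an `exec` probe; that part is left out here.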

Does it need a new Dockerfile?

For production, yes: we will need to improve our Dockerfile with some additions (FITS/FFMPEG/ClamAV and any other features we currently have in production).

What would a working Kubernetes Manifest look like for our stack?

We will essentially have 4 or so manifests:

- ConfigMap manifest file to set up any environment variables our application requires
- Deployment manifest file to set up the Sidekiq/Rails app pods/containers
- Service manifest to map our ingress port (port 80) to our pods' port (3000)
- Ingress manifest, which will use NGINX to map external traffic to our pods via our service
- Potentially a Solr container as well (Helm chart?)
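To make the list above a bit more concrete, here is a minimal sketch that writes a ConfigMap, Service, and Ingress into one file (the Deployment is omitted for brevity). Every name here is an assumption: `jupiter-config`, the environment variables, the `app: jupiter-app` label, and the `jupiter.example.com` host are placeholders, not our real values.

```shell
# Write illustrative ConfigMap, Service, and Ingress manifests to one file.
# Assumptions: all names, env vars, labels, and the host are hypothetical.
cat > jupiter-manifests.yaml <<'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
  name: jupiter-config
  namespace: jupiter
data:
  RAILS_ENV: production
  RAILS_LOG_TO_STDOUT: "true"
---
apiVersion: v1
kind: Service
metadata:
  name: jupiter-app
  namespace: jupiter
spec:
  selector:
    app: jupiter-app
  ports:
    # Port 80 on the service forwards to port 3000 on the pods
    - port: 80
      targetPort: 3000
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: jupiter
  namespace: jupiter
spec:
  ingressClassName: nginx
  rules:
    - host: jupiter.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: jupiter-app
                port:
                  number: 80
EOF
```

This could be applied with `kubectl apply -f jupiter-manifests.yaml`. A Deployment would typically pull in the ConfigMap through `envFrom` with a `configMapRef`.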

What does Jupiter on Kubernetes look like on Azure?

Basically there are three areas our team will be keeping tabs on when we are living in the cloud:

- Azure Portal for viewing and monitoring our infrastructure. The Azure Portal provides lots of tools for gathering metrics from our application, viewing logs, and alerting our team when things need attention
- Terraform scripts for provisioning and updating infrastructure. These may need updating and/or improving over time
- kubectl for gathering info about our cluster and doing deployments

What do deployments look like from the cloud?

For deployments, any developer can just run a kubectl command to do a no-downtime rolling deployment.

Most likely we will have a GitHub Action that builds a new image of our containers any time new code hits our master branch (I believe we currently do this for UAT already, but you can also look at my demo for an example).

Then when a developer wants to deploy, they can simply run:

kubectl rollout restart deployment/jupiter-app -n jupiter
kubectl rollout restart deployment/jupiter-worker -n jupiter

This will create new pods for both our app and workers, which will pull the latest image we have built (assuming the deployments reference a moving tag such as latest with imagePullPolicy: Always; otherwise the restarted pods may reuse a cached image). Once the new pods are ready, traffic moves over to them and the old pods are removed, all without any downtime for our users.

This could easily be automated as well via a GitHub Action.
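For instance, a GitHub Action step (or a developer) could call a small wrapper like the one below. This is only a sketch: the `deploy_jupiter` function name and the 300s timeout are made up for illustration, and it assumes kubectl is already authenticated against the cluster.

```shell
# Sketch: restart both Jupiter deployments and wait for each rollout to finish.
# Assumptions: deploy_jupiter and the default timeout are hypothetical.
deploy_jupiter() {
  local namespace="${1:-jupiter}"
  local timeout="${2:-300s}"
  local deployment

  # Trigger a rolling restart of the app and worker deployments
  for deployment in jupiter-app jupiter-worker; do
    kubectl rollout restart "deployment/${deployment}" -n "${namespace}" || return 1
  done

  # Block until each rollout completes, or fail once the timeout is hit
  for deployment in jupiter-app jupiter-worker; do
    kubectl rollout status "deployment/${deployment}" -n "${namespace}" \
      --timeout="${timeout}" || return 1
  done
}
```

Calling `deploy_jupiter` with no arguments targets the `jupiter` namespace; a non-zero return would fail the CI step so a stuck rollout does not go unnoticed.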

If you wanted to deploy a specific version of an image (say we still cut releases) you can set the new image version like so:

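# Rolling update of the "jupiter-app" and "jupiter-worker" deployments, updating the container image
kubectl set image deployment/jupiter-app jupiter-app=jupiter:v2.0.2 -n jupiter
kubectl set image deployment/jupiter-worker jupiter-worker=jupiter:v2.0.2 -n jupiter

# I think you can combine these too? Needs verification.
# Note: --all selects every deployment in the namespace, and '*' matches every container in them
kubectl set image deployments --all '*=jupiter:v2.0.2' -n jupiter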

To roll back a deployment:

# Check the history of deployments, including the revision
kubectl rollout history deployment/jupiter-app -n jupiter

# Roll back to the previous deployment
kubectl rollout undo deployment/jupiter-app -n jupiter

# OR
# Roll back to a specific revision
# kubectl rollout undo deployment/jupiter-app -n jupiter --to-revision=2

# Watch rolling update status of "jupiter-app" deployment until completion
kubectl rollout status -w deployment/jupiter-app -n jupiter

pgwillia commented 2 years ago

You mentioned Helm here and in the Terraform spike answers. What is it?

How would you spin up a UAT environment vs a production one? What kind of safeguards should we think about building into production deployments? -- Maybe that's a Terraform spike question?

Is there an easy way to have production-like/equivalent data when creating a UAT environment?

murny commented 2 years ago

Thanks for the followup questions!

You mentioned Helm here and in Terraform Spike answers. What is it?

Helm is essentially a package manager (think Bundler/Yarn, but for Kubernetes). It's a way to use community-made charts (packaged sets of Kubernetes manifests) without us having to reinvent the wheel. For NGINX Ingress, we are using the Helm chart (https://artifacthub.io/packages/helm/ingress-nginx/ingress-nginx). For quickly getting a Solr pod up and running for Jupiter, we may want to entertain a Helm chart as well (then we can look into ZooKeeper/doing it ourselves down the road).
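For reference, installing that chart looks roughly like the following, based on the ingress-nginx quick-start. The `install_ingress_nginx` wrapper name is only for illustration, and the release name and namespace are the upstream defaults rather than anything we have settled on.

```shell
# Sketch: install (or upgrade) the ingress-nginx Helm chart.
# Assumptions: the wrapper function name is hypothetical; the release name
# and namespace follow the upstream quick-start, not a team decision.
install_ingress_nginx() {
  helm upgrade --install ingress-nginx ingress-nginx \
    --repo https://kubernetes.github.io/ingress-nginx \
    --namespace ingress-nginx --create-namespace
}
```

Using `helm upgrade --install` means the same command works for both the first install and later upgrades.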

How would you spin up a UAT environment vs a production one? What kind of safeguards should we think about building into production deployments? -- Maybe that's a Terraform spike question?

Yeah, this is probably more Terraform-related, but I'll answer it here. I think we essentially treat "UAT" as a staging environment: it should mirror production as closely as it can. Terraform can take in variables, and the provisioning steps can change depending on them. So instead of the highest-tier Postgres DB that Azure offers, for UAT/staging you may just opt for a smaller tier, which is easily configurable. For the most part, though, the environments should be the same or as close to the same as possible. This will give us confidence when we go to production. Anything that changes in production should be done via Terraform, so that staging/UAT or any new environment gets those changes for free. Everything should be automated, with no more manual tasks for provisioning the different servers.

Is there an easy way to have production-like/equivalent data when creating a UAT environment?

Could be where our seed data comes in? You always have access to the pods to run commands like so:

kubectl exec jupiter-app-df869766d-nf4xj -n jupiter --kubeconfig kubeconfig -- bin/rails db:seed

Or we could look into a way to take a nightly backup dump from Azure (as production should be backing up regularly) and import that into the staging/UAT database?
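A very rough sketch of the dump-and-import idea is below. Note that it uses a plain `pg_dump` of production rather than Azure's managed backups, which would need a different mechanism. `refresh_staging_db`, `PROD_DATABASE_URL`, and `STAGING_DATABASE_URL` are hypothetical names, and this ignores any scrubbing of sensitive data.

```shell
# Sketch: copy a production database dump into staging/UAT.
# Assumptions: PROD_DATABASE_URL and STAGING_DATABASE_URL are hypothetical
# environment variables holding PostgreSQL connection URIs.
refresh_staging_db() {
  local dump_file="${1:-jupiter-prod.dump}"

  # Custom-format dump so that pg_restore can be used for the import
  pg_dump --format=custom --no-owner \
    --file="${dump_file}" --dbname="${PROD_DATABASE_URL}" || return 1

  # Drop and recreate the dumped objects in the staging database
  pg_restore --clean --if-exists --no-owner \
    --dbname="${STAGING_DATABASE_URL}" "${dump_file}" || return 1
}
```

Something like this could run on a schedule (a CI job or a Kubernetes CronJob), with the two connection strings supplied as secrets.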

jefferya commented 2 years ago

Helm

Would a tour of a Helm chart, and how to use it as a templating tool for k8s deployment manifests, be useful? CWRC is using Helm as a means to generalize a k8s deployment manifest for production, staging, and UAT/review environments within a CI/CD pipeline.

Is there an easy way to have production-like/equivalent data when creating a UAT environment?

Aside from seed data, another idea coming from a similar use-case in my CWRC position:

Running a batch ingest of material pulled from a subset of production content (or batch ingest manifests) specifically selected to enhance the UAL experience. This could be run:
- manually after deploy
- via a CI/CD manual stage
- or via container startup or the k8s initContainer pattern

In this CWRC use case, the goal is to populate the UAT environment with ~10% of production (i.e., 100GB / 40k resources).
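For the initContainer option, a minimal sketch might look like the manifest written below. The `jupiter:batch_ingest` rake task, the `jupiter-uat` namespace, and the image name are all hypothetical; Jupiter has no such task as far as this thread shows.

```shell
# Write an illustrative Deployment that runs a batch ingest in an initContainer.
# Assumptions: the rake task, namespace, labels, and image name are hypothetical.
cat > jupiter-uat-ingest.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jupiter-app
  namespace: jupiter-uat
spec:
  replicas: 1
  selector:
    matchLabels:
      app: jupiter-app
  template:
    metadata:
      labels:
        app: jupiter-app
    spec:
      initContainers:
        # Runs to completion before the app container starts
        - name: batch-ingest
          image: jupiter:latest
          command: ["bin/rails", "jupiter:batch_ingest"]
      containers:
        - name: jupiter-app
          image: jupiter:latest
          ports:
            - containerPort: 3000
EOF
```

Because init containers run every time a pod starts, an ingest task used this way would need to be idempotent (or skip work when the data is already present).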
