This repository contains the fully specified deployment files for the prototype TESS platform.
The following is an excerpt from A Proposal for a Science Platform for TESS <https://innerspace.stsci.edu/pages/viewpage.action?spaceKey=DSMO&title=A+Proposal+for+a+Science+Platform+for+TESS>:
We propose to create a TESS-focused, JupyterHub-based, science platform that will allow users to:
This prototype has two primary deployments:
Both of these deployments will have very similar features, but differ in terms of resources allocated to them.
tess.omgwtf.in <https://tess.omgwtf.in> is an ephemeral, mybinder.org style,
unauthenticated hub focused on outreach and teaching.
nbgitpuller <https://jupyterhub.github.io/nbgitpuller/> is installed, so you can
make nbgitpuller links <https://nbgitpuller.link> to share with users.
When clicked, they will start an ephemeral session, pull in the git repo linked to,
and open the appropriate directory / file.
This link <https://tess.omgwtf.in/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fspacetelescope%2Fnotebooks&urlpath=lab%2Ftree%2Fnotebooks%2Fnotebooks%2FMAST%2FTESS%2F>
opens the spacetelescope/notebooks <https://github.com/spacetelescope/notebooks>
git repo, but opens specifically into the TESS-related directory. This link <https://tess.omgwtf.in/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fspacetelescope%2Fnotebooks&urlpath=lab%2Ftree%2Fnotebooks%2Fnotebooks%2FMAST%2FTESS%2Fbeginner_how_to_use_ffi%2Fbeginner_how_to_use_ffi.ipynb>
is almost the same, but opens a specific notebook.
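The two links above follow the same pattern: the hub's ``/hub/user-redirect/git-pull`` endpoint with a percent-encoded ``repo`` and ``urlpath``. The nbgitpuller link generator remains the recommended way to make them; the following is only a minimal sketch of that pattern, and the helper name ``nbgitpuller_link`` is hypothetical, not part of nbgitpuller.

```python
from urllib.parse import urlencode


def nbgitpuller_link(hub_url: str, repo_url: str, urlpath: str) -> str:
    """Build an nbgitpuller link for a JupyterHub (hypothetical helper).

    hub_url  -- base URL of the hub, e.g. https://tess.omgwtf.in
    repo_url -- git repository to pull into the user's session
    urlpath  -- path to open after pulling, relative to the user's server
    """
    # urlencode percent-encodes ':' and '/' so both values survive as query parameters.
    query = urlencode({"repo": repo_url, "urlpath": urlpath})
    return f"{hub_url.rstrip('/')}/hub/user-redirect/git-pull?{query}"


# Rebuild the directory link shown above for the public hub.
link = nbgitpuller_link(
    "https://tess.omgwtf.in",
    "https://github.com/spacetelescope/notebooks",
    "lab/tree/notebooks/notebooks/MAST/TESS/",
)
print(link)
```

Passing ``https://private.tess.omgwtf.in`` as ``hub_url`` gives the equivalent link for the authenticated hub.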
private.tess.omgwtf.in <https://private.tess.omgwtf.in> is an authenticated
JupyterHub with persistent storage, otherwise similar to the public hub. It currently
uses GitHub for authentication, but lets everyone with a GitHub account through.
nbgitpuller links (directory <https://private.tess.omgwtf.in/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fspacetelescope%2Fnotebooks&urlpath=lab%2Ftree%2Fnotebooks%2Fnotebooks%2FMAST%2FTESS%2F>,
notebook <https://private.tess.omgwtf.in/hub/user-redirect/git-pull?repo=https%3A%2F%2Fgithub.com%2Fspacetelescope%2Fnotebooks&urlpath=lab%2Ftree%2Fnotebooks%2Fnotebooks%2FMAST%2FTESS%2Fbeginner_how_to_use_ffi%2Fbeginner_how_to_use_ffi.ipynb>)
work here as well, with the added advantage that nbgitpuller will
do 'automagic' merging for you, so both the author of the git repo and the user in
JupyterHub can make changes to the notebook, and it will always preserve the user's changes <https://jupyterhub.github.io/nbgitpuller/topic/automatic-merging.html>. This
is extremely useful in workshops, since instructors can continue tweaking materials after
the workshop starts without worrying about overwriting students' work.
This repository captures the complete system state of all the deployments for this prototype.
This includes any AWS resources, the configuration of the JupyterHubs, secrets required to run
the JupyterHubs, and the images themselves. This lets us do continuous deployment <https://www.atlassian.com/continuous-delivery/continuous-deployment> - most changes to the
configuration are made via GitHub pull requests to this repository. We will run automated tests
against the pull request, and when satisfied, merge the pull request, which will deploy the
changes. This increases the number of people who can safely make changes to the configuration
of the hubs, empowering people to make changes as well as reducing the load on the folks who
set up the infrastructure.
This is modelled on the deployment models of the PANGEO project <https://github.com/pangeo-data/pangeo-cloud-federation/>,
the mybinder.org project <https://github.com/jupyterhub/mybinder.org-deploy>,
UC Berkeley's instructional hubs <https://github.com/berkeley-dsep-infra/datahub>
and many other projects that use
hubploy <https://github.com/yuvipanda/hubploy>.
image/
We try to use the same image for the private and public instances, and this image is
present in deployments/tess-private/images/default.
repo2docker <https://repo2docker.readthedocs.io/en/latest/> is used to
build the actual user image, so you can use any of the supported config files <https://repo2docker.readthedocs.io/en/latest/config_files.html> to customize
the image as you wish. Currently, the environment.yml file does most of the work.
.. _readme/repo-contents/config:
config/ and secrets/
All the JupyterHubs are based on Zero to JupyterHub (z2jh) <http://z2jh.jupyter.org/>.
z2jh uses configuration files in YAML <https://en.wikipedia.org/wiki/YAML> format
to specify exactly how the hub is configured. For convenience, and to make sure we do
not repeat ourselves, this config is split into multiple files that form a hierarchy.
1. hub/values.yaml contains config common to all the hubs in this repository.
2. deployments/<deployment>/config/common.yaml is the primary config for the hub referred to by <deployment>. The values here override hub/values.yaml.
3. deployments/<deployment>/config/staging.yaml and deployments/<deployment>/config/prod.yaml have config that is specific to the staging or production versions of the deployment. These should be as minimal as possible, since we try to keep staging & production as close to each other as possible.
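To make the override order concrete, here is a minimal sketch of how such a hierarchy resolves. It assumes the files are layered the way Helm layers multiple values files: nested mappings are merged key by key and the later file wins. The helper names (``merge_values``, ``config_files``) and the example values are hypothetical and are not taken from this repository.

```python
from copy import deepcopy


def merge_values(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins over `base`, recursing into nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_values(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override replaces the base value.
            merged[key] = deepcopy(value)
    return merged


def config_files(deployment: str, environment: str) -> list:
    """List the config files for one hub, from least to most specific."""
    return [
        "hub/values.yaml",
        f"deployments/{deployment}/config/common.yaml",
        f"deployments/{deployment}/config/{environment}.yaml",
    ]


# Made-up contents standing in for the three files (assumption, for illustration only).
hub_values = {"singleuser": {"memory": {"limit": "1G"}, "defaultUrl": "/lab"}}
common = {"singleuser": {"memory": {"limit": "2G"}}}
staging = {"singleuser": {"memory": {"limit": "512M"}}}

effective = hub_values
for layer in (common, staging):
    effective = merge_values(effective, layer)

print(config_files("tess-private", "staging"))
print(effective)
```

In this sketch the staging file only has to state the one value that differs, which is why the environment-specific files can stay small.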
Further, we use git-crypt <https://github.com/AGWA/git-crypt> to store encrypted
secrets in this repository (although we would like to move to sops <https://github.com/mozilla/sops>
in the future). Encrypted config (primarily auth tokens and other secret tokens) is
stored in deployments/<deployment>/secrets/staging.yaml and deployments/<deployment>/secrets/prod.yaml.
There is no common.yaml, since staging & production should not share any secret values.
hubploy.yaml
We use hubploy <https://github.com/yuvipanda/hubploy> to deploy our hubs in a
repeatable fashion. hubploy.yaml contains information required for hubploy to
work - such as cluster name, region, provider, etc.
Various secret keys used to authenticate to cloud providers are kept under secrets/
for that deployment and referred to from hubploy.yaml.
We need the following AWS resources set up for the hubs to run properly:
- Amazon EKS <https://aws.amazon.com/eks/>, with multiple node groups for 'core' and 'user' nodes.
- Amazon EFS <https://aws.amazon.com/efs/>
- cluster autoscaler <https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler> and EFS Provisioner <https://github.com/kubernetes-incubator/external-storage/tree/master/aws/efs>.
- IAM User Credentials <https://docs.aws.amazon.com/IAM/latest/UserGuide/id_users.html>.

Instead of creating and maintaining these resources manually, we use the popular
terraform <https://www.terraform.io/> tool to do so for us. There is an attempt to
build a community-wide terraform template that can be used by different domains that need
a JupyterHub+Dask analytics cluster at https://github.com/pangeo-data/terraform-deploy. We
refer to it via a git submodule <https://git-scm.com/book/en/v2/Git-Tools-Submodules> in
this repo under cloud-infrastructure, with parameters set in infrastructure.tfvars.
This is very much a work in progress, but the hope is that eventually we'll have security-, performance- and cost-optimized clusters that can be set up from this template.
Identify the files to be modified to effect the change you seek.
deployments/tess-private/images/default - all deployments share this image.
repo2docker <https://repo2docker.readthedocs.io/en/latest/> is
used to build the image, so you can use any of the supported config files <https://repo2docker.readthedocs.io/en/latest/config_files.html> to customize
the image as you wish.
Currently, the environment.yml file has all packages, while JupyterLab plugins are installed
via postBuild.
hub/values.yaml, with per-deployment overrides in deployments/<deployment>/config/.
See section on config files <readme/repo-contents/config>
earlier in this document.
GitHub Action <https://github.com/features/actions> on the PR. Note that at this point, it only tests the image to make sure it builds properly. No tests are performed on the configuration. Wait for this test to pass. If it fails, fix it until it passes.
deploy the changes to the staging hubs of both deployments. You can follow
this in the Actions <https://github.com/yuvipanda/tess-prototype-deploy/actions>
tab in GitHub.
use the same image, so you can use either to test image changes. Test config changes on the appropriate staging hub.
PRs and merging them to staging until it is.
the current staging branch to prod - always use this handy link <https://github.com/yuvipanda/tess-prototype-deploy/compare/prod...staging>. You
shouldn't merge your PR into prod - you should only merge staging to prod. This
keeps our git histories clean, and helps make reverts easy as well.
If you already have a running server, you have to restart it to pick up new image changes (File -> Hub Control Panel).
We shall try to use secure defaults wherever possible, while making sure we do not affect usability too much.
efs-provisioner <https://github.com/helm/charts/tree/master/stable/efs-provisioner> for setting up NFS home directories. This way, each user's pod only gets to mount their particular home directory, instead of mounting the entire NFS share.
securityContext <https://kubernetes.io/docs/tasks/configure-pod-container/security-context/> to run user pods as a non-root user, and disable any setuid binaries (like sudo)
with no-new-privs <https://www.kernel.org/doc/html/latest/userspace-api/no_new_privs.html>.
instance metadata endpoint <https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-metadata.html>, which often contains sensitive credentials.
PodSecurityPolicy <https://kubernetes.io/docs/concepts/policy/pod-security-policy/> to control what kind of pods dask-kubernetes <https://kubernetes.dask.org/en/latest/>
can create. This is currently a significant security hole, since the ability to create
arbitrary pods can be easily escalated to root. This should be fixed shortly.
NetworkPolicy <https://kubernetes.io/docs/concepts/services-networking/network-policies/> to set up internal firewalls, so we only permit whitelisted internal traffic. We could also possibly restrict outbound traffic to only ports 80 and 443.
public subnet since EKS managed node groups do not support private subnets <https://github.com/aws/containers-roadmap/issues/607>. This needs to be
fixed by Amazon, or we can use non-managed nodegroups.
dask-gateway <https://github.com/dask/dask-gateway> instead of dask-kubernetes. This gives us much better multi-tenancy and
security isolation. It is currently undergoing a substantial architecture change <https://github.com/dask/dask-gateway/issues/198>, and we can switch once
that lands.
way to do this is to use EFS Access Points <https://docs.aws.amazon.com/efs/latest/ug/efs-access-points.html>.
This needs upstream work in the AWS CSI Driver <https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/124>.
Switching to the AWS CSI Driver will also give us encryption in transit for home directories.
We need the CSI driver to add dynamic provisioning support <https://github.com/kubernetes-sigs/aws-efs-csi-driver/issues/6>
first, though.
We can do this when we really open it up to the public.
Ideally, we would be able to put resources into some of these upstream fixes - they are fairly well specified and isolated.