Open stephen-soltesz opened 2 years ago
Because of where the v2 data pipeline cluster is located in some projects, data must be transferred between regions in the sandbox and staging projects. This can be eliminated by placing the clusters for these projects in the us-central1 region.
mlab-oti archive-measurement-lab us-central1 to data-processing us-central1
mlab-staging archive-measurement-lab us-central1 to data-processing us-east1
mlab-sandbox archive-measurement-lab us-central1 to data-processing us-east1
etl-mlab-sandbox Jun 13, 2017, 3:22:04 PM Region us-east1
etl-mlab-staging Jul 31, 2020, 4:03:17 PM Region us-east1
etl-mlab-oti Aug 6, 2020, 7:48:10 PM Region us-central1
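The bucket and cluster regions listed above can be compared mechanically to show which projects pay for cross-region transfer. This is a minimal sketch: the `pairs` variable and the `cross_region` function are hypothetical names introduced here, and the data is copied by hand from the list above rather than queried from GCP.

```shell
#!/bin/sh
# Columns: project, archive bucket region, data-processing cluster region.
# Values are copied from the issue text; nothing is read from GCP.
pairs="mlab-oti us-central1 us-central1
mlab-staging us-central1 us-east1
mlab-sandbox us-central1 us-east1"

# Print every project whose cluster region differs from the bucket region.
cross_region() {
  printf '%s\n' "$pairs" | while read -r project bucket_region cluster_region; do
    if [ "$bucket_region" != "$cluster_region" ]; then
      echo "$project: $bucket_region -> $cluster_region"
    fi
  done
}

cross_region
```

Only mlab-staging and mlab-sandbox are reported, which matches the two projects proposed for relocation.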
Since this requires updates only to the sandbox and staging projects, the disruption will be minimal.
Changing the data-processing cluster locations will be easy. Changing the output target buckets may not be.
The data-processing cluster includes multiple node pools for service-specific workloads:
The commands used to create these node pools vary (and are likely dated or incomplete):
Repositories with services on the data-processing cluster (one per node pool):
This should be completed using Terraform, not manual, ad hoc recreation.
Evidently, while gcloud supports bulk-export for some resource types, GKE is not yet one of them.
Documentation on the Terraform gke module
The GKE resources are called something else in this context: ContainerCluster and ContainerNodePool.
Running this command requires permissions beyond the basic roles alone. https://cloud.google.com/asset-inventory/docs/access-control#required_permissions
gcloud beta resource-config bulk-export \
--resource-types=ContainerCluster,ContainerNodePool \
--project=mlab-sandbox --resource-format=terraform \
--path=output
Additional types are ComputeNetwork and ComputeSubnetwork for declaring the VPC networks over which the cluster communicates.
gcloud beta resource-config list-resource-types
gcloud beta resource-config bulk-export \
--resource-types=ComputeNetwork,ComputeSubnetwork \
--project=mlab-sandbox --resource-format=terraform --path=output
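If the same exports are wanted for more than one project, the two commands above can be generated in a loop. This is a sketch only: the `export_commands` function, the project list, and the per-project `output/<project>` path are assumptions made for illustration, and the script prints the commands for review instead of running them. The flags are the ones already used above.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) one bulk-export command per
# project and resource-type group. The project list and the per-project
# output directory are assumptions, not taken from the issue.
export_commands() {
  for project in mlab-sandbox mlab-staging; do
    for types in ContainerCluster,ContainerNodePool ComputeNetwork,ComputeSubnetwork; do
      echo "gcloud beta resource-config bulk-export --resource-types=$types --project=$project --resource-format=terraform --path=output/$project"
    done
  done
}

export_commands
```

Printing first makes it easy to check the project and path of each command before pasting it into a shell that has the required permissions.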
Current data processing cluster workloads are using deprecated APIs.
The deprecated APIs appear to come from kube-state-metrics (v2.2.4) in the prometheus-support configuration. Attempting to update to v2.9.2.
The archive-* buckets are "Multi-region" buckets:
It is unclear whether this has a significant impact on costs when the bucket is not explicitly in the cluster region.
Grafana must be restarted in each project to pick up the new datasources for the data-pipeline cluster.
The egress traffic from measurement-lab to sandbox/staging appears to have decreased significantly over the weekend, after the data-processing cluster in us-east was stopped last week.
The gardener and autoloader also appear to have been working as intended (WAI) in staging over the weekend.
The data-processing clusters in mlab-sandbox and mlab-staging are in us-east, while the archive-measurement-lab bucket is in us-central1. These clusters should be redeployed to us-central, and their output buckets recreated in us-central. Since we want the GKE cluster to be managed by Terraform, we will recreate the production cluster as well.
Production deployment
Clean up tasks after deployments:
Consider