pangeo-forge / pangeo-forge-gcs-bakery

A repo for building out a pangeo forge bakery in Google Cloud Platform

Columbia GKE VMs can't have external IPs. Use NAT instead? #29

Closed cisaacstern closed 2 years ago

cisaacstern commented 2 years ago

In working on #19, I ran into the following error upon running make deploy (pretty sure it's triggered by this line):

│ Error: Error waiting for creating GKE cluster: 
│       (1) Not all instances running in IGM after 27.91542561s. Expected 1, running 0, transitioning 1. Current errors: [CONDITION_NOT_MET]: Instance 'gke-pfcsb-cluster-default-pool-83972429-mhbp' creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project 13667658525. Add instance projects/pangeo-forge-4967/zones/us-central1-f/instances/gke-pfcsb-cluster-default-pool-83972429-mhbp to the constraint to use external IP with it
│       (2) Not all instances running in IGM after 29.8823857s. Expected 1, running 0, transitioning 1. Current errors: [CONDITION_NOT_MET]: Instance 'gke-pfcsb-cluster-default-pool-5f0b4f10-v4xk' creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project 13667658525. Add instance projects/pangeo-forge-4967/zones/us-central1-a/instances/gke-pfcsb-cluster-default-pool-5f0b4f10-v4xk to the constraint to use external IP with it
│       (3) Not all instances running in IGM after 33.618018184s. Expected 1, running 0, transitioning 1. Current errors: [CONDITION_NOT_MET]: Instance 'gke-pfcsb-cluster-default-pool-ce7e23ec-87w8' creation failed: Constraint constraints/compute.vmExternalIpAccess violated for project 13667658525. Add instance projects/pangeo-forge-4967/zones/us-central1-b/instances/gke-pfcsb-cluster-default-pool-ce7e23ec-87w8 to the constraint to use external IP with it.
│
│   with google_container_cluster.primary,
│   on cluster.tf line 9, in resource "google_container_cluster" "primary":
│    9: resource "google_container_cluster" "primary" {

I'll confess that I initially thought (and by initially I mean for the past 3 hours or so 🤦) that this had something to do with my laptop's IP address, and tried various ill-conceived workarounds for that, including running make deploy from GCP's in-browser shell; using GCP's graphical in-browser GKE cluster creation tool; and re-trying locally with Columbia's VPN enabled. All methods produced the same error.

I now see (thanks, reddit) that this is a matter of the networking config for the VMs themselves, not the IP address from which their creation is requested. I have checked with @rabernat, who reports that changing constraints/compute.vmExternalIpAccess settings will be a non-starter for Columbia. Therefore, I believe we'll need an alternative wherein VMs do not use external IPs.

@tracetechnical @sharkinsspatial am I correct in assuming that network address translation (NAT) is the way forward with this? If so, what are next steps for re-configuring our terraform to use this approach? If not, what other options do we have?

rabernat commented 2 years ago

This is a problem that 2i2c had to deal with when setting up a JupyterHub for a different project. It was a real pain to work around. Pinging @yuvipanda and @sgibson91 for any tips they might have. Maybe there is an easy solution.

What they will probably tell us is that we should try to liberate our project from Columbia's built-in constraints. I will try to escalate the issue with the university to see if they will shut off these constraints. But I am not optimistic.

This really highlights the challenges of trying to build cloud stuff quickly under the umbrella of the university.

sgibson91 commented 2 years ago

This is the PR I made to enable private nodes and use Cloud NAT for Pangeo's JupyterHub: https://github.com/2i2c-org/infrastructure/pull/538 Hope it's helpful!
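
For readers following along, here is a rough Terraform sketch of the general "private nodes plus Cloud NAT" pattern. It is not taken from the PR above or from this repo's cluster.tf. The resource names, CIDR ranges, and the "bakery-" prefixes are hypothetical, and the cluster name and region are only inferred from the error log in this issue. The idea is that nodes get internal IPs only (so constraints/compute.vmExternalIpAccess should no longer be triggered), while a Cloud Router with a NAT config provides outbound internet access.

```hcl
# Hypothetical sketch: names and CIDR ranges are placeholders, not this repo's config.

resource "google_compute_network" "vpc" {
  name                    = "bakery-vpc" # hypothetical
  auto_create_subnetworks = false
}

resource "google_compute_subnetwork" "subnet" {
  name          = "bakery-subnet" # hypothetical
  region        = "us-central1"   # inferred from the zones in the error log
  network       = google_compute_network.vpc.id
  ip_cidr_range = "10.0.0.0/20" # placeholder range
}

# Cloud NAT is configured on a Cloud Router in the same region and network.
resource "google_compute_router" "router" {
  name    = "bakery-router" # hypothetical
  region  = google_compute_subnetwork.subnet.region
  network = google_compute_network.vpc.id
}

# Outbound-only NAT for every subnet range in the region, with
# automatically allocated NAT IPs (these are not attached to the VMs).
resource "google_compute_router_nat" "nat" {
  name                               = "bakery-nat" # hypothetical
  router                             = google_compute_router.router.name
  region                             = google_compute_router.router.region
  nat_ip_allocate_option             = "AUTO_ONLY"
  source_subnetwork_ip_ranges_to_nat = "ALL_SUBNETWORKS_ALL_IP_RANGES"
}

resource "google_container_cluster" "primary" {
  name               = "pfcsb-cluster" # inferred from the instance names in the log
  location           = "us-central1"
  initial_node_count = 1
  network            = google_compute_network.vpc.id
  subnetwork         = google_compute_subnetwork.subnet.id

  # Private clusters must be VPC-native; an empty block lets GKE
  # choose the secondary ranges for pods and services.
  ip_allocation_policy {}

  private_cluster_config {
    enable_private_nodes    = true  # nodes get internal IPs only
    enable_private_endpoint = false # control plane stays reachable from outside
    master_ipv4_cidr_block  = "172.16.0.0/28" # placeholder /28 for the control plane
  }
}
```

With enable_private_endpoint left as false, kubectl and make deploy can still reach the control plane from outside the VPC. Whether that is acceptable under a given organization's policies is a separate question from the external IP constraint.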

cisaacstern commented 2 years ago

@sgibson91 thank you so much for sharing this. This looks like exactly what we need. Trying it out now and will report back with the results! 🙌 Open source collaboration ftw