Open gfr10598 opened 4 years ago
The code for this was completed with #923. However, it also requires that the k8s node pool has adequate permissions on the target GCS bucket.
This has been working properly in sandbox since mid-June, but it is not clear what actions were taken to give the node-pool write access to json-mlab-sandbox. Surprisingly, the node-pool seems to have read-only storage access.
Access scopes: PROBABLY not useful.
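The scopes a pool actually carries can be checked directly. A sketch, assuming the existing pool is named `default-pool` (the real pool name may differ):

```shell
# Print the OAuth scopes attached to an existing node pool.
# "default-pool" is an assumption; substitute the actual pool name.
gcloud container node-pools describe default-pool \
  --cluster=data-processing --region=us-east1 --project=mlab-sandbox \
  --format="value(config.oauthScopes)"
```

A pool created without explicit scopes typically shows only `devstorage.read_only` plus logging/monitoring, which would match the read-only behavior observed above.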
The service account is determined by a secret in the k8s config.
gcloud container clusters get-credentials data-processing --region us-east1 --project mlab-sandbox
kubectl get pod etl-parser-85dc68bd86-pcqzk -o yaml | grep secretName
> default-token-f4pmb
kubectl get secret default-token-f4pmb -o yaml | less
gcloud container clusters get-credentials data-processing --region us-east1 --project mlab-sandbox && kubectl get secret default-token-f4pmb -o yaml | less
Next: how does one find the SA and its associated ACLs? Neither the token name nor the UID appears on the SA console page.
metadata:
  annotations:
    kubernetes.io/service-account.name: default
    kubernetes.io/service-account.uid: babbb228-fb69-11e9-933c-42010a8e020c
  creationTimestamp: 2019-10-30T23:05:13Z
  name: default-token-f4pmb
  namespace: default
  resourceVersion: "280"
  selfLink: /api/v1/namespaces/default/secrets/default-token-f4pmb
  uid: bac39308-fb69-11e9-933c-42010a8e020c
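Two ways to attack this, as a sketch (field paths assume the standard k8s token-secret layout): decode the token itself, and inspect the bucket's IAM bindings to see who actually has write access.

```shell
# Decode the JWT stored in the secret; its claims name the service account.
# (The middle JWT segment may need base64 padding fixed up by hand.)
kubectl get secret default-token-f4pmb -o jsonpath='{.data.token}' \
  | base64 --decode

# List the IAM bindings on the target bucket to see which identities
# have write access.
gsutil iam get gs://json-mlab-sandbox
```

Note that the k8s secret identifies the in-cluster service account; GCS access for the parser pods is governed by the GCP service account on the nodes (or Workload Identity), which is what the `gsutil iam get` output reflects.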
Create new node-pools with a custom service account.
gcloud --project=mlab-sandbox container node-pools create parser-pool-2 --cluster=data-processing \
--num-nodes=3 --region=us-east1 --scopes storage-rw,compute-rw,bigquery,datastore \
--node-labels=parser-node2=true --enable-autorepair --enable-autoupgrade \
--machine-type=n1-standard-8 --service-account=etl-k8s-parser@mlab-sandbox.iam.gserviceaccount.com
gcloud --project=mlab-staging container node-pools create parser-pool-2 --cluster=data-processing \
--num-nodes=3 --region=us-east1 --scopes storage-rw,compute-rw,bigquery,datastore \
--node-labels=parser-node2=true --enable-autorepair --enable-autoupgrade \
--machine-type=n1-standard-8 --service-account=etl-k8s-parser@mlab-staging.iam.gserviceaccount.com
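After creation, the new pools can be sanity-checked. A sketch of the verification step:

```shell
# Confirm the new nodes joined the cluster with the expected label.
kubectl get nodes -l parser-node2=true

# Confirm the service account and scopes on the new pool took effect.
gcloud container node-pools describe parser-pool-2 \
  --cluster=data-processing --region=us-east1 --project=mlab-sandbox \
  --format="value(config.serviceAccount, config.oauthScopes)"
```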
Using the new parser-pool-2 (parser-node2), GKE is unable to pull the container image from gcr.io.
Experimenting with the etl-k8s-parser SA, using a local machine with gcloud auth and docker:
gcloud auth configure-docker
gcloud auth activate-service-account etl-k8s-parser@mlab-sandbox.iam.gserviceaccount.com --key-file /Users/gfr/Downloads/mlab-sandbox-fde10b933796.json
docker pull gcr.io/mlab-sandbox/github.com/m-lab/etl:8f0f7ae9e3ec9e51c11c146a82c1601672521d9d
8f0f7ae9e3ec9e51c11c146a82c1601672521d9d: Pulling from mlab-sandbox/github.com/m-lab/etl
Digest: sha256:0dc53a3fe84b546b347f8a372b37f64e18100ca2b93e2e890c8870938cc11061
Status: Image is up to date for gcr.io/mlab-sandbox/github.com/m-lab/etl:8f0f7ae9e3ec9e51c11c146a82c1601672521d9d
gcr.io/mlab-sandbox/github.com/m-lab/etl:8f0f7ae9e3ec9e51c11c146a82c1601672521d9d
This works. It also works with the default Compute Engine SA, and with gfr@.
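Since the pull succeeds locally with the same SA credentials, the SA itself can read GCR; the likely gap is storage read access for the nodes on the GCR backing bucket. A hedged sketch of the fix (the bucket name follows the standard gcr.io convention of `artifacts.<project>.appspot.com`; verify it for this project):

```shell
# Grant the node-pool SA read access to the bucket backing gcr.io images.
# Bucket name is an assumption based on the standard GCR layout.
gsutil iam ch \
  serviceAccount:etl-k8s-parser@mlab-sandbox.iam.gserviceaccount.com:objectViewer \
  gs://artifacts.mlab-sandbox.appspot.com
```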
Streaming inserts are expensive and come with assorted other headaches, so we should move to BQ Load of JSON data. The first step is allowing parsers to export JSONL files to GCS instead of performing BQ streaming inserts.
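Once parsers write JSONL to GCS, the downstream step would be a batch load. A minimal sketch, where the dataset, table, and GCS path are illustrative placeholders, not names from this repo:

```shell
# Batch-load newline-delimited JSON from GCS into BigQuery.
# Dataset, table, and path are placeholders; a real invocation would
# supply a schema file or --autodetect if the table does not exist yet.
bq load --source_format=NEWLINE_DELIMITED_JSON \
  --project_id=mlab-sandbox \
  base_tables.ndt 'gs://json-mlab-sandbox/ndt/2019/10/30/*.jsonl'
```

Unlike streaming inserts, load jobs are free (subject to quota), which is the main cost motivation above.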