BrentDorsey opened this issue 5 years ago
This issue is definitely related to the KMS key ring and key. If you run everything in a new project, it all works. If you delete and recreate the Vault StatefulSet, Vault fails to initialize, raising a can't-access-core/migration error.
$ kubectl logs vault-0 -c vault
2018-12-03T12:57:31.902Z [DEBUG] storage.gcs: configuring backend
2018-12-03T12:57:31.902Z [DEBUG] storage.gcs: configuration: bucket=my-dev-vault chunk_size=8388608 ha_enabled=true max_parallel=0
2018-12-03T12:57:31.902Z [DEBUG] storage.gcs: creating client
2018-12-03T12:57:31.916Z [WARN] migration check error: error="failed to read value for "core/migration": Get https://storage.googleapis.com/my-dev-vault/core/migration: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: dial tcp 64.233.191.95:443: connect: connection refused"
WARNING! Unable to read migration status.
2018-12-03T12:57:33.927Z [WARN] migration check error: error="failed to read value for "core/migration": Get https://storage.googleapis.com/my-dev-vault/core/migration: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: EOF"
2018-12-03T12:57:35.941Z [WARN] migration check error: error="failed to read value for "core/migration": Get https://storage.googleapis.com/my-dev-vault/core/migration: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: read tcp 10.40.1.27:50718->64.233.191.95:443: read: connection reset by peer"
2018-12-03T12:57:37.952Z [WARN] migration check error: error="failed to read value for "core/migration": Get https://storage.googleapis.com/my-dev-vault/core/migration: oauth2: cannot fetch token: Post https://oauth2.googleapis.com/token: read tcp 10.40.1.27:44008->74.125.124.95:443: read: connection reset by peer"
Try deleting everything in the GCS bucket, then deleting and re-deploying the StatefulSet.
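Something along these lines (bucket and manifest names are taken from this thread; adjust to yours):

gsutil -m rm -r "gs://my-dev-vault/**"
kubectl delete statefulset vault
kubectl apply -f vault.yaml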
That's the first thing I tried :-), didn't work.
Stuck in the same place :(
I've still not been able to reproduce this (I run these scripts from scratch every week). Can someone share a reproduction?
Thanks for the fantastic work on this project and the document. I have the same issue as BrentDorsey and vnbx, though.
Steps To Reproduce
On GCP, in an already existing project (I cannot create a new project, as I do not have permissions to do so in our subscription).
Cloned the vault-on-gke repo and followed the steps therein to a T. I hit the issue below on deploying the StatefulSet.
The init container logs showed:
kubectl logs vault-0 -c vault-init
2019/02/18 23:38:31 Starting the vault-init service...
2019/02/18 23:38:31 Head https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/02/18 23:38:37 Vault is sealed. Unsealing...
2019/02/18 23:38:37 storage: object doesn't exist
2019/02/18 23:38:37 Next check in 5s
2019/02/18 23:38:42 Vault is sealed. Unsealing...
2019/02/18 23:38:43 storage: object doesn't exist
2019/02/18 23:38:43 Next check in 5s
First round of debugging:
I deleted everything including the GCS bucket, the service account, and the KMS key, and recreated everything from scratch (except that I could not recreate in a new project). I faced the same issue.
Second Round
Then I changed the vault-init image to version "1.0.0" and the Vault image to 1.0.2 in the manifest.
HTTP status code from vault-init was 501
kubectl logs vault-0 -c vault-init
2019/02/18 23:09:17 Starting the vault-init service...
2019/02/18 23:09:17 Head https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/02/18 23:09:22 Head https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/02/18 23:09:27 Vault is not initialized. Initializing and unsealing...
2019/02/18 23:09:39 Encrypting unseal keys and the root token...
2019/02/18 23:09:40 googleapi: Error 403: Permission 'cloudkms.cryptoKeyVersions.useToEncrypt' denied for resource 'projects/sap-se-ycloud-infra/locations/global/keyRings/vault/cryptoKeys/vault-init6'., forbidden
2019/02/18 23:09:40 storage: object doesn't exist
This was strange, since the key's IAM policy granted the service account roles/editor and roles/cloudkms.cryptoKeyEncrypterDecrypter:
gcloud kms keys get-iam-policy vault-init6 --keyring vault6 --location global
bindings:
- members:
- serviceAccount:vault-server@my-proj.iam.gserviceaccount.com
role: roles/cloudkms.cryptoKeyEncrypterDecrypter
- members:
- serviceAccount:vault-server@my-proj.iam.gserviceaccount.com
role: roles/editor
etag: xxxxxxxx
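For anyone re-checking this, a binding like the ones above can be (re)applied with something like the following (key, keyring, and service account names are taken from this comment):

gcloud kms keys add-iam-policy-binding vault-init6 \
  --keyring vault6 --location global \
  --member serviceAccount:vault-server@my-proj.iam.gserviceaccount.com \
  --role roles/cloudkms.cryptoKeyEncrypterDecrypter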
Round Three
Cloned the vault-kubernetes-workshop repo and followed the steps in there.
HTTP status code from vault-init was 503
2019/02/18 23:38:31 Starting the vault-init service...
2019/02/18 23:38:31 Head https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/02/18 23:38:37 Vault is sealed. Unsealing...
2019/02/18 23:38:37 storage: object doesn't exist
2019/02/18 23:38:37 Next check in 5s
2019/02/18 23:38:42 Vault is sealed. Unsealing...
2019/02/18 23:38:43 storage: object doesn't exist
2019/02/18 23:38:43 Next check in 5s
I expanded the permissions for both the key and the keyring to roles/owner for the service account and still got the above error. (I reverted this later.)
I have followed the steps exactly as in the repos mentioned above in each trial. The only change was the vault-init and Vault image tags in the second trial.
Could you kindly help resolve it? Thanks.
Hi @sethvargo I have followed the step-by-step document many times at https://github.com/kelseyhightower/vault-on-google-kubernetes-engine but I am getting the same error as below. Can you please let me know which object it is looking for?
kubectl logs vault-0 -c vault-init
2019/02/28 15:25:06 Starting the vault-init service...
2019/02/28 15:25:06 Get https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/02/28 15:25:16 Vault is not initialized. Initializing and unsealing...
2019/02/28 15:25:27 Encrypting unseal keys and the root token...
2019/02/28 15:25:28 storage: object doesn't exist
2019/02/28 15:25:28 Next check in 10s
2019/02/28 15:25:38 Vault is sealed. Unsealing...
2019/02/28 15:25:38 storage: object doesn't exist
2019/02/28 15:25:38 Next check in 10s
2019/02/28 15:25:48 Vault is sealed. Unsealing...
2019/02/28 15:25:49 storage: object doesn't exist
2019/02/28 15:25:49 Next check in 10s
2019/02/28 15:25:59 Vault is sealed. Unsealing...
2019/02/28 15:25:59 storage: object doesn't exist
2019/02/28 15:25:59 Next check in 10s
2019/02/28 15:26:09 Vault is sealed. Unsealing...
2019/02/28 15:26:10 storage: object doesn't exist
2019/02/28 15:26:10 Next check in 10s
Can you post the .hcl file that you are constructing? You need to make sure that the bucket is defined there with the right permissions.
@priyeshgpatel the .hcl I am using is below, which is given as an argument to the init container as defined at https://github.com/kelseyhightower/vault-on-google-kubernetes-engine/blob/e7e24127b62b8f120ff24a0de8413263ca54b0e3/vault.yaml
listener "tcp" { address = "0.0.0.0:8200" tls_cert_file = "/etc/vault/tls/vault.pem" tls_key_file = "/etc/vault/tls/vault-key.pem" tls_min_version = "tls12" } storage "gcs" { bucket = "*-vault-storage" ha_enabled = "true" } ui = true
You need to add the seal stanza with the keyring and key:
listener "tcp" {
  address = "0.0.0.0:8200"
  tls_cert_file = "/etc/vault/tls/vault.pem"
  tls_key_file = "/etc/vault/tls/vault-key.pem"
  tls_min_version = "tls12"
}

storage "gcs" {
  bucket = "<SOME GCP BUCKET>"
  ha_enabled = "true"
}

seal "gcpckms" {
  project = "<YOUR GCP PROJECT>"
  region = "global"
  key_ring = "<YOUR KEY RING>"
  crypto_key = "<KMS KEY>"
}
ui = true
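If the keyring and key don't exist yet, they can be created along these lines (the names here just match the placeholders above):

gcloud kms keyrings create <YOUR KEY RING> --location global
gcloud kms keys create <KMS KEY> --keyring <YOUR KEY RING> --location global --purpose encryption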
@priyeshgpatel - adding that piece didn't make any difference. It is still the same:
kubectl logs -f vault-0 -c vault-init
2019/03/04 16:37:28 Starting the vault-init service...
2019/03/04 16:37:28 Get https://127.0.0.1:8200/v1/sys/health: dial tcp 127.0.0.1:8200: connect: connection refused
2019/03/04 16:37:38 Vault is sealed. Unsealing...
2019/03/04 16:37:39 storage: object doesn't exist
2019/03/04 16:37:39 Next check in 10s
Well, you would need that stanza to create the seal keys. Other things I would try: 1) empty the bucket, 2) test that the service account is able to encrypt/decrypt files and push files to the GCS bucket.
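For step 2, something like this works (key, keyring, and bucket names are the ones used earlier in the thread; the key file path is hypothetical):

# act as the Vault service account
gcloud auth activate-service-account --key-file=vault-server-key.json

# can it encrypt and decrypt with the KMS key?
echo test > plain.txt
gcloud kms encrypt --location global --keyring vault --key vault-init \
  --plaintext-file plain.txt --ciphertext-file cipher.enc
gcloud kms decrypt --location global --keyring vault --key vault-init \
  --ciphertext-file cipher.enc --plaintext-file roundtrip.txt

# can it write to and delete from the bucket?
gsutil cp cipher.enc gs://my-dev-vault/permcheck.enc
gsutil rm gs://my-dev-vault/permcheck.enc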
Here is how to make it work:
This is some race condition where the initialization for some reason does not complete successfully and does not store the .enc files in the bucket.
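A quick way to see whether initialization ever got that far is to list the bucket; a healthy run should contain the encrypted unseal keys and root token (unseal-keys.json.enc is the object named later in this thread; root-token.enc is assumed from the same tooling):

gsutil ls gs://my-dev-vault/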
This is extremely outdated and should be modified to use the new Vault 1.1 with the KMS seal built in.
Here's vault.yaml with the changes needed to make it work with the new version.
apiVersion: v1
kind: Service
metadata:
name: vault
spec:
clusterIP: None
ports:
- name: http
port: 8200
- name: server
port: 8201
selector:
app: vault
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: vault
labels:
app: vault
spec:
serviceName: "vault"
selector:
matchLabels:
app: vault
replicas: 2
template:
metadata:
labels:
app: vault
spec:
affinity:
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- vault
topologyKey: kubernetes.io/hostname
initContainers:
- name: config
image: busybox
env:
- name: GCS_BUCKET_NAME
valueFrom:
configMapKeyRef:
name: vault
key: gcs-bucket-name
command: ["/bin/sh", "-c"]
args:
- |
cat > /etc/vault/config/vault.hcl <<EOF
listener "tcp" {
address = "0.0.0.0:8200"
tls_cert_file = "/etc/vault/tls/vault.pem"
tls_key_file = "/etc/vault/tls/vault-key.pem"
tls_min_version = "tls12"
}
storage "gcs" {
bucket = "${GCS_BUCKET_NAME}"
ha_enabled = "true"
}
seal "gcpckms" {
project = "PROJECT-ID-HERE"
region = "global"
key_ring = "vault"
crypto_key = "vault-init"
}
ui = true
EOF
volumeMounts:
- name: vault-config
mountPath: /etc/vault/config
containers:
- name: vault
image: "vault:1.1.3"
env:
- name: POD_IP
valueFrom:
fieldRef:
fieldPath: "status.podIP"
- name: "VAULT_API_ADDR"
valueFrom:
configMapKeyRef:
name: vault
key: api-addr
- name: "VAULT_CLUSTER_ADDR"
value: "https://$(POD_IP):8201"
args:
- "server"
- "-config=/etc/vault/config/vault.hcl"
ports:
- name: http
containerPort: 8200
protocol: "TCP"
- name: server
containerPort: 8201
protocol: "TCP"
readinessProbe:
httpGet:
path: "/v1/sys/health?standbyok=true"
port: 8200
scheme: HTTPS
initialDelaySeconds: 5
periodSeconds: 10
resources:
requests:
cpu: "500m"
memory: "1Gi"
securityContext:
capabilities:
add:
- IPC_LOCK
volumeMounts:
- name: vault-config
mountPath: /etc/vault/config
- name: vault-tls
mountPath: /etc/vault/tls
volumes:
- name: vault-config
emptyDir: {}
- name: vault-tls
secret:
secretName: vault
Notes: when applied for the first time, it will start sealed and uninitialized. You must port-forward to the vault-0 pod and initialize it yourself, which will use KMS:
kubectl port-forward vault-0 8200:8200
$ vault operator init
Recovery Key 1: DQPWFQjcZSjo04Jjvgosxwz7dPATlbAanY+qxoOAPey+
Recovery Key 2: 2GupUmF//LIIN7kxEJMaVfQkN4MSA8JUDVRr/f+3pyWP
Recovery Key 3: 38maDcchw+Qj8/tl9jWM+yjCGNFOUe4bnfr9Rsd1TkN+
Recovery Key 4: Tcjax6o9uoHyNwj2Er6ll9lq5nape2NZOHIn2Lxtf0ZS
Recovery Key 5: nfz3wqVqWtmLtK2LBhPRMwBQE/V0eP3Qo0ItLAgw0EQy
Initial Root Token: s.Spnah49tLX7DR7EJgyHEnd35
Success! Vault is initialized
Recovery key initialized with 5 key shares and a key threshold of 3. Please
securely distribute the key shares printed above.
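If the CLI can't reach Vault through the port-forward, point it at the forwarded address first (the CA file path here is hypothetical; VAULT_SKIP_VERIFY is only for a quick test):

export VAULT_ADDR="https://127.0.0.1:8200"
export VAULT_CACERT="ca.pem"   # or: export VAULT_SKIP_VERIFY=true
vault operator init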
You can use https://github.com/sethvargo/vault-on-gke for a more updated version.
Had the exact same issue, and my problem was the OAuth scopes on the cluster, which I forgot to update to:
oauth_scopes = [
"https://www.googleapis.com/auth/cloud-platform"
]
or at least give it a storage write scope, as by default it's set to:
https://www.googleapis.com/auth/devstorage.read_only
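You can check what a node pool currently has with something like this (pool, cluster, and zone names are hypothetical):

gcloud container node-pools describe default-pool \
  --cluster vault --zone us-central1-a \
  --format='value(config.oauthScopes)'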
Thanks for putting this together man I love your work!
I'm hoping you can help me resolve the issue I'm having. I've gone through the instructions several times and I keep running into the same "storage: object doesn't exist" error when the init container is trying to unseal the vault.
The missing storage object is unseal-keys.json.enc.
For some reason the init container is not able to authenticate to the Vault API, so it never generates unseal-keys.json.enc?
The only changes I made to the instructions were to use the us-central region, and I had to remove cluster-version because 1.11.2-gke.9 is no longer supported.