seh closed this issue 2 years ago.
It turns out that the `KeysetItem.Certificate` field is nil in all but the last two items in my key set. I added some output to `(*OIDCKeys).Open`. It reports the following:
```
Number of keys in key set: 7
Key set item "6702426753028327577194087677": &{6702426753028327577194087677 <nil> <nil> 0xc001200b50}
(ID: "6702426753028327577194087677", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd92c0})
Key set item "6717351783746805535929340772": &{6717351783746805535929340772 <nil> <nil> 0xc001200b90}
(ID: "6717351783746805535929340772", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9440})
Key set item "6724755564554290971271764485": &{6724755564554290971271764485 <nil> <nil> 0xc001200bd0}
(ID: "6724755564554290971271764485", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd9500})
Key set item "6725145319226802661715703465": &{6725145319226802661715703465 <nil> <nil> 0xc001200c10}
(ID: "6725145319226802661715703465", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc000fd95c0})
Key set item "6727272810098431180443208693": &{6727272810098431180443208693 <nil> <nil> 0xc001200c50}
(ID: "6727272810098431180443208693", distrust timestamp <nil>, certificate: <nil>, private key: &{0xc001508180})
Key set item "6727329898571771312485446625": &{6727329898571771312485446625 <nil> 0xc000afc000 0xc001200d00}
(ID: "6727329898571771312485446625", distrust timestamp <nil>, certificate: &{CN=kubernetes-master false 0xc00206c580 0xc001200cb0}, private key: &{0xc0015084e0})
Key set item "6906097667750333366645304518": &{6906097667750333366645304518 <nil> 0xc000afc120 0xc001200e90}
(ID: "6906097667750333366645304518", distrust timestamp <nil>, certificate: &{CN=service-account true 0xc00206cb00 0xc001200e20}, private key: &{0xc001508660})
```
If I add the following guard condition to `(*OIDCKeys).Open`, it looks like it will filter the key set items down to just those that contain a certificate for the common name "service-account":

```go
if item.Certificate == nil || item.Certificate.Subject.CommonName != "service-account" {
	continue
}
```
Does that preserve all the items that this method was expecting to consume?
Note that the `kops get keypairs` subcommand fails similarly, because it assumes that every key set item contains an X.509 certificate:
```
% kops get keypairs
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x100 pc=0x42a9788]

goroutine 1 [running]:
main.listKeypairs({0x5c80b38?, 0xc00066d620?}, {0x7dfba58, 0x0, 0x25?}, 0x0)
	k8s.io/kops/cmd/kops/get_keypairs.go:127 +0x2e8
main.RunGetKeypairs({0x5c7d260, 0xc0000520e8}, {0x5c61a80?, 0xc000c09080?}, {0x5c639c0?, 0xc00000e018?}, 0xc0008b6270)
	k8s.io/kops/cmd/kops/get_keypairs.go:174 +0xf8
main.NewCmdGetKeypairs.func3(0xc000e65680?, {0x7dfba58?, 0x0?, 0x0?})
	k8s.io/kops/cmd/kops/get_keypairs.go:78 +0x3e
github.com/spf13/cobra.(*Command).execute(0xc000e65680, {0x7dfba58, 0x0, 0x0})
	github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0x7da7c00)
	github.com/spf13/cobra@v1.5.0/command.go:990 +0x3b4
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.5.0/command.go:918
main.Execute()
	k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
	k8s.io/kops/cmd/kops/main.go:20 +0x17
```
@seh Would you like to continue to iterate on the fix?
> Would you like to continue to iterate on the fix?
Yes, though it would help to hear whether or not these entries that lack certificates are valid. Can kOps use them for anything? Should I ignore them as if they were distrusted?
Ignore them as distrusted, but list them and make them deletable, I would say.
I guess I didn't research far back enough in the history of the keystore code.
kOps can't use a private key without a certificate for anything unless/until it generates a corresponding certificate. (Though for service-account keypairs the only part of the certificate it uses is the public key.)
These days all code paths that create a key also create a corresponding certificate. I would agree that keys without certificates should be ignored as if distrusted.
As this is not a regression or something that breaks things for a lot of users, I removed the blocks-next label.
1. What `kops` version are you running?

Client version: 1.24.1 (git-v1.24.1)
2. What Kubernetes version are you running?
I started with version 1.19.9 and am upgrading to version 1.21.14.
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
5. What happened after the commands executed?
It appears that `kops update cluster` fails because it panics while preparing to publish the OIDC Discovery documents to an S3 bucket:
It appears to fail on this line in file `pkg/model/issuerdiscovery.go`, while trying to access a public key in memory.
6. What did you expect to happen?
`kops update cluster` would publish all the OIDC Discovery documents to S3 and continue with the rest of its tasks.
7. Please provide your cluster manifest.

cluster.yaml file

```yaml
apiVersion: kops.k8s.io/v1alpha2
kind: Cluster
metadata:
  name: my-cluster.example.com
spec:
  additionalSans:
  - api.internal.my-cluster.example.com
  api:
    loadBalancer:
      additionalSecurityGroups:
      - sg-005e2b9c6ffed8582
      class: Network
      crossZoneLoadBalancing: true
      type: Public
  authorization:
    rbac: {}
  certManager:
    enabled: true
    managed: false
  cloudConfig:
    disableSecurityGroupIngress: true
  cloudProvider: aws
  clusterAutoscaler:
    balanceSimilarNodeGroups: true
    enabled: true
  configBase: s3://my-kops-state/my-cluster.example.com
  etcdClusters:
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8081
      - name: ETCD_METRICS
        value: extensive
    name: main
  - etcdMembers:
    - instanceGroup: master-us-east-2a
      name: a
    - instanceGroup: master-us-east-2b
      name: b
    - instanceGroup: master-us-east-2c
      name: c
    manager:
      env:
      - name: ETCD_LISTEN_METRICS_URLS
        value: http://0.0.0.0:8082
      - name: ETCD_METRICS
        value: basic
    name: events
  iam:
    allowContainerRegistry: true
    legacy: false
  kubeAPIServer:
    featureGates:
      EphemeralContainers: "true"
  kubeDNS:
    provider: KubeDNS
  kubeProxy:
    enabled: false
  kubelet:
    anonymousAuth: false
    featureGates:
      EphemeralContainers: "true"
    kubeReserved:
      cpu: 750m
      memory: .75Gi
  kubernetesVersion: 1.21.14
  metricsServer:
    enabled: true
  networkCIDR: 10.3.0.0/16
  networkID: vpc-087cd3eb3bf613986
  networking:
    calico:
      bpfEnabled: true
      crossSubnet: true
      encapsulationMode: vxlan
      typhaReplicas: 3
  nonMasqueradeCIDR: 100.64.0.0/10
  serviceAccountIssuerDiscovery:
    discoveryStore: s3://my-kops-oidc-discovery/my-cluster
    enableAWSOIDCProvider: true
  sshAccess:
  - 184.74.210.37/32
  - 184.74.210.38/32
  - 207.141.66.101/32
  - 207.141.66.99/32
  - 212.187.232.28/32
  - 212.187.232.29/32
  - 4.53.131.109/32
  - 4.53.131.110/32
  - 4.71.99.125/32
  - 4.71.99.126/32
  subnets:
  - cidr: 10.3.100.0/22
    id: subnet-0cd20dfb64345dede
    name: utility-us-east-2a
    type: Utility
    zone: us-east-2a
  - cidr: 10.3.104.0/22
    id: subnet-0657e2c2163960a79
    name: utility-us-east-2b
    type: Utility
    zone: us-east-2b
  - cidr: 10.3.108.0/22
    id: subnet-013e44ade2633a1b1
    name: utility-us-east-2c
    type: Utility
    zone: us-east-2c
  - cidr: 10.3.0.0/22
    egress: nat-06a85bf97c4a5b65d
    id: subnet-0ca2f5a3ab50e538e
    name: us-east-2a
    type: Private
    zone: us-east-2a
  - cidr: 10.3.4.0/22
    egress: nat-054d637847b63ea36
    id: subnet-047a72902591ebe60
    name: us-east-2b
    type: Private
    zone: us-east-2b
  - cidr: 10.3.8.0/22
    egress: nat-0df765ca07bb44f0f
    id: subnet-051d2325bcab67fa6
    name: us-east-2c
    type: Private
    zone: us-east-2c
  topology:
    dns:
      type: Public
    masters: private
    nodes: private
  updatePolicy: external
```

8. Please run the commands with most verbose logging by adding the `-v 10` flag. Paste the logs into this report, or in a gist and provide the gist link here.

Here is the `kops update cluster` output at verbosity level ten, just before the failure:
Earlier, I see this pertinent log message:
Note that at present, the aforementioned S3 bucket exists, but there is no existing object with the path `my-cluster/openid/v1/jwks`.
9. Anything else we need to know?
I have been able to upgrade clusters and activate the `spec.serviceAccountIssuerDiscovery.enableAWSOIDCProvider` field's behavior successfully with earlier versions of kOps, which wrote the S3 object as necessary. This version of kOps appears to fail before it can create this S3 object. kOps was able to create the `my-cluster/.well-known/openid-configuration` object in the same bucket.
See #13353 for what looks to be an earlier report of a similar defect.
See the prior discussion in the "kops-users" channel of the "Kubernetes" Slack workspace.
/kind bug