@olemarkus @zetaab Any ideas?
@kciredor could you please provide the exact command you ran to reproduce?
Looks specific to OpenStack, as that is the only cloud provider other than AWS that calls into the cloud to get the default instance type.
Here's the full command I used @johngmyers:
kops create cluster --cloud openstack --name test.k8s.local --state swift://test --master-zones zone1 --zones zone1 --network-cidr 10.0.0.0/16 --image debian11 --master-count=3 --node-count=3 --master-size VDC-4 --node-size VDC-8 --etcd-storage-type replicated --topology private --bastion --ssh-public-key ~/ssh_id_rsa.pub --ssh-access=1.2.3.4/32 --api-loadbalancer-type public --admin-access=1.2.3.4/32 --networking calico --os-dns-servers=1.1.1.1,1.0.0.1 --os-ext-net=vlan1 --os-octavia=true
Besides the bug, I guess it would make sense to be able to specify the bastion instance type from the command line; I can already specify sizes for masters and nodes.
I added a --bastion-image flag in #14535. For the type, I'm not sure why one wouldn't want the autodetected smallest type that is appropriate for the image.
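For context on what "autodetected" means here: conceptually, kops asks the cloud for its available machine types and picks the smallest one that is adequate for the image. Below is a minimal standalone sketch of that idea, not the actual kops code; the Flavor type and the thresholds are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Flavor is a hypothetical stand-in for an OpenStack flavor as returned by
// the cloud API.
type Flavor struct {
	Name     string
	VCPUs    int
	MemoryMB int
}

// pickSmallestFlavor returns the smallest flavor (by memory, then vCPUs)
// that still meets the minimum requirements for the image.
func pickSmallestFlavor(flavors []Flavor, minVCPUs, minMemoryMB int) (string, error) {
	sort.Slice(flavors, func(i, j int) bool {
		if flavors[i].MemoryMB != flavors[j].MemoryMB {
			return flavors[i].MemoryMB < flavors[j].MemoryMB
		}
		return flavors[i].VCPUs < flavors[j].VCPUs
	})
	for _, f := range flavors {
		if f.VCPUs >= minVCPUs && f.MemoryMB >= minMemoryMB {
			return f.Name, nil
		}
	}
	return "", fmt.Errorf("no flavor satisfies %d vCPUs / %d MB", minVCPUs, minMemoryMB)
}

func main() {
	flavors := []Flavor{{"VDC-8", 8, 16384}, {"VDC-2", 2, 2048}, {"VDC-4", 4, 8192}}
	name, _ := pickSmallestFlavor(flavors, 1, 4096)
	fmt.Println(name) // VDC-4
}
```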
Right, makes sense @johngmyers. I generally like to "pin" things, so that would be my motivation for setting the bastion instance type from the CLI.
What remains is that, for now, I can't deploy kOps 1.25 on OpenStack. Perhaps @olemarkus or @zetaab can assist?
Let me have a look at how easy a fix this is. Unfortunately I don't have access to an OpenStack environment anymore, but hopefully this can be reproduced with integration tests.
If you want, I can test patches on our OpenStack environment; please let me know if I can help.
You can test this one: https://github.com/kubernetes/kops/pull/14630
I was not able to build your branch @olemarkus, so I applied your patch to the v1.25.2 branch instead, which builds fine.
Now it proceeds beyond the previous panic, but then panics again:
I1123 10:51:09.257134 91 create_cluster.go:831] Using SSH public key: /id_rsa.pub
W1123 10:51:10.419545 91 new_cluster.go:894] Running with masters in the same AZs; redundancy will be reduced
I1123 10:51:10.419627 91 new_cluster.go:1286] Cloud Provider ID = openstack
I1123 10:51:10.584959 91 subnets.go:185] Assigned CIDR 10.0.32.0/19 to subnet amsterdam1
I1123 10:51:10.585006 91 subnets.go:199] Assigned CIDR 10.0.0.0/22 to subnet utility-amsterdam1
Previewing changes that will be made:
W1123 10:51:22.428062 91 pruning.go:112] manifest includes an object of GroupKind CSIDriver.storage.k8s.io, which will not be pruned
W1123 10:51:22.428086 91 pruning.go:112] manifest includes an object of GroupKind StorageClass.storage.k8s.io, which will not be pruned
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x3571714]
goroutine 1 [running]:
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/apiversions.listURL(0x4a973c?)
github.com/gophercloud/gophercloud@v1.0.0/openstack/loadbalancer/v2/apiversions/urls.go:11 +0x14
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/apiversions.List(0x0)
github.com/gophercloud/gophercloud@v1.0.0/openstack/loadbalancer/v2/apiversions/requests.go:10 +0x45
k8s.io/kops/upup/pkg/fi/cloudup/openstack.useLoadBalancerVIPACL({0x5424e90?, 0xc000a04120?})
k8s.io/kops/upup/pkg/fi/cloudup/openstack/cloud.go:658 +0x4a
k8s.io/kops/upup/pkg/fi/cloudup/openstack.(*openstackCloud).UseLoadBalancerVIPACL(0xc000a04120)
k8s.io/kops/upup/pkg/fi/cloudup/openstack/cloud.go:649 +0x4c
k8s.io/kops/pkg/model/openstackmodel.(*OpenstackModelContext).UseVIPACL(0x3c87000?)
k8s.io/kops/pkg/model/openstackmodel/context.go:52 +0x2a
k8s.io/kops/pkg/model/openstackmodel.(*FirewallModelBuilder).Build(0xc000534960, 0x10?)
k8s.io/kops/pkg/model/openstackmodel/firewall.go:618 +0x1dc
k8s.io/kops/upup/pkg/fi/cloudup.(*Loader).BuildTasks(0xc000d5b4b8, 0xc0006ad6b0)
k8s.io/kops/upup/pkg/fi/cloudup/loader.go:45 +0xf6
k8s.io/kops/upup/pkg/fi/cloudup.(*ApplyClusterCmd).Run(0xc000d5b938, {0x53f7080, 0xc0000560c0})
k8s.io/kops/upup/pkg/fi/cloudup/apply_cluster.go:679 +0x5712
main.RunUpdateCluster({0x53f7080, 0xc0000560c0}, 0xc0000f2c60, {0x53ceba0, 0xc000012018}, 0xc00058cfd0)
k8s.io/kops/cmd/kops/update_cluster.go:296 +0xbfd
main.RunCreateCluster({0x53f7080, 0xc0000560c0}, 0x14?, {0x53ceba0, 0xc000012018}, 0xc0003c1200)
k8s.io/kops/cmd/kops/create_cluster.go:763 +0x1488
main.NewCmdCreateCluster.func1(0xc000abb680?, {0xc0001ded80?, 0x24?, 0x24?})
k8s.io/kops/cmd/kops/create_cluster.go:203 +0x177
github.com/spf13/cobra.(*Command).execute(0xc000abb680, {0xc0001deb40, 0x24, 0x24})
github.com/spf13/cobra@v1.5.0/command.go:872 +0x694
github.com/spf13/cobra.(*Command).ExecuteC(0x7692060)
github.com/spf13/cobra@v1.5.0/command.go:990 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.5.0/command.go:918
main.Execute()
k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
k8s.io/kops/cmd/kops/main.go:20 +0x17
I've also tried your patch on the master branch, but I was unable to build that, same as with your branch:
$ make kops
mkdir -p /tmp/kops/.build/dist/linux/amd64
GOOS=linux GOARCH=amd64 go build "-trimpath" -o /tmp/kops/.build/dist/linux/amd64/kops -ldflags=all=" -X k8s.io/kops.Version=1.26.0-alpha.1 -X k8s.io/kops.GitVersion=v1.26.0-alpha.1-460-g46d33317a4" k8s.io/kops/cmd/kops
# k8s.io/kops/upup/pkg/fi/cloudup
upup/pkg/fi/cloudup/new_cluster.go:237:21: undefined: fi.Bool
upup/pkg/fi/cloudup/new_cluster.go:243:15: cluster.Spec.KubernetesAPIAccess undefined (type "k8s.io/kops/pkg/apis/kops".ClusterSpec has no field or method KubernetesAPIAccess)
upup/pkg/fi/cloudup/new_cluster.go:302:25: undefined: fi.String
upup/pkg/fi/cloudup/new_cluster.go:305:18: undefined: fi.String
upup/pkg/fi/cloudup/new_cluster.go:306:18: undefined: fi.Bool
upup/pkg/fi/cloudup/new_cluster.go:309:20: undefined: fi.String
upup/pkg/fi/cloudup/new_cluster.go:310:20: undefined: fi.String
upup/pkg/fi/cloudup/new_cluster.go:311:20: undefined: fi.Int
upup/pkg/fi/cloudup/new_cluster.go:335:63: undefined: fi.Bool
upup/pkg/fi/cloudup/new_cluster.go:418:8: g.IsMaster undefined (type *"k8s.io/kops/pkg/apis/kops".InstanceGroup has no field or method IsMaster)
upup/pkg/fi/cloudup/new_cluster.go:418:8: too many errors
make: *** [Makefile:189: crossbuild-kops-linux-amd64] Error 2
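Side note: errors like undefined: fi.Bool and undefined: fi.String suggest the patched tree predates a helper rename on master, where the old typed pointer helpers appear to have been replaced by a single generic one. A minimal sketch of that kind of helper, assuming the replacement is something like fi.PtrTo (check upup/pkg/fi for the real helper):

```go
package main

import "fmt"

// PtrTo sketches the generic helper that appears to have replaced the old
// fi.Bool/fi.String/fi.Int family on master (an assumption inferred from
// the "undefined: fi.Bool" errors above).
func PtrTo[T any](v T) *T {
	return &v
}

func main() {
	// old style:      enabled := fi.Bool(true)
	// generic style:  enabled := PtrTo(true)
	enabled := PtrTo(true)
	name := PtrTo("bastion")
	fmt.Println(*enabled, *name)
}
```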
/reopen
@olemarkus: Reopened this issue.
This one is a bit more tricky to solve with tests, I think ... @zetaab have you encountered this?
We are not using this part of the code at all. We create all clusters from a YAML spec, not from CLI parameters.
That last error seems to be from a task though ... @kciredor what is the command you are using to produce this error?
Same command as before @olemarkus
The stack trace suggests that no lbclient is set in the OpenstackCloud. That OpenStack load balancer logic is pretty convoluted.
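Reading the trace: apiversions.List(0x0) means a nil *gophercloud.ServiceClient reached gophercloud's URL builder. A defensive guard would look roughly like the sketch below; the function shape and the "nil client means no VIP ACL support" policy are assumptions for illustration, not the actual kops fix.

```go
package openstackutil

import (
	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/apiversions"
)

// useLoadBalancerVIPACL sketches the missing guard: bail out when no Octavia
// client was ever configured instead of letting gophercloud dereference a
// nil *ServiceClient (the panic in the trace above).
func useLoadBalancerVIPACL(lbClient *gophercloud.ServiceClient) (bool, error) {
	if lbClient == nil {
		return false, nil // no load balancer endpoint: treat VIP ACLs as unsupported
	}
	allPages, err := apiversions.List(lbClient).AllPages()
	if err != nil {
		return false, err
	}
	versions, err := apiversions.ExtractAPIVersions(allPages)
	if err != nil {
		return false, err
	}
	// The real check would inspect which Octavia API versions are offered;
	// for this sketch it is enough that the call can no longer panic.
	return len(versions) > 0, nil
}
```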
@olemarkus do you think @johngmyers' latest comment could help with the latest issue?
Yeah it most certainly is.
I looked through the code and it's quite convoluted, as @johngmyers said. I refactored a bit in #14742, so you may try that one. There was one case that would result in a nil load balancer, but unfortunately that is not one that you should be hitting ...
Still, can you give that patch a try? If it doesn't work, can you paste your spec.cloudConfig.openstack?
Thanks @olemarkus. I was able to build your commit with the change and this resulted in:
I1208 14:18:38.009861 61 create_cluster.go:863] Using SSH public key: /id_rsa.pub
W1208 14:18:39.608318 61 new_cluster.go:935] Running with control-plane nodes in the same AZs; redundancy will be reduced
I1208 14:18:39.608358 61 new_cluster.go:1353] Cloud Provider ID = openstack
I1208 14:18:39.801594 61 subnets.go:185] Assigned CIDR 10.0.32.0/19 to subnet nova
I1208 14:18:39.801632 61 subnets.go:199] Assigned CIDR 10.0.0.0/22 to subnet utility-nova
Previewing changes that will be made:
W1208 14:18:51.665326 61 pruning.go:112] manifest includes an object of GroupKind CSIDriver.storage.k8s.io, which will not be pruned
W1208 14:18:51.665344 61 pruning.go:112] manifest includes an object of GroupKind StorageClass.storage.k8s.io, which will not be pruned
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x8 pc=0x37064d4]
goroutine 1 [running]:
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/apiversions.listURL(0x4a973c?)
github.com/gophercloud/gophercloud@v1.1.0/openstack/loadbalancer/v2/apiversions/urls.go:11 +0x14
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/apiversions.List(0x0)
github.com/gophercloud/gophercloud@v1.1.0/openstack/loadbalancer/v2/apiversions/requests.go:10 +0x45
k8s.io/kops/upup/pkg/fi/cloudup/openstack.useLoadBalancerVIPACL({0x56edb70?, 0xc0006141b0?})
k8s.io/kops/upup/pkg/fi/cloudup/openstack/cloud.go:675 +0x4a
k8s.io/kops/upup/pkg/fi/cloudup/openstack.(*openstackCloud).UseLoadBalancerVIPACL(0xc0006141b0)
k8s.io/kops/upup/pkg/fi/cloudup/openstack/cloud.go:666 +0x4c
k8s.io/kops/pkg/model/openstackmodel.(*OpenstackModelContext).UseVIPACL(0x419cfc0?)
k8s.io/kops/pkg/model/openstackmodel/context.go:52 +0x2a
k8s.io/kops/pkg/model/openstackmodel.(*FirewallModelBuilder).Build(0xc0001b42e0, 0x10?)
k8s.io/kops/pkg/model/openstackmodel/firewall.go:615 +0x1d3
k8s.io/kops/upup/pkg/fi/cloudup.(*Loader).BuildTasks(0xc0007f5450, 0xc0008cf740)
k8s.io/kops/upup/pkg/fi/cloudup/loader.go:45 +0xf6
k8s.io/kops/upup/pkg/fi/cloudup.(*ApplyClusterCmd).Run(0xc0007f58a8, {0x56bf330, 0xc0000560c0})
k8s.io/kops/upup/pkg/fi/cloudup/apply_cluster.go:702 +0x5b52
main.RunUpdateCluster({0x56bf330, 0xc0000560c0}, 0xc00026bce0, {0x5694c60, 0xc000180008}, 0xc000764f20)
k8s.io/kops/cmd/kops/update_cluster.go:293 +0xb93
main.RunCreateCluster({0x56bf330, 0xc0000560c0}, 0x14?, {0x5694c60, 0xc000180008}, 0xc00081af00)
k8s.io/kops/cmd/kops/create_cluster.go:795 +0x1308
main.NewCmdCreateCluster.func1(0xc000351800?, {0xc0003e0480?, 0x24?, 0x24?})
k8s.io/kops/cmd/kops/create_cluster.go:203 +0x177
github.com/spf13/cobra.(*Command).execute(0xc000351800, {0xc0004e9d40, 0x24, 0x24})
github.com/spf13/cobra@v1.6.1/command.go:916 +0x862
github.com/spf13/cobra.(*Command).ExecuteC(0x7ad2340)
github.com/spf13/cobra@v1.6.1/command.go:1044 +0x3bd
github.com/spf13/cobra.(*Command).Execute(...)
github.com/spf13/cobra@v1.6.1/command.go:968
main.Execute()
k8s.io/kops/cmd/kops/root.go:95 +0x5c
main.main()
k8s.io/kops/cmd/kops/main.go:20 +0x17
My spec.cloudConfig.openstack:

cloudConfig:
  openstack:
    blockStorage:
      bs-version: v3
      ignore-volume-az: false
    loadbalancer:
      floatingNetwork: vlan100
      method: ROUND_ROBIN
      provider: octavia
      useOctavia: true
    monitor:
      delay: 15s
      maxRetries: 3
      timeout: 10s
    router:
      dnsServers: 1.1.1.1,1.0.0.1
      externalNetwork: vlan100
Thanks. Doesn't seem like it helped much ... but at least the code is cleaner :P
https://github.com/kubernetes/kops/pull/14744 should probably fix things.
I've tried #14744 by building your fork branch @olemarkus and running it on two OpenStack environments. Output:
I1212 08:22:57.559334 56 create_cluster.go:863] Using SSH public key: /opt/haven/state/ssh_id_rsa.pub
W1212 08:23:00.038015 56 new_cluster.go:936] Running with control-plane nodes in the same AZs; redundancy will be reduced
I1212 08:23:00.038066 56 new_cluster.go:1354] Cloud Provider ID = openstack
I1212 08:23:00.198917 56 subnets.go:185] Assigned CIDR 10.0.32.0/19 to subnet nova
I1212 08:23:00.198946 56 subnets.go:199] Assigned CIDR 10.0.0.0/22 to subnet utility-nova
Previewing changes that will be made:
W1212 08:23:15.866871 56 pruning.go:112] manifest includes an object of GroupKind CSIDriver.storage.k8s.io, which will not be pruned
W1212 08:23:15.866902 56 pruning.go:112] manifest includes an object of GroupKind StorageClass.storage.k8s.io, which will not be pruned
F1212 08:23:19.415322 56 task.go:146] task *openstacktasks.SecurityGroup (*openstacktasks.SecurityGroup {"ID":null,"Name":"","Description":null,"RemoveExtraRules":["port=443"],"RemoveGroup":true,"Lifecycle":"Sync"}) did not have a Name
So it seems to get past the LB part now, but ends up with a different problem.
Progress on some level at least :)
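For context, the "did not have a Name" fatal comes from a generic invariant in the task engine: every task must carry a Name before the dependency graph is built, and kops discovers that name reflectively. A standalone paraphrase of that check, with hypothetical names (not the exact task.go code):

```go
package main

import (
	"fmt"
	"reflect"
)

// SecurityGroup mimics the shape of the failing task above; the loader
// requires Name to be set before tasks are topologically sorted.
type SecurityGroup struct {
	ID   *string
	Name *string
}

// taskName reflectively pulls the Name field out of an arbitrary task
// struct, roughly the way the kops task machinery does.
func taskName(task interface{}) string {
	v := reflect.Indirect(reflect.ValueOf(task))
	f := v.FieldByName("Name")
	if !f.IsValid() || f.IsNil() {
		return ""
	}
	return f.Elem().String()
}

func main() {
	t := &SecurityGroup{} // Name never assigned, as in the error above
	if taskName(t) == "" {
		// task.go reports this condition with a fatal log
		fmt.Printf("task %T did not have a Name\n", t)
	}
}
```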
There is some weird logic around the use of private topology, gossip, and a public load balancer. It may be you get unstuck if you pass --api-public-name. With your setup, I assume it's a static IP address that is used for reaching the k8s API?
One more step: https://github.com/kubernetes/kops/pull/14806
Thanks @olemarkus.
With kOps on OpenStack, after creating a cluster with a working kOps version, I end up with a load balancer in front of the k8s API that has an external IP address taken from the pool, firewalled by a security group for my given CIDR. So I don't know the API's external IP until after the cluster has been created.
Output using #14806:
I1219 10:40:05.852505 68 create_cluster.go:863] Using SSH public key: /id_rsa.pub
W1219 10:40:07.022188 68 new_cluster.go:927] Running with control-plane nodes in the same AZs; redundancy will be reduced
I1219 10:40:07.022255 68 new_cluster.go:1345] Cloud Provider ID: "openstack"
I1219 10:40:07.194165 68 subnets.go:185] Assigned CIDR 10.0.32.0/19 to subnet amsterdam1
I1219 10:40:07.194192 68 subnets.go:199] Assigned CIDR 10.0.0.0/22 to subnet utility-amsterdam1
Previewing changes that will be made:
W1219 10:40:20.058419 68 pruning.go:112] manifest includes an object of GroupKind CSIDriver.storage.k8s.io, which will not be pruned
W1219 10:40:20.058447 68 pruning.go:112] manifest includes an object of GroupKind StorageClass.storage.k8s.io, which will not be pruned
I1219 10:40:21.099153 68 executor.go:111] Tasks: 0 done / 146 total; 56 can run
W1219 10:40:21.246204 68 vfs_castore.go:382] CA private key was not found
I1219 10:40:21.575425 68 executor.go:111] Tasks: 56 done / 146 total; 57 can run
I1219 10:40:21.632227 68 executor.go:111] Tasks: 113 done / 146 total; 10 can run
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x37749ae]
goroutine 701 [running]:
github.com/gophercloud/gophercloud.(*ServiceClient).ResourceBaseURL(...)
github.com/gophercloud/gophercloud@v1.1.1/service_client.go:39
github.com/gophercloud/gophercloud.(*ServiceClient).ServiceURL(...)
github.com/gophercloud/gophercloud@v1.1.1/service_client.go:47
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers.rootURL(0x1c00073eb10?)
github.com/gophercloud/gophercloud@v1.1.1/openstack/loadbalancer/v2/loadbalancers/urls.go:14 +0x4e
github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers.List(0x0, {0x57426c0, 0xc000a7e000})
github.com/gophercloud/gophercloud@v1.1.1/openstack/loadbalancer/v2/loadbalancers/requests.go:59 +0x65
k8s.io/kops/upup/pkg/fi/cloudup/openstacktasks.(*LB).Find(0xc0000b3d60, 0x2?)
k8s.io/kops/upup/pkg/fi/cloudup/openstacktasks/lb.go:153 +0xeb
reflect.Value.call({0x492c560?, 0xc0000b3d60?, 0x2?}, {0x4d51553, 0x4}, {0xc000d9cee8, 0x1, 0x578c9f0?})
reflect/value.go:584 +0x8c5
reflect.Value.Call({0x492c560?, 0xc0000b3d60?, 0x4d8d4c6?}, {0xc000d9cee8?, 0xc00162e0c0?, 0xc000ad6c00?})
reflect/value.go:368 +0xbc
k8s.io/kops/util/pkg/reflectutils.InvokeMethod({0x492c560, 0xc0000b3d60?}, {0x4d515eb, 0x4}, {0xc00196dde0, 0x1, 0x460785?})
k8s.io/kops/util/pkg/reflectutils/walk.go:78 +0x3c5
k8s.io/kops/upup/pkg/fi.invokeFind({0x5740300?, 0xc0000b3d60?}, 0xc0000b3d60?)
k8s.io/kops/upup/pkg/fi/default_methods.go:126 +0x85
k8s.io/kops/upup/pkg/fi.DefaultDeltaRunMethod({0x5740300?, 0xc0000b3d60}, 0xc000f04840)
k8s.io/kops/upup/pkg/fi/default_methods.go:50 +0x135
k8s.io/kops/upup/pkg/fi/cloudup/openstacktasks.(*LB).Run(0xc0009f1d28?, 0x5740300?)
k8s.io/kops/upup/pkg/fi/cloudup/openstacktasks/lb.go:174 +0x26
k8s.io/kops/upup/pkg/fi.(*executor).forkJoin.func1(0xc0008f2cb0, 0x5)
k8s.io/kops/upup/pkg/fi/executor.go:195 +0x270
created by k8s.io/kops/upup/pkg/fi.(*executor).forkJoin
k8s.io/kops/upup/pkg/fi/executor.go:183 +0x86
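Same failure class one layer down: this time (*LB).Find hands a nil service client to loadbalancers.List. A guard at the task level would look roughly like this sketch; the function shape is illustrative, not the actual kops code.

```go
package openstacktasks

import (
	"fmt"

	"github.com/gophercloud/gophercloud"
	"github.com/gophercloud/gophercloud/openstack/loadbalancer/v2/loadbalancers"
)

// findLoadBalancer sketches the missing guard in the LB task's Find: refuse
// to list load balancers when the Octavia client was never initialized,
// instead of panicking inside gophercloud's URL builder.
func findLoadBalancer(lbClient *gophercloud.ServiceClient, name string) (*loadbalancers.LoadBalancer, error) {
	if lbClient == nil {
		return nil, fmt.Errorf("no Octavia load balancer client available; is the load balancer configured?")
	}
	allPages, err := loadbalancers.List(lbClient, loadbalancers.ListOpts{Name: name}).AllPages()
	if err != nil {
		return nil, err
	}
	lbs, err := loadbalancers.ExtractLoadBalancers(allPages)
	if err != nil {
		return nil, err
	}
	if len(lbs) == 0 {
		return nil, nil // not found: the task will create it
	}
	return &lbs[0], nil
}
```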
The Kubernetes project currently lacks enough contributors to adequately respond to all issues.
This bot triages un-triaged issues according to the following rules:

- After 90d of inactivity, lifecycle/stale is applied
- After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
- After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

- Mark this issue as fresh with /remove-lifecycle stale
- Close this issue with /close
- Offer to help out with Issue Triage
Please send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale
@olemarkus this is still a thing actually
I believe this has been fixed, yes.
/close
@olemarkus: Closing this issue.
/kind bug
1. What kops version are you running? The command kops version will display this information.
v1.25.2

2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
v1.25.x

3. What cloud provider are you using?
OpenStack

4. What commands did you run? What is the simplest way to reproduce this issue?
kops create cluster according to https://kops.sigs.k8s.io/getting_started/openstack/

5. What happened after the commands executed?
Panic, stack trace:

6. What did you expect to happen?
Creation of a cluster without problems.

7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.

9. Anything else do we need to know?
No problems with kOps 1.24.x; this started with 1.25.x. I've tried to find out where the issue occurs in the code and traced it to https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L423, which references cloud, which is nil because it's declared at https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L276 but never assigned a value. The only cloud provider for which it gets assigned is AWS, at https://github.com/kubernetes/kops/blob/v1.25.2/upup/pkg/fi/cloudup/new_cluster.go#L286.
The reason this nil cloud hurts is that you cannot assign a machine type to bastion hosts from the kops create cluster command line; you can only assign machine types to masters and nodes. So kops tries to pick a machine type itself and calls defaultMachineType, which requires a cloud instance.
It appears this part of the code went through quite a large refactor between 1.24.x and 1.25.x, and now it's broken, at least on OpenStack. I've tried two different OpenStack cloud providers.
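To make the failure mode concrete, here is a minimal standalone reproduction of the class of panic described above, with hypothetical names standing in for fi.Cloud and defaultMachineType:

```go
package main

import "fmt"

// Cloud stands in for fi.Cloud in this illustration.
type Cloud interface {
	DefaultInstanceType() (string, error)
}

// defaultMachineType mimics the helper described above: it needs a live
// cloud to ask for a sensible instance type.
func defaultMachineType(cloud Cloud) (string, error) {
	return cloud.DefaultInstanceType() // cloud == nil -> nil pointer dereference
}

func main() {
	var cloud Cloud
	// Only an AWS-style branch assigns cloud in the linked code; the
	// OpenStack path never does, so it is still nil here.
	machineType, err := defaultMachineType(cloud) // panics, like the report above
	fmt.Println(machineType, err)
}
```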