Closed kianjones4 closed 5 years ago
Note that the bug itself has nothing to do with Kafka. If a cluster has multiple instance groups (ASGs) and the name of one is a substring of another, upgrade-manager can get confused and trigger a rolling upgrade of instances in a different IG.
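To illustrate the kind of name collision described in the note, here is a minimal sketch in Go. It is not upgrade-manager's actual code: the `instance` type, both selector functions, and the ASG names under `example.k8s.local` are hypothetical, and the sketch simply assumes that instances are picked by a substring test on the ASG name. Under that assumption, asking for one ASG also returns instances from every ASG whose name contains it, while an exact comparison does not.

```go
package main

import (
	"fmt"
	"strings"
)

// instance is a hypothetical stand-in for a cloud instance and the ASG it belongs to.
type instance struct {
	ID  string
	ASG string
}

// selectBySubstring shows the assumed faulty behaviour: any ASG whose name
// contains asgName is treated as a match.
func selectBySubstring(all []instance, asgName string) []string {
	var ids []string
	for _, in := range all {
		if strings.Contains(in.ASG, asgName) {
			ids = append(ids, in.ID)
		}
	}
	return ids
}

// selectExact only matches instances whose ASG name equals asgName.
func selectExact(all []instance, asgName string) []string {
	var ids []string
	for _, in := range all {
		if in.ASG == asgName {
			ids = append(ids, in.ID)
		}
	}
	return ids
}

// sampleInstances returns made-up data: the first ASG name is a substring
// of the other two.
func sampleInstances() []instance {
	return []instance{
		{ID: "i-0001", ASG: "nodes.example.k8s.local"},
		{ID: "i-0002", ASG: "zk-nodes.example.k8s.local"},
		{ID: "i-0003", ASG: "kafka-nodes.example.k8s.local"},
	}
}

func main() {
	all := sampleInstances()
	target := "nodes.example.k8s.local"
	fmt.Println("substring match:", selectBySubstring(all, target))
	fmt.Println("exact match:    ", selectExact(all, target))
}
```

With the sample data, the substring selector returns instances from all three ASGs for the target `nodes.example.k8s.local`, whereas the exact selector returns only the one instance that actually belongs to it.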
Is this a BUG REPORT or FEATURE REQUEST?: BUG REPORT
What happened: The upgrade failed for a Kafka cluster with ZooKeeper, reporting "Instances not available", and the rollup objects were stuck in the "error" state.
What you expected to happen: Instances in the zk-nodes and kafka-nodes ASGs to be upgraded appropriately.
How to reproduce it (as minimally and precisely as possible): Create an ASG in AWS named zk-nodes.kafka-test.cluster.k8s.local or kafka-nodes.kafka-test.cluster.k8s.local, then submit a rollup object with spec.AsgName set to that ASG's name, and check the logs of the rollup controller pod.
Anything else we need to know?: This doesn't seem to be a problem with all ASG names. My other rollups, foo-bar1 and iks-system, upgraded successfully.
Environment: AWS
Other debugging information (if applicable):
RollingUpgrade status: error
$ kubectl logs
2019/09/26 13:20:38 error: Instances are not available for update occurred for rollup-kafka-nodes-1.21.0-snapshot-kafka-nodes.kafka-test.cluster.k8s.local-20190926194335
2019/09/26 13:20:38 error: Instances are not available for update occurred for rollup-kafka-nodes-1.21.0-snapshot-kafka-nodes.kafka-test.cluster.k8s.local-20190926194335
2019/09/26 13:20:38 Deleted the entries of ASG kafka-nodes.kafka-test.cluster.k8s.local in the cluster store for rollup-kafka-nodes-1.21.0-snapshot-kafka-nodes.kafka-test.cluster.k8s.local-20190926194335
2019/09/26 13:20:38 Marked object rollup-kafka-nodes-1.21.0-snapshot-kafka-nodes.kafka-test.cluster.k8s.local-20190926194335 as error
2019/09/26 13:20:38 Deleted rollup-kafka-nodes-1.21.0-snapshot-kafka-nodes.kafka-test.cluster.k8s.local-20190926194335 from admission map 0xc0001fd740