Closed: irishgordo closed this issue 8 months ago.
From the debug log: the non-rc3 cluster hits #3616 before the upgrade, and the upgrade validator denies the upgrade.
# Run kubectl commands inside here
# e.g. kubectl get all
> kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
fleet-local fleet-agent-local 0/1 ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]
fleet-local local-managed-system-agent 0/1 ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]
fleet-local mcc-harvester 1/1
fleet-local mcc-harvester-crd 1/1
fleet-local mcc-local-managed-system-upgrade-controller 0/1 ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]
fleet-local mcc-rancher-logging 0/1 ErrApplied(1) [Cluster fleet-local/local: another operation (install/upgrade/rollback) is in progress]
fleet-local mcc-rancher-logging-crd 1/1
fleet-local mcc-rancher-monitoring 1/1
fleet-local mcc-rancher-monitoring-crd
> kubectl get managedchart -A
NAMESPACE NAME AGE
fleet-local harvester 21d
fleet-local harvester-crd 21d
fleet-local local-managed-system-upgrade-controller 21d
fleet-local rancher-logging 21d
fleet-local rancher-logging-crd 21d
fleet-local rancher-monitoring 21d
fleet-local rancher-monitoring-crd 21d
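For anyone triaging the ErrApplied state above, a minimal sketch of how one might find and clear the Helm release behind the "another operation (install/upgrade/rollback) is in progress" message (generic Helm commands; the release name and namespace below are illustrative assumptions, not taken from this cluster):
# List releases stuck in a pending state (pending-install/upgrade/rollback).
helm list -A --pending
# Inspect the stuck release's history; name and namespace are hypothetical.
helm history fleet-agent-local -n cattle-fleet-local-system
# Rolling back to the last successfully deployed revision clears the lock.
helm rollback fleet-agent-local <revision> -n cattle-fleet-local-system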
rc3 version: will check why the harvester bundle is in such a state.
# Run kubectl commands inside here
# e.g. kubectl get all
> kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
fleet-local fleet-agent-local 1/1
fleet-local local-managed-system-agent 1/1
fleet-local mcc-harvester 0/1 Modified(1) [Cluster fleet-local/local]; kubevirt.kubevirt.io harvester-system/kubevirt modified {"spec":{"customizeComponents":{"patches":[{"patch":"{\"webhooks\":[{\"name\":\"kubevirt-validator.kubevirt.io\",\"failurePolicy\":\"Ignore\"},{\"name\":\"kubevirt-update-validator.kubevirt.io\",\"failurePolicy\":\"Ignore\"}]}","resourceName":"virt-operator-validator","resourceType":"ValidatingWebhookConfiguration","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-api\", \"resources\":{\"limits\":{\"cpu\":\"400m\",\"memory\":\"1100Mi\"}}}]}}}}","resourceName":"virt-api","resourceType":"Deployment","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-controller\", \"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"1300Mi\"}}}]}}}}","resourceName":"virt-controller","resourceType":"Deployment","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-handler\", \"resources\":{\"limits\":{\"cpu\":\"700m\",\"memory\":\"1600Mi\"}}}]}}}}","resourceName":"virt-handler","resourceType":"DaemonSet","type":"strategic"}]}}}
fleet-local mcc-harvester-crd 1/1
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 1/1
fleet-local mcc-rancher-logging-crd 1/1
fleet-local mcc-rancher-monitoring 1/1
fleet-local mcc-rancher-monitoring-crd 1/1
> kubectl get managedchart -A
NAMESPACE NAME AGE
fleet-local harvester 41h
fleet-local harvester-crd 41h
fleet-local local-managed-system-upgrade-controller 41h
fleet-local rancher-logging 41h
fleet-local rancher-logging-crd 41h
fleet-local rancher-monitoring 41h
fleet-local rancher-monitoring-crd 41h
fleet-local mcc-harvester 0/1 Modified(1) [Cluster fleet-local/local]; kubevirt.kubevirt.io harvester-system/kubevirt modified {"spec":{"customizeComponents":{"patches":[{"patch":"{\"webhooks\":[{\"name\":\"kubevirt-validator.kubevirt.io\",\"failurePolicy\":\"Ignore\"},{\"name\":\"kubevirt-update-validator.kubevirt.io\",\"failurePolicy\":\"Ignore\"}]}","resourceName":"virt-operator-validator","resourceType":"ValidatingWebhookConfiguration","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-api\", \"resources\":{\"limits\":{\"cpu\":\"400m\",\"memory\":\"1100Mi\"}}}]}}}}","resourceName":"virt-api","resourceType":"Deployment","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-controller\", \"resources\":{\"limits\":{\"cpu\":\"800m\",\"memory\":\"1300Mi\"}}}]}}}}","resourceName":"virt-controller","resourceType":"Deployment","type":"strategic"},{"patch":"{\"spec\":{\"template\":{\"spec\":{\"containers\":[{\"name\":\"virt-handler\", \"resources\":{\"limits\":{\"cpu\":\"700m\",\"memory\":\"1600Mi\"}}}]}}}}","resourceName":"virt-handler","resourceType":"DaemonSet","type":"strategic"}]}}}
Could this be related to https://github.com/harvester/harvester/commit/8c620b7dbc2f218c3714aa185940929fc38e796f ?
The rc3 cluster seems to be complaining about something related to https://github.com/harvester/harvester/commit/8c620b7dbc2f218c3714aa185940929fc38e796f.
Will check if it is in that state in a newly installed cluster.
@w13915984028 On a brand new single-node v1.1.2-rc3 cluster, this is what is seen:
# Run kubectl commands inside here
# e.g. kubectl get all
> kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
fleet-local fleet-agent-local 1/1
fleet-local local-managed-system-agent 1/1
fleet-local mcc-harvester 1/1
fleet-local mcc-harvester-crd 1/1
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 1/1
fleet-local mcc-rancher-logging-crd 1/1
fleet-local mcc-rancher-monitoring 1/1
fleet-local mcc-rancher-monitoring-crd 1/1
> kubectl get managedchart -A
NAMESPACE NAME AGE
fleet-local harvester 5m1s
fleet-local harvester-crd 5m1s
fleet-local local-managed-system-upgrade-controller 5m1s
fleet-local rancher-logging 5m1s
fleet-local rancher-logging-crd 5m1s
fleet-local rancher-monitoring 5m1s
fleet-local rancher-monitoring-crd 5m1s
After spinning up a VM and configuring a cluster flow & cluster output, it still yields:
# Run kubectl commands inside here
# e.g. kubectl get all
> kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
fleet-local fleet-agent-local 1/1
fleet-local local-managed-system-agent 1/1
fleet-local mcc-harvester 1/1
fleet-local mcc-harvester-crd 1/1
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 1/1
fleet-local mcc-rancher-logging-crd 1/1
fleet-local mcc-rancher-monitoring 1/1
fleet-local mcc-rancher-monitoring-crd 1/1
> kubectl get managedcharts -A
NAMESPACE NAME AGE
fleet-local harvester 14m
fleet-local harvester-crd 14m
fleet-local local-managed-system-upgrade-controller 14m
fleet-local rancher-logging 14m
fleet-local rancher-logging-crd 14m
fleet-local rancher-monitoring 14m
fleet-local rancher-monitoring-crd 14m
for a v1.1.2-rc3 single node.
Then, after changing systemUpgradeJobActiveDeadlineSeconds and similar settings, it still yields:
# Run kubectl commands inside here
# e.g. kubectl get all
> kubectl get bundle -A
NAMESPACE NAME BUNDLEDEPLOYMENTS-READY STATUS
fleet-local fleet-agent-local 1/1
fleet-local local-managed-system-agent 1/1
fleet-local mcc-harvester 1/1
fleet-local mcc-harvester-crd 1/1
fleet-local mcc-local-managed-system-upgrade-controller 1/1
fleet-local mcc-rancher-logging 1/1
fleet-local mcc-rancher-logging-crd 1/1
fleet-local mcc-rancher-monitoring 1/1
fleet-local mcc-rancher-monitoring-crd 1/1
> kubectl get managedcharts -A
NAMESPACE NAME AGE
fleet-local harvester 17m
fleet-local harvester-crd 17m
fleet-local local-managed-system-upgrade-controller 17m
fleet-local rancher-logging 17m
fleet-local rancher-logging-crd 17m
fleet-local rancher-monitoring 17m
fleet-local rancher-monitoring-crd 17m
Then, when trying to create the upgrade, it was created successfully.
Changing this to reproduce/rare since it does not seem to be easily reproducible...
From the kubevirts.yaml, a special value is present in the KubeVirt object that differs from the default value; it may be causing the complaints from fleet.
The image in question: registry.suse.com/harvester-beta/virt-controller:0.54.0-1
- apiVersion: kubevirt.io/v1
  kind: KubeVirt
  spec:
    customizeComponents:
      patches:
      - patch: '{"spec":{"template":{"spec":{"containers":[{"name":"virt-controller", "image":"registry.suse.com/harvester-beta/virt-controller:0.54.0-1","imagePullPolicy":"Always"}]}}}}'
        resourceName: virt-controller
        resourceType: Deployment
        type: strategic
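A quick way to check whether a cluster carries such a manual customization (a minimal sketch; it just dumps whatever sits under spec.customizeComponents on the KubeVirt object):
# Empty output means the cluster is running the bundle defaults; any
# patches listed here are manual customizations that fleet will flag.
kubectl get kubevirt kubevirt -n harvester-system -o jsonpath='{.spec.customizeComponents}'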
my local master-head release shows:
harv2:~ # kubectl get pods -n harvester-system virt-controller-5d54b8b9bf-rnw5g -oyaml | grep image
- --launcher-image
image: registry.suse.com/suse/sles/15.4/virt-controller:0.54.0-150400.3.7.1
imagePullPolicy: IfNotPresent
image: registry.suse.com/suse/sles/15.4/virt-controller:0.54.0-150400.3.7.1
imageID: sha256:30c23294b1b9fad7e729d52b3f0a296d16bc6d735c785f2e5e88fb4e7c7cf668
harv2:~ #
The image registry.suse.com/harvester-beta/virt-controller:0.54.0-1 is patched onto virt-controller, but virt-operator is normal...
I believe that patch was due to testing: https://github.com/harvester/harvester/wiki/Replace-KubeVirt-virt-controller-and-other-KubeVirt-component-images
@irishgordo @bk201 @guangbochen
There are 2 scenarios here:
(1) In the v1.1.2-rc3 release upgrade test, the upgrade check blocks the upgrade due to the temporary patch to virt-controller; the fleet-agent complains that the harvester bundle is modified. This output matches the design.
(2) In the v1.1-head release upgrade test, the same issue as #3616 was encountered.
At the moment, we have no further fix planned for this issue.
How should we proceed with this issue? Thanks.
Will monitor this issue when dealing with the v1.2.0-rc upgrade, thanks.
This happens when we manually patch the kube-virt image: https://github.com/harvester/harvester/issues/3715#issuecomment-1481965453. @Vicente-Cheng, we need a way to deal with this case.
It seems we could add the following to harvester/harvester-installer/pkg/config/templates/rancherd-10-harvester.yaml:
- apiVersion: kubevirt.io/v1
  jsonPointers:
  - /spec/customizeComponents
  kind: KubeVirt
  name: kubevirt
This lets the fleet-agent skip checking changes under spec.customizeComponents, so we can still patch kube-virt (see the sketch below for applying the same entry to a running cluster).
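A minimal sketch, assuming the ManagedChart accepts fleet's diff.comparePatches field as shown in the template entry above (the harvester-system namespace is an assumption; verify before use):
# Hypothetical: add a comparePatches entry so fleet ignores changes under
# /spec/customizeComponents on the KubeVirt object.
cat > /tmp/kubevirt-diff.yaml <<'EOF'
spec:
  diff:
    comparePatches:
    - apiVersion: kubevirt.io/v1
      kind: KubeVirt
      namespace: harvester-system
      name: kubevirt
      jsonPointers:
      - /spec/customizeComponents
EOF
kubectl patch managedcharts.management.cattle.io harvester -n fleet-local --type merge --patch-file /tmp/kubevirt-diff.yaml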
This needs to be in both v1.1.3 and v1.2.0.
@Vicente-Cheng please help verify, thanks.
Move this issue to v1.2.1 as a note: modifying the default kubevirt config is not supported at the current stage, and since the kubevirt patch is already included in Harvester v1.2.0, users will need to revert it before the upgrade (a sketch of such a revert follows).
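A minimal sketch of such a revert, assuming the customization was applied through spec.customizeComponents on the KubeVirt object (check what was actually modified first; with a JSON merge patch, a null value removes the field):
# Hypothetical revert: clear manually-set customizeComponents so the object
# matches the bundle default again before starting the upgrade.
kubectl patch kubevirt kubevirt -n harvester-system --type merge --patch '{"spec":{"customizeComponents":null}}'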
Update https://github.com/harvester/harvester/wiki/Replace-KubeVirt-virt-controller-and-other-KubeVirt-component-images with the above upgrade solution.
Thanks, @w13915984028, for providing the patch. I tested it, and it works well.
I will also add a document for this (not only on the wiki; the documentation should mention this too).
After discussion, we thought the wiki information should be enough. Most people would not patch kubevirt manually.
Let's move forward.
~* [ ] If labeled: require/HEP Has the Harvester Enhancement Proposal PR been submitted? The HEP PR is at:~
Test steps as below:
~* [ ] Is there a workaround for the issue? If so, where is it documented? The workaround is at:~
~* [ ] Has the backend code been merged (harvester, harvester-installer, etc.) (including `backport-needed/`)? The PR is at:~
~* [ ] If labeled: area/ui Has the UI issue been filed or is it ready to be merged? The UI issue/PR is at:~
~* [ ] If labeled: require/doc, require/knowledge-base Has the necessary document PR been submitted or merged? The documentation/KB PR is at:~
~* [ ] If NOT labeled: not-require/test-plan Has the e2e test plan been merged? Have QAs agreed on the automation test case? If only a test case skeleton w/o implementation, have you created an implementation issue?~
~* [ ] If the fix introduces code for backward compatibility, has a separate issue been filed with the label release/obsolete-compatibility? The compatibility issue is filed at:~
Automation e2e test issue: harvester/tests#1020
@Vicente-Cheng thanks for the mention :+1: :smile:
Following the test plan, things look good :smile:
Waiting for ManagedChart fleet-local/harvester-crd from generation 2
Target version: 1.1.2, Target state: ready
Current version: 1.1.2, Current state: null, Current generation: 4
Waiting for KubeVirt to upgraded to 0.54.0-150400.3.10.4...
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
KubeVirt current version: 0.54.0-150400.3.7.1, target version: 0.54.0-150400.3.10.4
Waiting for LH settling down...
Waiting for longhorn-manager to be upgraded...
Checking instance-manager-r pod on node harvester-node-0...
Additionally, as a smoke test, validated that the following configuration does not cause any issues. Configured with:
sudo sysctl -w vm.max_map_count=262144
docker run -d --name elasticsearch -p 9200:9200 -p 9300:9300 -e xpack.security.enabled=false -e node.name=es01 -it docker.elastic.co/elasticsearch/elasticsearch:6.8.23
docker run -d --name kibana --link elasticsearch:es_alias --env "ELASTICSEARCH_URL=http://es_alias:9200" -p 5601:5601 -it docker.elastic.co/kibana/kibana:6.8.23
ElasticSearch: 6.8.23, Kibana: 6.8.23, with an Elasticsearch index and user built per the attached Postman collection: Sample ElasticSearch Setup.postman_collection.json.
Upgraded v1.2.1 -> v1.2-head -> v1.3-head. Upgrade log archives: hvst-upgrade-n9nhz-upgradelog-archive-2024-03-04T22-55-56Z.zip, hvst-upgrade-q8dwv-upgradelog-archive-2024-03-04T20-44-41Z.zip
I'll go ahead and close this out :smile:
Describe the bug
Running into an issue where, on either setup (file-server accessible by separate clusters), the upgrade will not proceed, with an error of:
To Reproduce
Prerequisites:
sudo sysctl -w vm.max_map_count=262144
sudo docker run --name elasticsearch -p 9200:9200 -p 9300:9300 -e xpack.security.enabled=false -e node.name=es01 -it docker.elastic.co/elasticsearch/elasticsearch:6.8.23
sudo docker run --name kibana --link elasticsearch:es_alias --env "ELASTICSEARCH_URL=http://es_alias:9200" -p 5601:5601 -it docker.elastic.co/kibana/kibana:6.8.23
Elasticsearch prerequisites:
1. Have built out an Elasticsearch user in Elasticsearch (replace localhost as needed):
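The exact requests live in the attached Postman collection; as a hypothetical stand-in (the index name is illustrative):
# Hypothetical stand-in for the Postman setup: create an index for the
# cluster output to write to; replace localhost and the name as needed.
curl -X PUT "http://localhost:9200/harvester-logging"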
Steps to reproduce the behavior:
$ kubectl patch managedcharts.management.cattle.io local-managed-system-upgrade-controller --namespace fleet-local --patch-file=/tmp/fix.yaml --type merge
$ kubectl -n cattle-system rollout restart deploy/system-upgrade-controller
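For reference, a sketch of what /tmp/fix.yaml could contain for the systemUpgradeJobActiveDeadlineSeconds change mentioned earlier (the field path follows the managed chart's values; 3600 seconds is an illustrative value):
# Hypothetical contents of /tmp/fix.yaml: raise the active deadline for
# system-upgrade-controller jobs via the managed chart values.
cat > /tmp/fix.yaml <<'EOF'
spec:
  values:
    systemUpgradeJobActiveDeadlineSeconds: "3600"
EOF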