suse-edge / charts

SUSE Edge engineering Helm charts
https://suse-edge.github.io/charts/
Apache License 2.0

Release metal3 0.6.0 #86

Closed. Kristian-ZH closed this 9 months ago.

Kristian-ZH commented 9 months ago

This release includes the following changes:

ipetrov117 commented 9 months ago

I will do a deployment on Sylva later today and update on whether it works there as well.

hardys commented 9 months ago

@Kristian-ZH since this release changes the CRDs, we need to decide how to handle that. The changes should be backwards compatible, so I think our options are either to document manually applying the crds subdir prior to helm upgrade, or to consider creating a new -crds chart.

@ipetrov117 are you aware of any specific handling of this scenario in Sylva for other charts?

Kristian-ZH commented 9 months ago

I think our options are either to document manually applying the crds subdir prior to helm upgrade, or to consider creating a new -crds chart.

I was thinking to document that and to leave the decision about extracting the CRDs into a dedicated chart for later, because this should be aligned across all the charts, not only Metal3.
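
For reference, the documented workaround could look roughly like this. A minimal sketch, assuming a local chart checkout, release name and repo alias; none of those are taken from this thread:

  # helm only installs the crds/ directory on first install and skips it
  # on upgrade, so apply the updated CRDs manually first
  kubectl apply -f metal3/crds/

  # then upgrade the release as usual
  helm upgrade metal3 suse-edge/metal3 --namespace metal3-system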

ipetrov117 commented 9 months ago

@ipetrov117 are you aware of any specific handling of this scenario in Sylva for other charts?

@hardys, I am not aware of any specific convention being followed. AFAIK Sylva uses whatever the main chart provides in terms of deployment logic. For instance, if the chart foo uses 2 charts for its deployment logic (foo-crds and foo), then 2 units are created in Sylva, foo-crds and foo, where foo depends on foo-crds. An example can be found here; roughly, the unit wiring looks like the sketch below.
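
A sketch only; the exact field names are my assumptions about the Sylva values layout, not copied from sylva-core:

  units:
    foo-crds:
      enabled: true
    foo:
      enabled: true
      # hypothetical dependency declaration: foo waits for foo-crds
      depends_on:
        foo-crds: true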

My 2 cents on the metal3 chart upgrade discussion: there doesn't seem to be a concrete approach when it comes to upgrading CRDs. Looking at the helm doc regarding this, we have 2 possible approaches:

  1. Have a crds directory in the chart and leave helm to deploy the CRDs for us. This is the approach that both we and Sylva currently take. While easier, it does not handle CRD upgrades well (at least from what I understand), since helm only applies the crds directory on initial install and skips it on upgrade.
  2. Have foo-crds and foo charts, where foo-crds is always deployed before foo.

IMO documenting the manual CRD sync steps is an okay workaround for the moment, but for the long term we should think about option (2), mainly for 2 reasons:

  1. Users would not need to sync their management clusters' CRDs by hand, which is always good.
  2. By having the CRDs in a chart outside of our main chart, we can always ensure that they have been upgraded to the desired state before running the main chart upgrade (see the sketch after this list).
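
For illustration, the split-chart layout in option (2) could look roughly like this. This is only a sketch; the chart names and exact layout are placeholders, not an agreed design:

  # foo-crds ships the CRDs as regular templates, so helm upgrades them
  foo-crds/
    Chart.yaml
    templates/
      crd-a.yaml
      crd-b.yaml
  # foo keeps the rest of the deployment logic and drops its crds/ directory
  foo/
    Chart.yaml
    templates/

  # upgrade order: CRDs first, then the main chart
  helm upgrade --install foo-crds ./foo-crds
  helm upgrade --install foo ./foo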

hardys commented 9 months ago

@Kristian-ZH I tested in the metal3-demo environment and I think we have an issue with UEFI: we see the BMH fail to start inspection like:

  Normal  InspectionError    54m   metal3-baremetal-controller  Failed to inspect hardware. Reason: unable to start inspection: Failed to download image http://192.168.125.10:6180/uefi_esp.img, reason: HTTPConnectionPool(host='192.168.125.10', port=6180): Max retries exceeded with url: /uefi_esp.img (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f5602727c50>: Failed to establish a new connection: [Errno 113] EHOSTUNREACH'))

Looking at the ironic config I think we can see the problem: the bootloader URL points at port 6180, but the ironic LoadBalancer service only exposes 6185, 5050 and 6385:

  shardy@localhost:~> kubectl exec -n metal3-system metal3-metal3-ironic-677bc5c8cc-8cqdg -c ironic -- cat /etc/ironic/ironic.conf | grep uefi_esp
  bootloader = http://192.168.125.10:6180/uefi_esp.img
  shardy@localhost:~> kubectl get service -n metal3-system
  NAME                                                    TYPE           CLUSTER-IP     EXTERNAL-IP      PORT(S)                                        AGE
  baremetal-operator-controller-manager-metrics-service   ClusterIP      10.43.95.163   <none>           8443/TCP                                       86m
  baremetal-operator-webhook-service                      ClusterIP      10.43.210.62   <none>           443/TCP                                        86m
  metal3-mariadb                                          ClusterIP      10.43.136.41   <none>           3306/TCP                                       86m
  metal3-metal3-ironic                                    LoadBalancer   10.43.57.230   192.168.125.10   6185:31785/TCP,5050:30890/TCP,6385:30479/TCP   86m

hardys commented 9 months ago

Additional note: in the previous ironic image we had

  bootloader = {{ env.IRONIC_BOOT_BASE_URL }}/uefi_esp.img

But I think that got lost due to the upstream rebase, and it is now e.g.

  bootloader = http://{{ env.IRONIC_URL_HOST }}:{{ env.HTTP_PORT }}/uefi_esp.img
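
To spell out why that matters, here is roughly how the two templates would render with the values from this environment (the IRONIC_BOOT_BASE_URL value is an assumption; only the rendered 6180 URL is confirmed above):

  # previous image: the boot base URL pointed at the externally exposed port
  #   IRONIC_BOOT_BASE_URL=http://192.168.125.10:6185   (assumed value)
  #   bootloader = http://192.168.125.10:6185/uefi_esp.img
  #
  # current image: host and port render separately, and HTTP_PORT=6180 is not
  # exposed by the metal3-metal3-ironic LoadBalancer service
  #   IRONIC_URL_HOST=192.168.125.10  HTTP_PORT=6180
  #   bootloader = http://192.168.125.10:6180/uefi_esp.img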

hardys commented 9 months ago

I re-tested after https://build.opensuse.org/request/show/1144913 merged, and everything now works OK in the metal3-demo environment.