quattor / release

Release Management Tools
https://www.quattor.org/release
Apache License 2.0

24.10.0-rc4 #365

Open jrha opened 4 days ago

jrha commented 4 days ago

Top level issue tracker for any blocking issues discovered with 24.10.0-rc4.

jouvin commented 4 days ago

@jrha a very minor issue: it seems template-library-examples was tagged before the last PR fixing the examples was merged. Can you confirm? The consequence is that explicitly testing 24.10.0-rc4 with create-vanilla-scdb.sh fails, but this is not very important as HEAD works.

jouvin commented 2 days ago

I tried 24.10.0-rc4 on our production instance. It compiles fine, but something doesn't seem to work as expected with the AII configuration (at least on an EL9 system). I'll try to find the reason, probably some modified templates for AII configuration that we forgot to upstream... For information, here is the relevant section of the diff between the template produced by our modified 23.6.0 (to support EL9) and the one produced by 24.10.0-rc4:

@@ -2119,7 +2118,7 @@
           "initrd": "al9_x86_64/initrd.img",
           "kernel": "al9_x86_64/vmlinuz",
           "ksdevice": "em1",
-          "kslocation": "https://quattorweb.ijclab.in2p3.fr/ks/os-77012.lal.in2p3.fr.ks",
+          "kslocation": "nfs:quattorweb.ijclab.in2p3.fr:/ks/os-77012.lal.in2p3.fr.ks",
           "label": "Linux al9-x86_64",
           "setifnames": true
         }
@@ -2127,7 +2126,7 @@
       "osinstall": {
         "ks": {
           "acklist": [
-            "https://quattorweb.ijclab.in2p3.fr/cgi-bin/aii-installack.cgi"
+            "http://quattorweb.ijclab.in2p3.fr/cgi-bin/aii-installack.cgi"
           ],
           "auth": [
             "enableshadow",
@@ -2150,7 +2149,7 @@
           ],
           "bootdisk_order": [],
           "bootloader_location": "mbr",
-          "bootproto": "static",
+          "bootproto": "dhcp",
           "clearmbr": true,
           "clearpart": [
             "sda"
@@ -2163,7 +2162,7 @@
           "eula": true,
           "ignored_repos": [],
           "ignoredisk": [],
-          "installtype": "url --url https://quattorweb.ijclab.in2p3.fr/yum/snapshots/20240527/al9-baseos-x86_64",
+          "installtype": "url --url https://quattorweb.ijclab.in2p3.fr/yum/snapshots/20240527/al9-x86_64",
           "kernelinpost": true,
           "keyboard": "us",
           "lang": "en_US",
@@ -2182,27 +2181,6 @@
           "osinstall_protocol": "https",
           "packages": [
             "NetworkManager-config-server",
-            "curl",
-            "lsof",
-            "openssh",
-            "openssh-server",
-            "perl-AppConfig",
-            "perl-CDB_File",
-            "perl-Crypt-SSLeay",
-            "perl-DBI",
-            "perl-GSSAPI",
-            "perl-IO-String",
-            "perl-libwww-perl",
-            "perl-Pod-POM",
-            "perl-Template-Toolkit",
-            "perl-URI",
-            "perl-XML-Parser",
-            "yum-plugin-priorities",
-            "python3-dnf-plugin-versionlock",
-            "wget",
-            "perl-English",
-            "chkconfig",
-            "initscripts",
             "NetworkManager"
           ],
           "packages_args": [
@@ -2361,7 +2339,7 @@
           "set_hwaddr": true
         },
         "em2": {
-          "bootproto": "none",
+          "bootproto": "dhcp",
           "onboot": false,
           "set_hwaddr": true
         },
@@ -2386,8 +2364,7 @@
       "nameserver": [
         "134.158.88.149",
         "134.158.88.78"
-      ],
-      "nm_manage_dns": false
+      ]
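
A diff of this kind can be reproduced with a few lines of Python. This is only a minimal sketch, assuming both compiled profiles are available as JSON data; the function name `profile_diff`, the labels, and the sample fragments are hypothetical and not part of any Quattor tooling.

```python
import difflib
import json


def profile_diff(old, new, old_label="23.6.0", new_label="24.10.0-rc4"):
    """Return a unified diff (list of lines) of two profiles given as dicts.

    Keys are sorted so that pure ordering differences do not show up as changes.
    """
    old_lines = json.dumps(old, indent=2, sort_keys=True).splitlines()
    new_lines = json.dumps(new, indent=2, sort_keys=True).splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=old_label, tofile=new_label, lineterm=""
        )
    )


# Hypothetical fragments, loosely modelled on the diff above.
old = {"ks": {"bootproto": "static", "clearmbr": True}}
new = {"ks": {"bootproto": "dhcp", "clearmbr": True}}

diff_lines = profile_diff(old, new)
print("\n".join(diff_lines))
```

For real profiles, the two dicts would be loaded with `json.load` from the files being compared.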

jouvin commented 2 days ago

I am almost done fixing everything and will open a PR soon on AII (most fixes) and template-library-core.

jrha commented 2 days ago

Brilliant, thanks!

jouvin commented 2 days ago

The last issue is the most difficult: the definition of the ncm-network backend based on QUATTOR_TYPES_NETWORK_BACKEND (which triggers the definition of manage_dns for ncm-nmstate). The difficulty is that everything related to the schema is included very early in the configuration, either when quattor/profile_base is included (in the core OS configuration) or when any component schema is included (as they all start by including quattor/schema). That means there is no chance to define QUATTOR_TYPES_NETWORK_BACKEND in the OS configuration (for example, it is used for el9 but not for el7 or el8 by default). I tried to postpone including profile_base and the first component in machine-types/core, but I haven't succeeded so far.

I have tried several things but they all fail. As a type cannot be redefined in pan, there is no way to start with a partially defined schema and refine it later. Any suggestion is welcome. Without a fix for this, the variable QUATTOR_TYPES_NETWORK_BACKEND must be defined in the profile or site templates, which IMO is not desirable... Not sure if this issue affects only SCDB, as machine-types/core from template-library-standard is probably not used in Aquilon...
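
The ordering constraint can be illustrated with a toy model, written in Python rather than pan. It assumes, as described above, that the schema reads QUATTOR_TYPES_NETWORK_BACKEND once, the first time it is included, and that the result cannot be redefined afterwards. The class, the function names, the default value "network", and the value "nmstate" are hypothetical and only serve the illustration.

```python
class Build:
    """Toy stand-in for one profile compilation (not real pan semantics)."""

    def __init__(self):
        self.variables = {}
        self.schema_backend = None  # fixed the first time the schema is included

    def include_schema(self):
        # A type cannot be redefined, so only the first inclusion has an effect.
        if self.schema_backend is None:
            self.schema_backend = self.variables.get(
                "QUATTOR_TYPES_NETWORK_BACKEND", "network"  # assumed default
            )

    def include_os_config(self, backend):
        # The OS templates would like to select the backend here.
        self.variables["QUATTOR_TYPES_NETWORK_BACKEND"] = backend


def schema_first():
    """Schema pulled in (e.g. by a component) before the OS configuration."""
    build = Build()
    build.include_schema()
    build.include_os_config("nmstate")
    build.include_schema()  # too late: the type is already defined
    return build.schema_backend


def os_first():
    """OS configuration included before anything touches the schema."""
    build = Build()
    build.include_os_config("nmstate")
    build.include_schema()
    return build.schema_backend


print(schema_first())
print(os_first())
```

In this model the variable only takes effect when the OS configuration runs before the first schema inclusion, which is why the include order matters.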

jouvin commented 1 day ago

After more troubleshooting of this issue, I have the feeling that with a bit of refactoring of machine-types/core, it should be possible to define QUATTOR_TYPES_NETWORK_BACKEND in OS templates and that the remaining issue is GRIF specific. To be confirmed...

jrha commented 1 day ago

FWIW the include order in Aquilon always puts the OS before the personality, which makes this straightforward for us.

jouvin commented 1 day ago

This is normally what we do in SCDB too. But as there is no central control of the object template as in Aquilon, nothing prevents a site from loading a component before configuring the OS... Still trying to figure out what the problem is in our SCDB, but I think we can consider that it is not a blocking issue...

jouvin commented 1 day ago

I managed to fix the problem. For SCDB users, it requires a refactoring of machine-types/core-init/machine-types/core. I will open a PR soon against template-library-standard...

jouvin commented 5 hours ago

24.10.0-rc4 with the pending PRs for the template library has been successfully deployed on a test cluster at IJCLab. I plan to do a reinstallation in the coming hours to check whether initscripts and chkconfig can be removed from the default EL9 configuration.

jrha commented 3 hours ago

I'll do an RC5 build later due to the amount of changes that have been merged since RC4.

jouvin commented 2 hours ago

I plan to open a PR very soon to remove chkconfig and initscripts from the default OS configuration...

jouvin commented 1 hour ago

While removing chkconfig, I realized that ncm-named and ncm-systemd still rely on it. ncm-named is probably easy to fix, but I'll let @stdweird or another ncm-systemd expert look into why ncm-systemd requires the chkconfig binary when components/systemd/legacy/chkconfig is not included.

jouvin commented 16 minutes ago

As usual, I was too quick about ncm-named. The reason it still relies on chkconfig is documented in comments: as it currently stands, it wants to check a few things about the service and requires actions that are not provided by ncm-ncd, such as status, which is not (yet?) provided if I am right. A more serious reason may be the one listed in https://github.com/quattor/configuration-modules-core/issues/434: it is considered an obsolete component replaced by ncm-metaconfig. I need to check whether everything we need can be done by ncm-metaconfig; we could then switch to it in template-library-os. Probably for the next release...