Open jrha opened 4 days ago
@jrha a very minor issue: it seems template-library-examples has been tagged before merging the last PR fixing examples. Can you confirm? The consequence is that explicitly testing 24.10.0-rc4 with create-vanilla-scdb.sh fails, but it is not very important as HEAD is working.
I tried 24.10.0-rc4 on our production instance. It compiles fine, but something doesn't work as expected with the AII configuration (at least on an EL9 system). I'll try to find the reason, probably some modified templates for AII configuration that we forgot to upstream... For information, here is the relevant section of the diff between the template produced by our modified 23.6.0 (to support EL9) and 24.10.0-rc4:
@@ -2119,7 +2118,7 @@
"initrd": "al9_x86_64/initrd.img",
"kernel": "al9_x86_64/vmlinuz",
"ksdevice": "em1",
- "kslocation": "https://quattorweb.ijclab.in2p3.fr/ks/os-77012.lal.in2p3.fr.ks",
+ "kslocation": "nfs:quattorweb.ijclab.in2p3.fr:/ks/os-77012.lal.in2p3.fr.ks",
"label": "Linux al9-x86_64",
"setifnames": true
}
@@ -2127,7 +2126,7 @@
"osinstall": {
"ks": {
"acklist": [
- "https://quattorweb.ijclab.in2p3.fr/cgi-bin/aii-installack.cgi"
+ "http://quattorweb.ijclab.in2p3.fr/cgi-bin/aii-installack.cgi"
],
"auth": [
"enableshadow",
@@ -2150,7 +2149,7 @@
],
"bootdisk_order": [],
"bootloader_location": "mbr",
- "bootproto": "static",
+ "bootproto": "dhcp",
"clearmbr": true,
"clearpart": [
"sda"
@@ -2163,7 +2162,7 @@
"eula": true,
"ignored_repos": [],
"ignoredisk": [],
- "installtype": "url --url https://quattorweb.ijclab.in2p3.fr/yum/snapshots/20240527/al9-baseos-x86_64",
+ "installtype": "url --url https://quattorweb.ijclab.in2p3.fr/yum/snapshots/20240527/al9-x86_64",
"kernelinpost": true,
"keyboard": "us",
"lang": "en_US",
@@ -2182,27 +2181,6 @@
"osinstall_protocol": "https",
"packages": [
"NetworkManager-config-server",
- "curl",
- "lsof",
- "openssh",
- "openssh-server",
- "perl-AppConfig",
- "perl-CDB_File",
- "perl-Crypt-SSLeay",
- "perl-DBI",
- "perl-GSSAPI",
- "perl-IO-String",
- "perl-libwww-perl",
- "perl-Pod-POM",
- "perl-Template-Toolkit",
- "perl-URI",
- "perl-XML-Parser",
- "yum-plugin-priorities",
- "python3-dnf-plugin-versionlock",
- "wget",
- "perl-English",
- "chkconfig",
- "initscripts",
"NetworkManager"
],
"packages_args": [
@@ -2361,7 +2339,7 @@
"set_hwaddr": true
},
"em2": {
- "bootproto": "none",
+ "bootproto": "dhcp",
"onboot": false,
"set_hwaddr": true
},
@@ -2386,8 +2364,7 @@
"nameserver": [
"134.158.88.149",
"134.158.88.78"
- ],
- "nm_manage_dns": false
+ ]
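For anyone who wants to run a similar comparison on their own output, here is a minimal Python sketch that flattens two JSON documents into path/value pairs and reports the leaves that differ. It is not the tool used to produce the diff above. The function names, the `MISSING` marker and the nesting of the sample data are hypothetical, chosen only for illustration; only the `bootproto` and `nm_manage_dns` values come from the diff.

```python
import json

MISSING = "<absent>"  # hypothetical marker for a path present on one side only


def flatten(node, prefix=""):
    """Map every leaf of a nested JSON value to a slash-separated path."""
    flat = {}
    if isinstance(node, dict) and node:
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}/{key}"))
    elif isinstance(node, list) and node:
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}/{index}"))
    else:
        # Scalars and empty containers are treated as leaves.
        flat[prefix] = node
    return flat


def profile_changes(old, new):
    """Return {path: (old_value, new_value)} for every leaf that differs."""
    old_flat, new_flat = flatten(old), flatten(new)
    changes = {}
    for path in sorted(old_flat.keys() | new_flat.keys()):
        before = old_flat.get(path, MISSING)
        after = new_flat.get(path, MISSING)
        if before != after:
            changes[path] = (before, after)
    return changes


if __name__ == "__main__":
    # Illustrative data only: the nesting is made up for this example.
    old = json.loads('{"ks": {"bootproto": "static"}, "network": {"nm_manage_dns": false}}')
    new = json.loads('{"ks": {"bootproto": "dhcp"}, "network": {}}')
    for path, (before, after) in profile_changes(old, new).items():
        print(f"{path}: {before!r} -> {after!r}")
```

Lists are compared by index, so an entry removed from the middle of a list (as with the packages above) shows up as a shift of all later entries. A textual diff is easier to read in that case.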
I am almost done fixing everything and will open a PR soon on AII (most fixes) and template-library-core.
Brilliant, thanks!
The last issue is the most difficult: the definition of the ncm-network backend based on QUATTOR_TYPES_NETWORK_BACKEND (which triggers the definition of manage_dns for ncm-nmstate). The difficulty is that everything related to the schema is included very early in the configuration, either when quattor/profile_base is included (in the core OS configuration) or when any component schema is included (as they all start by including quattor/schema). That means there is no chance to define QUATTOR_TYPES_NETWORK_BACKEND in the OS configuration (for example, it is used for el9 but not for el7 or el8 by default). I tried to postpone including profile_base and the first component in machine-types/core but I haven't succeeded so far.
I have tried several things but they all fail. As a type cannot be redefined in pan, there is no way to start with a partially defined schema and refine it later. Any suggestion is welcome. Without a fix for this, the variable QUATTOR_TYPES_NETWORK_BACKEND must be defined in the profile or site templates, which IMO is not desirable... Not sure if this issue affects only SCDB, as machine-types/core from template-library-standard is probably not used in Aquilon...
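The ordering constraint described above can be shown with a toy model in Python. This is not pan and not the actual template library logic: the class, the function names, the `"nmstate"` value and the `"network"` fallback are assumptions made for illustration. The only behaviour modelled is that the schema reads the variable once, at its first include, and cannot be changed afterwards.

```python
class ToyProfile:
    """Toy model of a profile build: variables plus a schema fixed at first include."""

    def __init__(self):
        self.variables = {}
        self.schema_backend = None  # unset until the schema is first included

    def set_default(self, name, value):
        # Mimics a "define only if not already defined" assignment.
        self.variables.setdefault(name, value)

    def include_schema(self):
        # The variable is read once; later includes cannot redefine the type.
        if self.schema_backend is None:
            self.schema_backend = self.variables.get(
                "QUATTOR_TYPES_NETWORK_BACKEND", "network"  # assumed fallback
            )
        return self.schema_backend


def os_config_el9(profile):
    # Hypothetical OS template selecting the backend (value is illustrative).
    profile.set_default("QUATTOR_TYPES_NETWORK_BACKEND", "nmstate")


def include_component(profile):
    # Any component schema pulls in the common schema first.
    profile.include_schema()


def build(steps):
    """Run the steps in order and return the backend the schema ended up with."""
    profile = ToyProfile()
    for step in steps:
        step(profile)
    return profile.include_schema()


if __name__ == "__main__":
    print(build([include_component, os_config_el9]))  # schema fixed before the OS config runs
    print(build([os_config_el9, include_component]))  # OS config runs first
```

In this model, the OS setting only takes effect when the OS configuration runs before the first schema include. That is the ordering the refactoring attempts described here aim for.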
After more troubleshooting of this issue, I have the feeling that with a bit of refactoring of machine-types/core, it should be possible to define QUATTOR_TYPES_NETWORK_BACKEND in OS templates and that the remaining issue is GRIF specific. To be confirmed...
FWIW the include order in aquilon always puts the OS before the personality, which makes this straightforward for us.
This is normally what we do in SCDB too. But as there is no central control of the object template like in Aquilon, nothing prevents a site from loading a component before configuring the OS... Still trying to figure out what the problem is in our SCDB, but I think we can consider that it is not a blocking issue...
I managed to fix the problem. For SCDB users, it requires a refactoring of machine-types/core-init / machine-types/core. I will open a PR soon against template-library-standard...
24.10.0-rc4 with the pending PRs for the template library has been successfully deployed on a test cluster at IJCLab. I plan to do a reinstallation in the coming hours to check the possibility to remove initscripts and chkconfig from the default EL9 configuration.
I'll do an RC5 build later due to the amount of changes that have been merged since RC4.
I plan to open a PR very soon to remove chkconfig and initscripts from the default OS configuration...
While removing chkconfig, I realized that ncm-named and ncm-systemd still rely on it. ncm-named is probably easy to fix, but I'll let @stdweird or another ncm-systemd expert look at why ncm-systemd requires the chkconfig binary when components/systemd/legacy/chkconfig is not included.
As usual, I was too quick about ncm-named. The reason it still relies on chkconfig is documented in comments: as it currently stands, it wants to check a few things about the service and requires actions that are not provided by ncm-ncd, such as status, which is not (yet?) provided if I am right. A more serious issue is perhaps the one listed in https://github.com/quattor/configuration-modules-core/issues/434: it is considered an obsolete component replaced by ncm-metaconfig. I need to check whether everything we need can be done by ncm-metaconfig, and we could then switch to it in template-library-os. Probably for the next release...
Top level issue tracker for any blocking issues discovered with 24.10.0-rc4.