quattor / configuration-modules-core

Node Configuration Manager Components for Everyone

ncm-systemd: alias sys-device-pci doesn't match expected pattern (EL9) #1677

Closed jouvin closed 1 month ago

jouvin commented 5 months ago

I'm trying to install an EL9 server (Alma 9.3) and when running ncm-systemd, I get the following errors:

2024/04/02-11:37:57 [VERB] [INFO] running component: systemd
2024/04/02-11:37:57 [VERB] ---------------------------------------------------------
2024/04/02-11:37:59 [VERB] [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:37:59 [VERB] [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:38:00 [VERB] [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:38:00 [VERB] [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:38:01 [VERB] [INFO] Configure on component systemd executed, 4 errors, 0 warnings

When looking at the details in component/systemd.log, I find:

2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-units
2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-unit-files
2024/04/02-11:15:20 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device
2024/04/02-11:15:20 [ERROR] Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/02-11:15:20 [VERB] make_cache_alias completed with 328 cached units 37 alias units
2024/04/02-11:15:20 [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
2024/04/02-11:15:20 [VERB] Undefined UnitFileState for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device
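
As a quick sanity check, here is a minimal Python sketch (not the component's Perl code) that applies the pattern quoted in the error message to the unit name and to the alias, assuming both strings are exactly as printed in the log. Taken on its own, the pattern accepts both, so this says nothing about what the component does to the names before or while applying it.

```python
import re

# Pattern copied from the error message above.
UNIT_PATTERN = re.compile(
    r'^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot'
    r'|socket|swap|target|timer)$'
)

# Unit name as logged (single backslash before each x2d).
unit = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
# Alias as logged (doubled backslashes); whether the doubling is real or a
# logging artifact is not known from the log alone.
alias = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device'

for label, name in (('unit', unit), ('alias', alias)):
    match = UNIT_PATTERN.match(name)
    print(label, 'matches' if match else 'does not match')
```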

Environment used:

jouvin commented 5 months ago

I just installed another server with a different hardware model and I didn't see the problem initially but it appeared after adding a couple of RPMs related to NFS client. Not sure whether it is related or just that it appeared after the initial configuration/ncm-systemd run...

The HW is also a server from Dell, not the same model but the same generation, so an HW-related issue cannot be excluded...

stdweird commented 5 months ago

@jouvin is there a way for you to find out when this unit was discovered/added? maybe journalctl will tell. i suspect that it pops up during the ncm-systemd run.

if you run it a second time, is the error gone?

jouvin commented 5 months ago

@stdweird no, it's the opposite. During first run the problem is not there but after that it appears and never disappears. I reinstalled my test box to better assess when it happens and this time it happened at the very first run of ncm-systemd. I may have missed it during my previous checks... From journalctl, I get this with journalctl|grep pci|grep usb :

Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6: new high-speed USB device number 3 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6.3: new high-speed USB device number 4 using ehci-pci
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
stdweird commented 5 months ago

ok, my next guess is that the device uses some utf8 chars in the device name, and the regex doesn't match it because it's not properly dealing with utf8. can you locate the file Systemd/Service/Unit.pm and add use utf8; after the use 5.10.1; and see if this works?

jouvin commented 5 months ago

@stdweird unfortunately it doesn't help. But I think you are right: the name contains a hexadecimal sequence that may be a Unicode character. Doing systemctl|grep usb, it seems to be the Virtual NIC created for the management port/card of the server:

  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded active     plugged   iDRAC Virtual NIC

According to https://www.compart.com/en/unicode/U+02D1, it may be a "half triangular colon"...

stdweird commented 5 months ago

@jouvin hmm, next try: replace the use 5.10.1 with use 5.12 (you can try with or without use utf8, but that should not matter)

stdweird commented 5 months ago

@jouvin or do a systemctl show strangeunit.device > output and mail me that. i'll have a look what i can do to make it work

stdweird commented 5 months ago

ok, next guess: it has nothing to do with utf8

there is a method in Unit.pm called _handle_bug_wrong_escaped_unit. it does something similar and i think it needs to be extended with support for \x2d :

[root@test2819 ~]# systemd-escape '-'
\x2d
[root@test2819 ~]# systemd-escape -u 'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device
[root@test2819 ~]# systemd-escape 'sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device'
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device

so in that method add last line

...
my $newid = join("\\", split(/\\x5c/, $id));
$newid = join("-", split(/\\x2d/, $newid)); # add this line
jouvin commented 5 months ago

@stdweird still not working unfortunately: no message about the wrong escaping found in component-systemd.log. My guess is that the unit name has the \x2d1 characters and thus the test in the method _handle_bug_wrong_escaped_unit before doing the escaping doesn't match (unit name different from the id). What about doing the systemd-escape -u for each unit? It is harmless if there are no escaped characters...
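
To make the effect of systemd-escape -u concrete, here is a minimal Python sketch of the unescaping rule, assuming the behaviour described in the systemd.unit man page ("-" becomes "/", "\xNN" becomes the corresponding byte). The function name systemd_unescape is hypothetical and this is not the component's code.

```python
import re

# Matches either a C-style \xNN escape or a plain dash.
_TOKEN = re.compile(rb'\\x([0-9a-fA-F]{2})|-')


def systemd_unescape(name: str) -> str:
    """Approximate `systemd-escape -u` (hypothetical helper, sketch only)."""
    def _replace(match):
        if match.group(1) is None:
            # A plain "-" stands for a "/" in the original string.
            return b'/'
        # \xNN stands for the byte with that hexadecimal value.
        return bytes([int(match.group(1), 16)])

    raw = _TOKEN.sub(_replace, name.encode('utf-8'))
    return raw.decode('utf-8', errors='replace')


unit = r'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
print(systemd_unescape(unit))
print(systemd_unescape('abc-123'))
```

Note that, under this rule, a name without any \xNN escape is still changed if it contains a plain dash ('abc-123' becomes 'abc/123'), so unescaping every unit name may not be entirely neutral.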

jouvin commented 4 months ago

@stdweird I have been busy deploying our first EL9 systems and had no time to troubleshoot this problem further and come up with a fix... I can only say that I started to deploy servers from a different vendor (HP) where the problem doesn't appear... Seems somewhat HW-related...

stdweird commented 4 months ago

@jouvin i just tried to set up idrac with virtual media attached. i see a bunch of devices pop up in dmesg, but nothing going wrong in systemd units. if you can mail me the output of systemctl show strangeunit.device > output, i'll be able to investigate further

jouvin commented 4 months ago

@stdweird here it is: idrac_unit.out.txt

stdweird commented 4 months ago

can you also do

systemctl list-units | grep pci-device
systemctl list-units | grep pci-device | cat -v

and paste that here. in the Id in the output, there is no escaping; i guess that is the issue.

jouvin commented 4 months ago

@stdweird here it is:

[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded    active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded    active     plugged   iDRAC Virtual NIC
[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1|cat -v
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded    active     plugged   iDRAC Virtual NIC
  sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device                          loaded    active     plugged   iDRAC Virtual NIC
jrha commented 4 months ago

https://www.freedesktop.org/software/systemd/man/latest/systemd.unit.html#String%20Escaping%20for%20Inclusion%20in%20Unit%20Names

all other characters which are not ASCII alphanumerics, ":", "_" or "." are replaced by C-style "\x2d" escapes.

jrha commented 4 months ago

e.g.

> systemd-escape 'abc/123'
abc-123
> systemd-escape 'abc:123'
abc:123
> systemd-escape 'abc-123'
abc\x2d123
> systemd-escape 'abc#123'
abc\x23123
> systemd-escape 'abc?123'
abc\x3f123
> systemd-escape 'abc^123'
abc\x5e123
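
The rule quoted from the man page can be sketched in a few lines of Python. This is only an illustration of the documented escaping ("/" becomes "-", anything that is not an ASCII alphanumeric, ":", "_" or "." becomes "\xNN", and a leading "." is escaped as well); the function name systemd_escape is hypothetical and it does not cover the --path mode of the real tool.

```python
import string

# Characters that are kept as-is according to the systemd.unit man page.
_SAFE = set(string.ascii_letters + string.digits + ':_.')


def systemd_escape(text: str) -> str:
    """Approximate plain `systemd-escape` (hypothetical helper, sketch only)."""
    out = []
    for index, byte in enumerate(text.encode('utf-8')):
        char = chr(byte)
        if char == '/':
            # Path separators are turned into dashes.
            out.append('-')
        elif char in _SAFE and not (index == 0 and char == '.'):
            out.append(char)
        else:
            # Everything else, including a literal "-", becomes \xNN.
            out.append('\\x%02x' % byte)
    return ''.join(out)


for sample in ('abc/123', 'abc:123', 'abc-123', 'abc#123', 'abc?123', 'abc^123'):
    print(sample, '->', systemd_escape(sample))
```

This is why a literal dash in a device path (as in 1-1.6.3) shows up as \x2d in the unit name, while the dashes in the unit name itself stand for slashes.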
jouvin commented 4 months ago

Why don't we use systemd-escape to process the names we receive from systemd? I gave it a try in my original tests but failed to complete the change... The comments of the unescape function in the component mention that it could be an approach...

wdpypere commented 4 months ago

for reference, we also get a similar error while configuring qemu-guest-agent in systemd:

2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-units
2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-unit-files
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device
2024/04/25-13:13:00 [ERROR] Found alias "sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\\x2dports-vport3p1.device" for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] Found alias "dev-virtio\\x2dports-org.qemu.guest_agent.0.device" for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] make_cache_alias completed with 358 cached units 46 alias units
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device
stdweird commented 4 months ago

not sure this is the same issue. the output of the command is the same on el8 and el9

# /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device|grep -e 'Names=\|Id='
Id=dev-virtiox2dports-org.qemu.guest_agent.0.device
Names=dev-virtiox2dports-org.qemu.guest_agent.0.device

this smells like another escaping bug, you clearly see here that the backslash from \x2d from the command line unitname is not in the output of systemctl show Id or Names. so the component is confused
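
One thing that may be worth ruling out: if the command above was typed unquoted at an interactive shell, the shell itself removes the backslash before systemctl ever sees the unit name. The Python sketch below uses shlex, which follows POSIX shell word-splitting rules, to show the difference between the unquoted and the single-quoted form. It only models the shell; it does not say how the component invokes systemctl.

```python
import shlex

# The unit name typed without quotes, as in the command above
# (the pipe to grep is left out; it does not affect the argument).
unquoted = r'systemctl show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device'
# The same unit name protected by single quotes.
quoted = r"systemctl show -- 'dev-virtio\x2dports-org.qemu.guest_agent.0.device'"

# Last word of each command line after POSIX-style splitting.
unquoted_arg = shlex.split(unquoted)[-1]
quoted_arg = shlex.split(quoted)[-1]

print(unquoted_arg)
print(quoted_arg)
```

With the unquoted form, the argument loses its backslash and becomes dev-virtiox2dports-..., which is the same string as the Id and Names shown above.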

i also spotted another bug: the regex of the list-units parser needs the extra (?:(?:.|[?]{3})\s)?

Ouptut from /usr/bin/systemctl --all --no-pager --no-legend --full list-units does not match pattern (?^:^(?:.\s)?(?<name>(?<shortname>\S+)\.(?<type>\w+))\s+(?<loaded>\S+)\s+(?<active>\S+)\s+(?<running>\S+)(?:\s+|$)): ??? syslog.target 

(that is with partial fix)
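
The list-units parsing problem can be reproduced with a small Python sketch. The pattern is a translation of the one in the message above (Python writes named groups as (?P<name>...)), and I am assuming the suggested extra part replaces the leading (?:.\s)?. The columns after syslog.target in the sample line are made up for illustration, since the message above only shows the beginning of the line.

```python
import re

# Common part of the list-units line pattern, translated to Python syntax.
_TAIL = (
    r'(?P<name>(?P<shortname>\S+)\.(?P<type>\w+))\s+(?P<loaded>\S+)\s+'
    r'(?P<active>\S+)\s+(?P<running>\S+)(?:\s+|$)'
)

# Optional prefix: one marker character followed by whitespace.
ORIGINAL = re.compile(r'^(?:.\s)?' + _TAIL)
# Assumed extension: also accept a "???" marker before the unit name.
EXTENDED = re.compile(r'^(?:(?:.|[?]{3})\s)?' + _TAIL)

# Hypothetical sample lines; only "??? syslog.target" comes from the thread.
marked = '??? syslog.target not-found inactive dead'
plain = 'sshd.service loaded active running OpenSSH server daemon'

for label, pattern in (('original', ORIGINAL), ('extended', EXTENDED)):
    for line in (marked, plain):
        match = pattern.match(line)
        print(label, repr(line), '->', match.group('name') if match else None)
```

The original pattern only allows a single marker character, so a three-character "???" marker is not skipped and the line is rejected; the extended prefix accepts both forms.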

jouvin commented 4 months ago

A colleague at GRIF had the same error as @wdpypere ... I think it is the same escaping issue...

aka7 commented 3 months ago

we have also hit this issue now. seeing the following

2024/06/03-12:26:11 [ERROR] Found alias "dev-disk-by\\x2duuid-be3848db\\x2d217a\\x2d4574\\x2d80e0\\x2d67813f0d4407.device" for unit dev-disk-by\x2duuid-be3848db\x2d217a\x2d4574\x2d80e0\x2d67813f0d4407.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.

Are we close to any fix yet?

jouvin commented 3 months ago

@aka7 as for me, I have had no time to troubleshoot this issue further, I had to switch to more urgent issues... but I'm still interested in a fix!