jouvin closed this issue 1 month ago.
I just installed another server with a different hardware model. I didn't see the problem initially, but it appeared after adding a couple of NFS-client RPMs. I'm not sure whether that is related or whether it simply appeared after the initial configuration/ncm-systemd run...
The hardware is also a Dell server, not the same model but the same generation, so a hardware-related issue cannot be excluded...
@jouvin Is there a way for you to find out when this unit was discovered/added? Maybe journalctl will tell. I suspect that it pops up during the ncm-systemd run.
If you run it a second time, is the error gone?
@stdweird No, it's the opposite. During the first run the problem is not there, but after that it appears and never disappears. I reinstalled my test box to better assess when it happens, and this time it happened at the very first run of ncm-systemd. I may have missed it during my previous checks... From journalctl, I get this with journalctl|grep pci|grep usb:
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 2-1: new high-speed USB device number 2 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6: new high-speed USB device number 3 using ehci-pci
Apr 04 10:16:45 psonar1.ijclab.in2p3.fr kernel: usb 1-1.6.3: new high-speed USB device number 4 using ehci-pci
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:19 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: Found alias "sys-devices-pci0000:00-0000:00:1a.0-usb1-1\\x2d1-1\\x2d1.6-1\\x2d1.6.3.device" for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Apr 04 10:17:20 psonar1.ijclab.in2p3.fr component-systemd[1266]: get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device defined. (Forgot to update cache?)
OK, my next guess is that the device name contains some UTF-8 characters and the regex doesn't match it because it doesn't handle UTF-8 properly. Can you locate the file Systemd/Service/Unit.pm, add use utf8; after the use 5.10.1; line, and see if this works?
@stdweird Unfortunately it doesn't help. But I think you are right: the name contains some hexadecimal escapes that may be a Unicode character. Running systemctl|grep usb, it seems to be the virtual NIC created for the server's management port/card:
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded active plugged iDRAC Virtual NIC
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device loaded active plugged iDRAC Virtual NIC
According to https://www.compart.com/en/unicode/U+02D1, it may be a "half triangular colon"...
@jouvin Hmm, next try: replace use 5.10.1 with use 5.12 (you can try with or without use utf8, but that should not matter).
@jouvin Or run systemctl show strangeunit.device > output and mail me the result. I'll see what I can do to make it work.
OK, next guess: it has nothing to do with UTF-8.
There is a method in Unit.pm called _handle_bug_wrong_escaped_unit. It does something similar, and I think it needs to be extended with support for \x2d:
[root@test2819 ~]# systemd-escape '-'
\x2d
[root@test2819 ~]# systemd-escape -u 'sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device'
sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device
[root@test2819 ~]# systemd-escape 'sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device'
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device
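To make the transcript above concrete, here is a minimal Python sketch of what systemd-escape -u does in its default (non --path) mode: each "-" becomes "/" and each four-character "\xNN" escape becomes the character with that hex code. The function name systemd_unescape is hypothetical; it is not part of ncm-systemd (which is Perl), and it only accepts \xNN escapes. It also shows that in "1\x2d1" the escape is exactly "\x2d" and the trailing "1" is an ordinary character of the name.

```python
import re


def systemd_unescape(name: str) -> str:
    """Hypothetical sketch of "systemd-escape -u NAME" (default, non --path mode)."""
    out = []
    i = 0
    while i < len(name):
        ch = name[i]
        if ch == "-":
            # An unescaped "-" in a unit name stands for "/".
            out.append("/")
            i += 1
        elif ch == "\\":
            # A backslash must start a four-character "\xNN" escape.
            m = re.fullmatch(r"\\x([0-9a-fA-F]{2})", name[i:i + 4])
            if m is None:
                raise ValueError(f"invalid escape at offset {i} in {name!r}")
            out.append(chr(int(m.group(1), 16)))
            i += 4
        else:
            out.append(ch)
            i += 1
    return "".join(out)


unit = r"sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device"
print(systemd_unescape(unit))
# sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device
```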
So in that method, add the last line:
...
my $newid = join("\\", split(/\\x5c/, $id));
$newid = join("-", split(/\\x2d/, $newid)); # add this line: the escape is "\x2d"; a "1" after it belongs to the name
@stdweird Unfortunately it's still not working: no message about the wrong escaping was found in component-systemd.log. My guess is that the unit name contains the \x2d escape, and thus the test in the method _handle_bug_wrong_escaped_unit that runs before the escaping is fixed doesn't match (the unit name differs from the id). What about running systemd-escape -u on every unit? It should be harmless if there are no escaped characters...
@stdweird I have been busy deploying our first EL9 systems and had no time to troubleshoot this problem further or come up with a fix... I can only say that I have started deploying servers from a different vendor (HP), where the problem doesn't appear... It seems somewhat hardware-related...
@jouvin I just tried to set up iDRAC with virtual media attached. I see a bunch of devices pop up in dmesg, but nothing goes wrong in the systemd units.
If you can mail me the output of systemctl show strangeunit.device > output, I'll be able to investigate further.
@stdweird here it is: idrac_unit.out.txt
Can you also run
systemctl list-units | grep pci-device
systemctl list-units | grep pci-device | cat -v
and paste the output here? In the Id field of the output there is no escaping; I guess that is the issue.
@stdweird here it is:
[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded active plugged iDRAC Virtual NIC
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device loaded active plugged iDRAC Virtual NIC
[root@quattorsrv ~]# systemctl list-units | grep sys-devices-pci0000:00-0000:00:1a.0-usb1-1|cat -v
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3-1\x2d1.6.3:1.0-net-idrac.device loaded active plugged iDRAC Virtual NIC
sys-devices-pci0000:00-0000:00:1a.0-usb1-1\x2d1-1\x2d1.6-1\x2d1.6.3.device loaded active plugged iDRAC Virtual NIC
all other characters which are not ASCII alphanumerics, ":", "_" or "." are replaced by C-style "\x2d" escapes.
e.g.
> systemd-escape 'abc/123'
abc-123
> systemd-escape 'abc:123'
abc:123
> systemd-escape 'abc-123'
abc\x2d123
> systemd-escape 'abc#123'
abc\x23123
> systemd-escape 'abc?123'
abc\x3f123
> systemd-escape 'abc^123'
abc\x5e123
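The escaping rules quoted above can be sketched in Python as follows. This is a hypothetical helper, not the component's code: "/" becomes "-", ASCII letters, digits, ":", "_" and "." are kept, and every other byte of the UTF-8 encoding becomes a lowercase "\xNN" escape. Escaping a leading "." as well follows systemd's own unit-name code as I understand it; treat that detail as an assumption.

```python
import string

# Characters kept as-is in unit names: ASCII letters and digits, ":", "_" and ".".
KEEP = set(string.ascii_letters + string.digits + ":_.")


def systemd_escape(s: str) -> str:
    """Hypothetical sketch of "systemd-escape STRING" (default, non --path mode)."""
    out = []
    for pos, byte in enumerate(s.encode("utf-8")):
        ch = chr(byte)
        if ch == "/":
            out.append("-")                # "/" is the one character mapped to "-"
        elif ch in KEEP and not (pos == 0 and ch == "."):
            out.append(ch)                 # safe character (assumed: except a leading ".")
        else:
            out.append("\\x%02x" % byte)   # everything else, including "-", becomes \xNN
    return "".join(out)


for raw in ["abc/123", "abc:123", "abc-123", "abc#123", "abc?123", "abc^123"]:
    print(raw, "->", systemd_escape(raw))

# Round trip of the path from the systemd-escape transcript earlier in the thread.
print(systemd_escape("sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.6/1-1.6.3.device"))
```

The loop reproduces the six examples above, and the last call returns the unit name with its \x2d escapes, as in the earlier systemd-escape transcript.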
Why don't we use systemd-escape to process the names we receive from systemd? I gave it a try in my original tests but didn't manage to complete the change... The comments in the component's unescape function mention that it could be an approach...
For reference, we also get a similar error while configuring qemu-guest-agent in systemd:
2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-units
2024/04/25-13:12:59 [VERB] Getting output of command: /usr/bin/systemctl --all --no-pager --no-legend --full list-unit-files
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device
2024/04/25-13:13:00 [ERROR] Found alias "sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\\x2dports-vport3p1.device" for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] Getting output of command: /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] Found alias "dev-virtio\\x2dports-org.qemu.guest_agent.0.device" for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
2024/04/25-13:13:00 [VERB] make_cache_alias completed with 358 cached units 46 alias units
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit dev-virtio\x2dports-org.qemu.guest_agent.0.device
2024/04/25-13:13:00 [ERROR] get_unit_show: no alias for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device defined. (Forgot to update cache?)
2024/04/25-13:13:00 [VERB] Undefined UnitFileState for unit sys-devices-pci0000:00-0000:00:06.0-virtio3-virtio\x2dports-vport3p1.device
Not sure this is the same issue. The output of the command is the same on EL8 and EL9:
# /usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device|grep -e 'Names=\|Id='
Id=dev-virtiox2dports-org.qemu.guest_agent.0.device
Names=dev-virtiox2dports-org.qemu.guest_agent.0.device
This smells like another escaping bug: you can clearly see here that the backslash from \x2d in the unit name on the command line is not in the Id or Names output of systemctl show, so the component is confused.
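One possible explanation worth ruling out for the manual check above: the unit name was passed to systemctl without quotes, and a POSIX shell treats an unquoted backslash as an escape for the next character, so "\x2d" reaches systemctl as "x2d". That alone would produce the Id= shown. The sketch below uses Python's shlex.split, which follows POSIX shell quoting rules, as a stand-in for the shell; it says nothing about how the component itself invokes systemctl.

```python
import shlex

# The command as typed above, with the unit name unquoted.
typed = r"/usr/bin/systemctl --no-pager --all show -- dev-virtio\x2dports-org.qemu.guest_agent.0.device"
# The same command with the unit name in single quotes.
quoted = r"/usr/bin/systemctl --no-pager --all show -- 'dev-virtio\x2dports-org.qemu.guest_agent.0.device'"

# Unquoted: the backslash escapes "x" and is removed, matching the Id= above.
print(shlex.split(typed)[-1])   # dev-virtiox2dports-org.qemu.guest_agent.0.device
# Single-quoted: the backslash is passed through literally.
print(shlex.split(quoted)[-1])  # dev-virtio\x2dports-org.qemu.guest_agent.0.device
```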
I also spotted another bug: the regex of the list-units parser needs the extra (?:(?:.|[?]{3})\s)? prefix:
Ouptut from /usr/bin/systemctl --all --no-pager --no-legend --full list-units does not match pattern (?^:^(?:.\s)?(?<name>(?<shortname>\S+)\.(?<type>\w+))\s+(?<loaded>\S+)\s+(?<active>\S+)\s+(?<running>\S+)(?:\s+|$)): ??? syslog.target
(that is with partial fix)
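A minimal Python translation of the list-units pattern from the log above (Perl's (?^:...) wrapper dropped, named groups written as (?P<...>)), comparing the current optional prefix (?:.\s)? with the proposed (?:(?:.|[?]{3})\s)?. The sample lines are hypothetical, since the line in the log is truncated.

```python
import re

# Unit fields exactly as in the logged pattern, in Python named-group syntax.
FIELDS = (
    r"(?P<name>(?P<shortname>\S+)\.(?P<type>\w+))\s+"
    r"(?P<loaded>\S+)\s+(?P<active>\S+)\s+(?P<running>\S+)(?:\s+|$)"
)

# Current prefix: an optional single character (e.g. a status marker) plus whitespace.
OLD_PATTERN = re.compile(r"^(?:.\s)?" + FIELDS)
# Proposed prefix: additionally accept a literal "???" before the whitespace.
NEW_PATTERN = re.compile(r"^(?:(?:.|[?]{3})\s)?" + FIELDS)

# Hypothetical list-units lines.
lines = [
    "??? syslog.target not-found inactive dead syslog.target",
    "sshd.service loaded active running OpenSSH server daemon",
]

for line in lines:
    old = OLD_PATTERN.match(line)
    new = NEW_PATTERN.match(line)
    print("old:", old and old.group("name"), "| new:", new and new.group("name"))
# old: None | new: syslog.target
# old: sshd.service | new: sshd.service
```

With the old prefix, "???" is taken as the start of the unit name and the match fails; the proposed alternative consumes it, while ordinary lines are parsed as before.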
A colleague at GRIF had the same error as @wdpypere ... I think it is the same escaping issue...
We have also hit this issue now, seeing the following:
2024/06/03-12:26:11 [ERROR] Found alias "dev-disk-by\\x2duuid-be3848db\\x2d217a\\x2d4574\\x2d80e0\\x2d67813f0d4407.device" for unit dev-disk-by\x2duuid-be3848db\x2d217a\x2d4574\x2d80e0\x2d67813f0d4407.device that doesn't match expected pattern '^(.*)\.(automount|device|mount|path|scope|service|slice|snapshot|socket|swap|target|timer)$'. Skipping.
Are we close to any fix yet?
@aka7 As for me, I haven't had time to troubleshoot this issue further; I had to switch to more urgent issues... but I'm still interested in a fix!
I'm trying to install an EL9 server (Alma 9.3), and when running ncm-systemd, I get the following errors:
When looking at the details in component/systemd.log, I find:
Environment used:
ncm-systemd-23.6.0-1.noarch