legionus / kbd

Mirror of https://git.kernel.org/pub/scm/linux/kernel/git/legion/kbd.git
https://kbd-project.org
Other
85 stars 40 forks source link

(Apparent) repeated 'Input/output error' messages from loadkeys #92

Closed keszybz closed 1 year ago

keszybz commented 1 year ago

We're trying to debug a failure of systemd-vconsole-setup in a qemu boot. In some boots (apparently randomly), we get the output below. The sequence is of invocations is: pid1 → systemd-udevd → udev worker → /usr/lib/systemd/systemd-vconsole-setuploadkeys. The worker captures output from the workers and adds the prefix.

vtcon0: /usr/lib/udev/rules.d/90-vconsole.rules:12 RUN '/usr/lib/systemd/systemd-vconsole-setup'
vtcon0: Running command "/usr/lib/systemd/systemd-vconsole-setup"
vtcon0: Starting '/usr/lib/systemd/systemd-vconsole-setup'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Found cgroup2 on /sys/fs/cgroup/, full unified hierarchy'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Found container virtualization none.'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Sysfs UTF-8 flag enabled'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'UTF-8 kbdmode enabled on /dev/tty1'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Executing "/usr/bin/loadkeys -q -C /dev/tty1 -u us"...'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Successfully forked off '(loadkeys)' as PID 520.'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Input/output error'
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Input/output error'
...
(about 4k of those)
...
vtcon0: '/usr/lib/systemd/systemd-vconsole-setup'(err) 'Input/output error'
vtcon0: Process '/usr/lib/systemd/systemd-vconsole-setup' failed with exit code 1.
vtcon0: Command "/usr/lib/systemd/systemd-vconsole-setup" returned 1 (error), ignoring.

The first step would to confirm that the 'Input/output error' message is really from loadkeys, and if yes, why does it happen and can we make loadkeys report it just once, not ~4k times.

See https://github.com/systemd/systemd/issues/26908 for the original issue.

kbd-2.5.1-5.fc38.x86_64 systemd-253-2165-g761ac130bd (compiled from git)

legionus commented 1 year ago

@keszybz The loadkeys works. It means you need to call it properly from udev/systemd.

legionus commented 1 year ago

I also want to add that loadkeys does not load the keymap in a loop (no retry). This means that 4k messages appeared due to the fact that loadkeys is called by someone in some kind of infinite loop.

keszybz commented 1 year ago

Please see the attached strace log. The interesting part:

execve("/usr/bin/loadkeys", ["/usr/bin/loadkeys", "-q", "-C", "/dev/tty1", "-u", "us"], 0x7ffceb876770 /* 7 vars */) = 0
openat(AT_FDCWD, "/dev/tty1", O_RDWR)   = 3
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=1, kb_value=0x1b /* K(KT_LATIN, '\x1b') */}) = -1 EIO (Input/output error)
write(2, "Input/output error", 18)      = 18
write(2, "\n", 1)                       = 1
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=2, kb_value=0x31 /* K(KT_LATIN, '1') */}) = -1 EIO (Input/output error)
write(2, "Input/output error", 18)      = 18
write(2, "\n", 1)                       = 1
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=3, kb_value=0x32 /* K(KT_LATIN, '2') */}) = -1 EIO (Input/output error)
write(2, "Input/output error", 18)      = 18
write(2, "\n", 1)                       = 1
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=4, kb_value=0x33 /* K(KT_LATIN, '3') */}) = -1 EIO (Input/output error)
write(2, "Input/output error", 18)      = 18
write(2, "\n", 1)                       = 1
...

strace.527.txt

legionus commented 1 year ago

Yes, it's an upload keymap. The keymap is added to the kernel one symbol at a time. Most likely the error came from here: https://github.com/legionus/kbd/blob/master/src/libkeymap/loadkeys.c#L60 . I agree that there is no EIO handling and possibly other error codes, but that doesn't change the fact that /dev/tty1 isn't applicable.

legionus commented 1 year ago

Hm. After a quick search, I did not find in the ioctl documentation that it can return EIO.

legionus commented 1 year ago

In other cases, it is dangerous to stop because there is a risk of leaving the console in an unusable state. But if an EIO error occurs, then there is clearly no point in continuing.

keszybz commented 1 year ago

I'm not sure why the kernel returns this error. But I don't think there's anything wrong with /dev/tty1 or the call. When the same call is executed after the system has started, it works just fine.

udevadm trigger -c add /sys/class/vtconsole/vtcon0 triggers the same operation, and in that case the strace looks like this:

execve("/usr/bin/loadkeys", ["/usr/bin/loadkeys", "-q", "-C", "/dev/tty1", "-u", "us"], 0x7ffef77ef2c0 /* 7 vars */) = 0
openat(AT_FDCWD, "/dev/tty1", O_RDWR)   = 3
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=1, kb_value=0x1b /* K(KT_LATIN, '\x1b') */}) = 0
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=2, kb_value=0x31 /* K(KT_LATIN, '1') */}) = 0
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=3, kb_value=0x32 /* K(KT_LATIN, '2') */}) = 0
ioctl(3, KDSKBENT, {kb_table=K_NORMTAB, kb_index=4, kb_value=0x33 /* K(KT_LATIN, '3') */}) = 0
...

It seems all identical, except that the the ioctls succeed.

Maybe this is some race condition within the kernel. I looked at the logs for some event that'd be reported after this first attempt fails, but I don't see anything relevant.

legionus commented 1 year ago

I'm not sure why the kernel returns this error.

For me, this is also unexpected.

Maybe this is some race condition within the kernel.

I don't think so. I think it's because you're trying to do this in udev. I know that udev has some restrictions on the programs it runs. From udev(7):

Note that running programs that access the network or mount/unmount filesystems is not allowed inside of udev rules, due to the default sandbox that is enforced on systemd-udevd.service.

We are probably dealing with one of these limitations.

keszybz commented 1 year ago

I don't think so. I think it's because you're trying to do this in udev. I know that udev has some restrictions on the programs it runs. From udev(7):

Note that running programs that access the network or mount/unmount filesystems is not allowed inside of udev rules, due to the default sandbox that is enforced on systemd-udevd.service.

We are probably dealing with one of these limitations.

That's unlikely. Firstly, those restrictions are about write access to the file system. loadkeys doesn't need write access or do any network operations. In fact, as the strace shows, it works fine all the way until it gets to calling ioctl. Secondly, to write the strace log, I had to allow access to the file system, and I disabled sandbox altogether; the effect is the same without the sandbox. Thirdly, when called via udevadm trigger, the same call succeeds. It is still executed via udevd.

legionus commented 1 year ago

I did not mean directly about the described restrictions. I meant that since udev restricts system calls such as mount, then maybe it has some other restrictions.

keszybz commented 1 year ago

No. Really. It just fails on the ioctl. Udev does not mess with individual ioctls.

ldv-alt commented 1 year ago

Looks like the only case when KDSKBENT could fail with EIO is when the tty line discipline has been destroyed on hangup: https://github.com/torvalds/linux/blob/master/drivers/tty/tty_io.c#L2781

legionus commented 1 year ago

@keszybz Ok. I found when ioctl can return EIO[1]. This can happen if the tty has been hungup[2] while loadkeys is running. This can be done by systemd[3]. There is this flag:

TTYVHangup=
           Disconnect all clients which have opened the terminal device specified with
           TTYPath= before and after execution. Defaults to "no".

So, any unit can do TIOCVHANGUP. For example getty[4] or debug-shell[5].

This is not a kbd or systemd bug. I believe that the bug is just a misconfiguration.

For my part, I will add code that will make EIO a fatal error, because in this case there is nothing else left. In this case, the keymap will not be fully loaded and the console may be in an unusable condition.

[1] https://github.com/torvalds/linux/blob/master/drivers/tty/tty_io.c#L2783 [2] https://github.com/torvalds/linux/blob/master/drivers/tty/tty_ldisc.c#L223 [3] https://github.com/systemd/systemd/blob/b4e2fcb6bb29775f61d78e60547814eb74be1918/src/core/execute.c#L276 [4] https://github.com/systemd/systemd/blob/b4e2fcb6bb29775f61d78e60547814eb74be1918/units/getty%40.service.in#L50 [5] https://github.com/systemd/systemd/blob/b4e2fcb6bb29775f61d78e60547814eb74be1918/units/debug-shell.service.in#L25

keszybz commented 1 year ago

@ldv-alt, @legionus thank you! I'll have a pull request for systemd shortly.

legionus commented 1 year ago

I added EIO handling as a fatal error. So now, when an EIO happens, only one error message will be displayed.

keszybz commented 1 year ago

Great, thanks! The issue should now be comprehensively solved from both sides.