martinpitt / umockdev

Mock hardware devices for creating unit tests and bug reporting
https://launchpad.net/umockdev
GNU Lesser General Public License v2.1
308 stars 55 forks source link

Properties not visible to libudev since systemd database changes #164

Closed hadess closed 2 years ago

hadess commented 2 years ago

It took me a long while to figure out, but sd-device, which is the backend for libudev in systemd, barely reads uevent files ever anymore (if at all). The devices were marked as "sealed" every time device_read_uevent_file() was called. I think this has to do with the changes made to systemd to counteract the ABI change in uevents in the kernel, with more data being cached.

The use case is https://gitlab.gnome.org/GNOME/gnome-control-center/-/merge_requests/1090 where I try to hotplug a UPS while a real upowerd is running. upowerd doesn't manage to detect the UPS because the UPOWER_* properties aren't accessible.

The smallest reproducer for the bug that I could find is:

diff --git a/tests/test-umockdev.py b/tests/test-umockdev.py
index 58d8bab..c42daa8 100644
--- a/tests/test-umockdev.py
+++ b/tests/test-umockdev.py
@@ -134,10 +134,12 @@ class Testbed(unittest.TestCase):
         def on_uevent(client, action, device, counters):
             if action == 'add':
                 counters[0] += 1
+                self.assertEqual(device.get_property('ID_INPUT'), '1')
             elif action == 'remove':
                 counters[1] += 1
             else:
                 assert action == 'change'
+                self.assertEqual(device.get_property('ID_INPUT'), '1')
                 counters[2] += 1

             counters[3] = device.get_sysfs_path()

or even better, so as to not send an add uevent twice:

diff --git a/tests/test-umockdev.py b/tests/test-umockdev.py
index 58d8bab..5f991ac 100644
--- a/tests/test-umockdev.py
+++ b/tests/test-umockdev.py
@@ -134,25 +134,26 @@ class Testbed(unittest.TestCase):
         def on_uevent(client, action, device, counters):
             if action == 'add':
                 counters[0] += 1
+                self.assertEqual(device.get_property('ID_INPUT'), '1')
             elif action == 'remove':
                 counters[1] += 1
             else:
                 assert action == 'change'
+                self.assertEqual(device.get_property('ID_INPUT'), '1')
                 counters[2] += 1

             counters[3] = device.get_sysfs_path()

-        syspath = self.testbed.add_device('pci', 'mydev', None, ['idVendor', '0815'], ['ID_INPUT', '1'])
-        self.assertNotEqual(syspath, None)
-
         # set up listener for uevent signal
         client = GUdev.Client.new(['pci'])
         client.connect('uevent', on_uevent, counter)

+        syspath = self.testbed.add_device('pci', 'mydev', None, ['idVendor', '0815'], ['ID_INPUT', '1'])
+        self.assertNotEqual(syspath, None)
+
         mainloop = GLib.MainLoop()

-        # send a signal and run main loop for 0.5 seconds to catch it
-        self.testbed.uevent(syspath, 'add')
+        # run main loop for 0.5 seconds to catch the add uevent
         GLib.timeout_add(500, mainloop.quit)
         mainloop.run()
         self.assertEqual(counter, [1, 0, 0, syspath])
martinpitt commented 2 years ago

Thanks @hadess! Ugh, that's a major bug.. I'll look at it ASAP. I created a reproducer test for now, and as expected it fails everywhere except Alpine (with musl).

as to not send an add uevent twice:

I'd like to keep that test as is: It's specifically meant to test the self.testbed.uevent(syspath, 'add') call. t_testbed_add_devicev() (and others) already cover the "implicit" add event from add_device().

hadess commented 2 years ago

I'd like to keep that test as is: It's specifically meant to test the self.testbed.uevent(syspath, 'add') call. t_testbed_add_devicev() (and others) already cover the "implicit" add event from add_device().

That's fine. It was easier for me to modify an existing test than to create a new one. Took me more than a day to "find" the bug.

martinpitt commented 2 years ago

I checked current systemd's code path: udev_device_get_property_value()sd_device_get_property_value(), and from then on pretty much everything happens inside sd-device.c: sd_device_get_property_value()device_properties_prepare()device_read_uevent_file(). So that bit is still as expected, but the latter immediately exits on device->uevent_loaded || device->sealed as you mentioned. uevent_loaded is just a function-internal cache flag that it only ever runs the function once.

At first sight we can ignore the "sealed" code paths -- device_seal() is only called from udevadm-test.c, and from device_new_from_{strv,nulstr}, and some copy constructor.

So after the first review I didn't yet see the root cause. Needs building a debug version and go through the code with a fine comb (and gdb).

hadess commented 2 years ago

Patches which could be useful, for setting SYSTEMD_LOG_LEVEL=debug. See also https://github.com/systemd/systemd/pull/21761

libgudev:

diff --git a/gudev/gudevclient.c b/gudev/gudevclient.c
index d96283a..46c6623 100644
--- a/gudev/gudevclient.c
+++ b/gudev/gudevclient.c
@@ -178,6 +178,10 @@ g_udev_client_constructed (GObject *object)

   client->priv->udev = udev_new ();

+  /* From sys/syslog.h */
+#define LOG_DEBUG 7
+  udev_set_log_priority (client->priv->udev, LOG_DEBUG);
+
   /* connect to event source */
   client->priv->monitor = udev_monitor_new_from_netlink (client->priv->udev, "udev");

systemd:

diff --git a/src/libudev/libudev.c b/src/libudev/libudev.c
index 73574874ca..d7cfdb31c0 100644
--- a/src/libudev/libudev.c
+++ b/src/libudev/libudev.c
@@ -150,5 +150,8 @@ _public_ int udev_get_log_priority(struct udev *udev) {
  *
  **/
 _public_ void udev_set_log_priority(struct udev *udev, int priority) {
+        log_set_target(LOG_TARGET_AUTO);
+        log_parse_environment();
+        log_set_open_when_needed(true);
         log_set_max_level(priority);
 }

Update from Martin: I tried the systemd patch, doesn't seem to work for me. But gdb/adding log_warning() etc. is fine.

hadess commented 2 years ago

This is when the device gets sealed:

#0  device_verify (device=0x5555557d9210) at ../../../../Projects/jhbuild/systemd/src/libsystemd/sd-device/device-private.c:410
#1  0x00007fffe8efd056 in device_new_from_nulstr (ret=ret@entry=0x7fffffff8068, nulstr=0x7fffffff81b8 "ACTION", len=len@entry=57) at ../../../../Projects/jhbuild/systemd/src/libsystemd/sd-device/device-private.c:504
#2  0x00007fffe8efb017 in device_monitor_receive_device (m=0x55555579b560, ret=ret@entry=0x7fffffffa1e0) at ../../../../Projects/jhbuild/systemd/src/libsystemd/sd-device/device-monitor.c:510
#3  0x00007fffe8ef7658 in udev_monitor_receive_sd_device (ret=<optimized out>, udev_monitor=<optimized out>) at ../../../../Projects/jhbuild/systemd/src/libudev/libudev-monitor.c:202
#4  udev_monitor_receive_device (udev_monitor=0x5555557efe30) at ../../../../Projects/jhbuild/systemd/src/libudev/libudev-monitor.c:247
#5  0x00007fffe8ee3aa3 in monitor_event (source=<optimized out>, condition=<optimized out>, data=0x5555557e9e70) at ../../../../Projects/jhbuild/libgudev/gudev/gudevclient.c:84
#6  0x00007fffe9d7c130 in g_main_context_dispatch () at /lib64/libglib-2.0.so.0
#7  0x00007fffe9dd1208 in g_main_context_iterate.constprop () at /lib64/libglib-2.0.so.0
#8  0x00007fffe9d7b853 in g_main_loop_run () at /lib64/libglib-2.0.so.0

which means it's sealed as soon as it's parsed the contents of the uevent sent over the socket, rather than after it's read the uevent file once.

hadess commented 2 years ago

Update from Martin: I tried the systemd patch, doesn't seem to work for me. But gdb/adding log_warning() etc. is fine.

You need both to get the logging working.

Update: For the gudev test, yes; I'm debugging the libudev test from the C code, gudev isn't involved there.

hadess commented 2 years ago

Update: For the gudev test, yes; I'm debugging the libudev test from the C code, gudev isn't involved there.

OK, so you'll need to call udev_set_log_priority() from somewhere too. Also, would be best to avoid replying inside my comments, it just looks weird...

martinpitt commented 2 years ago

Heh, sorry, I wanted to avoid generating more github notifications, these were just meant as notes for myself. (Also: I'm stuck in meetings all afternoon, will dive into this in earnest tomorrow morning)

martinpitt commented 2 years ago

Note for tomorrow: To fix this we need to include all the properties in uvent_sender_send(). We need to pass that from the caller in umockdev.vala, as that has the knowledge about where to find and read the file.

With this PoC hack the test passes:

diff --git src/uevent_sender.c src/uevent_sender.c
index 4c98bda..12384d5 100644
--- src/uevent_sender.c
+++ src/uevent_sender.c
@@ -255,6 +255,7 @@ uevent_sender_send(uevent_sender * sender, const char *devpath, const char *acti
         count += append_property(props, sizeof (props), count, "DEVNAME=", devname);
     if (devtype)
         count += append_property(props, sizeof (props), count, "DEVTYPE=", devtype);
+    count += append_property(props, sizeof (props), count, "ID_INPUT=", "1");

     /* add versioned header */
     memset(&nlh, 0x00, sizeof(struct udev_monitor_netlink_header));
hadess commented 2 years ago

That looks like my ugly diff:

diff --git a/src/uevent_sender.c b/src/uevent_sender.c
index 7bbb814..eca806d 100644
--- a/src/uevent_sender.c
+++ b/src/uevent_sender.c
@@ -254,6 +254,8 @@ uevent_sender_send(uevent_sender * sender, const char *devpath, const char *acti
         count += append_property(props, sizeof (props), count, "DEVNAME=", devname);
     if (devtype)
         count += append_property(props, sizeof (props), count, "DEVTYPE=", devtype);
+    //count += append_property(props, sizeof (props), count, "UPOWER_BATTERY_TYPE=", "ups");
+    //count += append_property(props, sizeof (props), count, "UPOWER_FAKE_DEVICE=", "1");

     /* add versioned header */
     memset(&nlh, 0x00, sizeof(struct udev_monitor_netlink_header));

But I wonder if it's there's a better way. The uevent file is supposed to only contain kernel stuff, but I guess this is a mock, so making libudev happy is pretty much the only thing we're interested in.

martinpitt commented 2 years ago

The uevent file is supposed to only contain kernel stuff

Right, and current umockdev is exactly the wrong way around -- the uevent files only contain the userland udevd properties, and the kernel stuff like DEVNAME= or DEVTYPE= is added on the fly in uevent_sender_send(). What was I thinking back then.. Let's clean this up.

But I wonder if it's there's a better way.

umockdev could store the userland properties in a /.udev.properties file, and then concatenate both when sending netlink responses. It's a trade-off whether tested programs are more likely to get confused by userland properties in /uevent or that extra /.udev.properties file (which is not a sysfs attribute). The properties could also be stored in some parallel file hierarchy, or not written to disk at all and just stored in the preload's brain. I'll think about that after fixing the above issue, which is much more urgent.

yuwata commented 2 years ago

Hi, the following may be redundant info, but

I am not familiar with umockdev, but if it emulates of eavesdrops the netlink socket, then it should send all expected properties.

hadess commented 2 years ago

The properties could also be stored in some parallel file hierarchy, or not written to disk at all and just stored in the preload's brain. I'll think about that after fixing the above issue, which is much more urgent.

That's what the /run/udev/data/ folder is used for. I would make a note somewhere (in the README?) that the uevent files contains all the properties when it should only contain the kernel ones, and insist that umockdev mocks udev, not sysfs.

martinpitt commented 2 years ago

I first had to do some yak shaving with some false-positive gcc -fanalyzer errors in the latest Fedora version, and then another gcc bug around diagnostic pragmas. With that out of the way, I sent #165 with a quick fix for that, which should unblock you.

After that I'll do the code cleanups as discussed above, they are not that urgent any more. I didn't make up my mind about mocking /run/udev/data yet -- if libudev looks at that directly, umockdev kind of has to, but if it's just udevd's internal persistent database and libudev only looks at netlink uevents (current code inspection looks that way), then things are fine -- the properties can remain in umockdev's memory, and /uevent file can look like the real kernel's.

Thanks for your help!

martinpitt commented 2 years ago

Released 0.17.4 with this, will shepherd to Fedora.