storaged-project / udisks

The UDisks project provides a daemon, tools and libraries to access and manipulate disks, storage devices and technologies.
http://storaged.org/doc/udisks2-api/latest/
Other
333 stars 142 forks source link

SIGSEGV when handling many uevents #99

Open tasleson opened 7 years ago

tasleson commented 7 years ago

I've seen this crash once so far. Sorry, I no longer have the core file as my /tmp was cleared on boot.

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f32e9ca029f in g_dbus_object_get_interface (object=0x0, interface_name=0x7f32e9f95d57 "org.freedesktop.UDisks2.Block")
    at gdbusobject.c:149
149   GDBusObjectIface *iface = G_DBUS_OBJECT_GET_IFACE (object);
[Current thread is 1 (Thread 0x7f32eaf47900 (LWP 11763))]
Missing separate debuginfos, use: dnf debuginfo-install gvfs-client-1.28.3-1.fc24.x86_64 iscsi-initiator-utils-6.2.0.873-33.git4c1f2d9.fc24.x86_64 kmod-libs-22-4.fc24.x86_64 libblockdev-btrfs-1.6-1.fc24.x86_64 libblockdev-kbd-1.6-1.fc24.x86_64 libblockdev-part-1.6-1.fc24.x86_64 libblockdev-swap-1.6-1.fc24.x86_64 libblockdev-utils-1.6-1.fc24.x86_64 libconfig-1.5-4.fc24.x86_64 libstoragemgmt-1.3.2-1.fc24.x86_64 libxml2-2.9.3-3.fc24.x86_64 parted-3.2-20.fc24.x86_64 yajl-2.1.0-5.fc24.x86_64
(gdb) bt
#0  0x00007f32e9ca029f in g_dbus_object_get_interface (object=0x0, interface_name=0x7f32e9f95d57 "org.freedesktop.UDisks2.Block")
    at gdbusobject.c:149
#1  0x00007f32e9f897f9 in udisks_object_peek_block (object=0x0) at udisks-generated.c:32916
#2  0x0000000000416fe1 in udisks_daemon_find_block (daemon=0x1547130 [UDisksDaemon], block_device_number=64774) at udisksdaemon.c:1091
#3  0x00007f32d55b2e92 in is_recorded_as_physical_volume (daemon=0x1547130 [UDisksDaemon], device=0x474d030 [UDisksLinuxDevice])
    at udiskslvm2moduleiface.c:260
#4  0x00007f32d55b2eff in lvm2_object_new (daemon=0x1547130 [UDisksDaemon], device=0x474d030 [UDisksLinuxDevice]) at udiskslvm2moduleiface.c:277
#5  0x000000000041a79c in handle_block_uevent_for_modules (provider=0x1543230 [UDisksLinuxProvider], action=0x4ab0020 "remove", device=0x474d030 [UDisksLinuxDevice]) at udiskslinuxprovider.c:1175
#6  0x000000000041a9c0 in handle_block_uevent (provider=0x1543230 [UDisksLinuxProvider], action=0x4ab0020 "remove", device=0x474d030 [UDisksLinuxDevice]) at udiskslinuxprovider.c:1229
#7  0x000000000041ab25 in udisks_linux_provider_handle_uevent (provider=0x1543230 [UDisksLinuxProvider], action=0x4ab0020 "remove", device=0x474d030 [UDisksLinuxDevice]) at udiskslinuxprovider.c:1277
#8  0x0000000000418442 in on_idle_with_probed_uevent (user_data=0x153e780) at udiskslinuxprovider.c:218
#9  0x00007f32e969d6ba in g_main_context_dispatch (context=0x153ed70) at gmain.c:3154
#10 0x00007f32e969d6ba in g_main_context_dispatch (context=context@entry=0x153ed70) at gmain.c:3769
#11 0x00007f32e969da70 in g_main_context_iterate (context=0x153ed70, block=block@entry=1, dispatch=dispatch@entry=1, self=<optimized out>)
    at gmain.c:3840
#12 0x00007f32e969dd92 in g_main_loop_run (loop=0x153eca0) at gmain.c:4034
#13 0x0000000000414c07 in main (argc=2, argv=0x7ffe4fa2d698) at main.c:180
tasleson commented 7 years ago

I haven't been able to re-create this one, but from the stack trace we can see:

UDisksObject *
udisks_daemon_find_block (UDisksDaemon *daemon,
                          dev_t         block_device_number)
{
  UDisksObject *ret = NULL;
  GList *objects, *l;

  objects = g_dbus_object_manager_get_objects (G_DBUS_OBJECT_MANAGER (daemon->object_manager));
  for (l = objects; l != NULL; l = l->next)
    {
      UDisksObject *object = UDISKS_OBJECT (l->data);  // l->data is not valid as object == NULL!
      UDisksBlock *block;

      block = udisks_object_peek_block (object); //OBJECT IS NULL
      if (block == NULL)
        continue;

      if (udisks_block_get_device_number (block) == block_device_number)
        {
          ret = g_object_ref (object);
          goto out;
        }
    }
 out:
  g_list_free_full (objects, g_object_unref);
  return ret;
}

Some how we have an entry being returned from the object manager where the data item has been freed. Appears to be another case of a lifetime issue or a possible race condition if we have multiple threads somehow mucking with the object manager.

tbzatek commented 5 years ago

Looking at GDBusObjectManagerServer sources, their internal manager->priv->map_object_path_to_data hashtable which the list is built from seems properly locked on any write access. The objects we get are actually nested struct members, still there are several operations performed on them and I bet you'd see a lot of critical warnings before the crash. It's also guarded with g_return_if_fail (G_IS_DBUS_OBJECT (object)) so I have no idea how this could happen. The GDBus sources look correct.

Anyway, I haven't seen this issue for over a year now. The best thing we can do here is to skip the NULL elements with serious warning printed out. This udisks_daemon_find_block() method is about finding objects, let's pretend we didn't find one.

tbzatek commented 5 years ago

I'm still not convinced we want any preventive fix now as long as we haven't found the root cause. The g_dbus_object_manager_get_objects() is used quite often at number of places (possibly chained via udisks_daemon_get_objects()). The related object unref over the whole list would cause similar issues and making it NULL-aware would result in nasty code.

Also, typecast via UDISKS_OBJECT() should not return NULL from a valid pointer even if the cast is invalid.

Let's keep this issue open and watch out for any occurrences.