systemd / systemd

The systemd System and Service Manager
https://systemd.io
GNU General Public License v2.0
13.14k stars 3.75k forks source link

udev: provide example code for locking block device #25046

Closed jiayi0118 closed 1 year ago

jiayi0118 commented 1 year ago

systemd version the issue has been seen with

251

Used distribution

centos stream 9

Linux kernel version used

5.14.0-163.el9.x86_64

CPU architectures issue was seen on

x86_64

Component

systemd-udevd

Expected behaviour you didn't see

@poettering

As illustrated in https://systemd.io/BLOCK_DEVICE_LOCKING/ : Specifically, whenever systemd-udevd starts processing a block device it takes a LOCK_SH|LOCK_NB lock using [flock(2)](http://man7.org/linux/man-pages/man2/flock.2.html) on the main block device (i.e. never on any partition block device, but on the device the partition belongs to).

However, some mkfs like tools are more likely to manipulate a partition block device and they will take the BSD lock on the partition rather than the main block device, for example https://github.com/util-linux/util-linux/issues/1842 https://github.com/util-linux/util-linux/issues/921 https://github.com/tytso/e2fsprogs/issues/30

As far as I have seen, systemd-udevd take the BSD lock on the main block device to avoid negative race effect on superblock or partition table probing. But it also seems reasonable to just lock the partition block device rather than its main block devict for the currently manipulating commands.

I was wondering whether systemd-udevd can give a solution for partition lock detection to support such mkfs like tools. For example, let systemd-udevd take lock on the partition block device as well as the main block device, or even let systemd-udevd try to take lock on all the partition of the main block device.

Unexpected behaviour you saw

systemd-udevd and other block device tools are inconsistent on which device to take the BSD lock.

Steps to reproduce the problem

NA

Additional program output to the terminal or log subsystem illustrating the issue

NA
poettering commented 1 year ago

If you lock more than one lock then you create ABBA problems, people have to take them always in the same order.

This is not a contended case, hence finegrained locking doesn't really get you anything. Moreover, udev's partition probing always goes to the main block device anyway to find the partition uuid and partition label in addition to the fs uuid and fs label.

marcosfrm commented 1 year ago

Maybe add a sample code with udev_device_get_parent() or udev_device_get_parent_with_subsystem_devtype() to help people implementing it in mkfs-like tools?

jiayi0118 commented 1 year ago

Maybe add a sample code with udev_device_get_parent() or udev_device_get_parent_with_subsystem_devtype() to help people implementing it in mkfs-like tools?

Couldn't agree more!

hifilove commented 1 year ago

Maybe add a sample code with udev_device_get_parent() or udev_device_get_parent_with_subsystem_devtype() to help people implementing it in mkfs-like tools?

Couldn't agree more!

Couldn't agree any more

jiayi0118 commented 1 year ago

@poettering On the other hand, I think udev can cleverly avoid ABBA lock problem in finegrained locking.

The necessary conditions of ABBA deadlock are that threads occupy some locks and apply other locks blockingly.

However, in this scenary, suppose udev use nonblocking mode to acquire the locks, which is the strategy of acquiring lock in current udev, and if several locks are failed to acquire, let udev release the occupied locks at once. This will break the necessary conditions of ABBA deadlock. Besides, other block device tools, such as util-linux, tend to just keep a single lock.

I think if the currently operating device is a partition block, it will be ok that udev keeps another lock on the partition. Then udev will hold only two locks on the same block device at most, one for the main block and another for partition.

jiayi0118 commented 1 year ago

The following code is a sample to leverages sysfs mechanism to get the main block device path.

Place the following code into main.c, and run gcc -o main main.c to generate the main executable.

Run with ./main </dev/xxx> to get the main block device path of </dev/xxx>.

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <unistd.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

// Get the base name of 'path'
// e.g. 'path' is "/dev/sda", then 'ret' is "sda"
// e.g. 'path' is "/dev/"", then 'ret' is ""
// constraint: 'size' is the capacity of 'ret' and should be larger than the base name
int GetBaseName(char *path, char *ret, int size)
{
    if (path == NULL || ret == NULL){
        return -1;
    }

    int len = strlen(path);
    char *pos = strrchr(path, '/') + 1;

    if( size > ( len - ( pos - path ) + 1 )){
        snprintf(ret, len - (pos - path) + 1, pos);
        return 0;
    }else{
        return -1;
    }
}

// Get the directory of 'path'
// e.g. path is "/dev/sda", then ret is "/dev"
// constraint: 'size' is the capacity of 'ret' and should be larger than the directory length
int GetBaseDir(char *path, char *ret, int size)
{
    if (path == NULL || ret == NULL){
        return -1;
    }

    int len = strlen(path);
    char *pos = strrchr(path, '/') + 1;

    if( size > ( pos - path ) ){
        snprintf(ret, (pos - path), path);
        return 0;
    }else {
        return -1;
    }
}

// Judge whether the device is a partition by checking if there is a 'partition' file under the sysfs path
int IsPartition(char *sysfsPath)
{
    if (sysfsPath == NULL){
        return -1;
    }

    char tmp[200] = "";
    snprintf(tmp, strlen(sysfsPath)+1, sysfsPath);
    strcat(tmp, "/partition");

    if (access(tmp, F_OK) == 0)
    {
        return 0;
    }
    else
    {
        return -1;
    }
}

// Get the main block device path of 'devicePath'.
// e.g. 'devicePath' is /dev/sda3, then 'ret' is '/dev/sda'
// e.g. 'devicePath' is /dev/sda, then 'ret' is '/dev/sda'
// constraint: 'size' is the capacity of 'ret' and should be larger than the directory length
int GetMainBlockDevicePath(char *devicePath, char *ret, int size){
    if (devicePath == NULL || ret == NULL){
        return -1;
    }

    struct stat buffer;
    int status;
    status = stat(devicePath, &buffer);

    if( status < 0 || !S_ISBLK(buffer.st_mode)){
        printf("This is not a block device node\n");
        return -1;
    }

    char base[200] = "";
    if (GetBaseName(devicePath, base, 200) < 0){
        return -1;
    }

    // get sysfs class linkage
    char sysClassBlockPath[200] = "";
    sprintf(sysClassBlockPath, "/sys/class/block/");
    strcat(sysClassBlockPath, base);

    // skip non-block device.
    if (access(sysClassBlockPath, F_OK) != 0){
        printf("Not a block\n");
        return -1;
    }

    // get the relative path from the linkage
    char sysRelativeDevicePath[200] = "";
    readlink(sysClassBlockPath, sysRelativeDevicePath, 200 * sizeof(char));

    // Get sysfs full device path
    char dir[200] = "";
    if (GetBaseDir(sysClassBlockPath, dir, 200) <0){
        return -1;
    }
    char sysDevicePath[200] = "";
    sprintf(sysDevicePath, dir);
    strcat(sysDevicePath, "/");
    strcat(sysDevicePath, sysRelativeDevicePath);

    // get the main block device path
    char mainBlockPath[200] = "";
    if(IsPartition(sysDevicePath) == 0){
        char parentDirectoryPath[200] = "";
        if( GetBaseDir(sysDevicePath, parentDirectoryPath, 200) < 0){
            return -1;
        }

        strcat(mainBlockPath,"/dev/");
        if(GetBaseName(parentDirectoryPath, mainBlockPath+5, 195) <0 ){
            return -1;
        }
    }else{
        strcat(mainBlockPath, "/dev/");
        strcat(mainBlockPath, base);
    }

    if(strlen(mainBlockPath)<size){
        sprintf(ret,mainBlockPath);
    }else{
        return -1;
    }
    return 0;
}

int main(int argc, char **argv)
{
    printf("Device: %s\n", argv[1]);

    char mainBlockPath[200] = "";
    if(GetMainBlockDevicePath(argv[1],mainBlockPath,200) == 0){
        printf("Get main block path %s\n", mainBlockPath);
    }else{
        printf("Get main block path failed\n", mainBlockPath);
    }

    return 0;
}
jiayi0118 commented 1 year ago

I have tested it in my system. Here is the block list.

# ll /sys/class/block/
lrwxrwxrwx. 1 root root 0 11月 22 10:47 dm-0 -> ../../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0 11月 22 10:47 dm-1 -> ../../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda1 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda1
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda2 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda2
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda3 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda3
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sr0 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:1/block/sr0

Run with:

# ./main /dev/sda
Device: /dev/sda
Get main block path /dev/sda

# ./main /dev/sda1
Device: /dev/sda1
Get main block path /dev/sda

# ./main /dev/dm-0
Device: /dev/dm-0
Get main block path /dev/dm-0

# ./main /dev/sr0
Device: /dev/sr0
Get main block path /dev/sr0

# ./main /dev/sdb
Device: /dev/sdb
This is not a block device node
Get main block path failed

# ./main /dev/sdb1
Device: /dev/sdb1
This is not a block device node
Get main block path failed

If the device path is not a regular block device node or some bad things happened, then the function GetMainBlockDevicePath will return -1.

jiayi0118 commented 1 year ago

@poettering @yuwata

yuwata commented 1 year ago

Thank you for the suggestion, but please use sd-device. Maybe, using sd_device_new_from_devname() and sd_device_get_parent() makes the code simpler.

Then, please submit as a PR.

jiayi0118 commented 1 year ago

okay, this is just a sample for other block command to refer to.

I will submit a PR right away.

yuwata commented 1 year ago

Closed by #25499.