Closed jiayi0118 closed 1 year ago
If you lock more than one lock then you create ABBA problems, people have to take them always in the same order.
This is not a contended case, hence finegrained locking doesn't really get you anything. Moreover, udev's partition probing always goes to the main block device anyway to find the partition uuid and partition label in addition to the fs uuid and fs label.
Maybe add a sample code with udev_device_get_parent()
or udev_device_get_parent_with_subsystem_devtype()
to help people implementing it in mkfs-like tools?
Maybe add a sample code with
udev_device_get_parent()
orudev_device_get_parent_with_subsystem_devtype()
to help people implementing it in mkfs-like tools?
Couldn't agree more!
Maybe add a sample code with
udev_device_get_parent()
orudev_device_get_parent_with_subsystem_devtype()
to help people implementing it in mkfs-like tools?Couldn't agree more!
Couldn't agree any more
@poettering On the other hand, I think udev can cleverly avoid ABBA lock problem in finegrained locking.
The necessary conditions of ABBA deadlock are that threads occupy some locks and apply other locks blockingly.
However, in this scenary, suppose udev use nonblocking mode to acquire the locks, which is the strategy of acquiring lock in current udev, and if several locks are failed to acquire, let udev release the occupied locks at once. This will break the necessary conditions of ABBA deadlock. Besides, other block device tools, such as util-linux, tend to just keep a single lock.
I think if the currently operating device is a partition block, it will be ok that udev keeps another lock on the partition. Then udev will hold only two locks on the same block device at most, one for the main block and another for partition.
The following code is a sample to leverages sysfs
mechanism to get the main block device path.
Place the following code into main.c, and run gcc -o main main.c
to generate the main
executable.
Run with ./main </dev/xxx>
to get the main block device path of </dev/xxx>
.
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <unistd.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
// Get the base name of 'path'
// e.g. 'path' is "/dev/sda", then 'ret' is "sda"
// e.g. 'path' is "/dev/"", then 'ret' is ""
// constraint: 'size' is the capacity of 'ret' and should be larger than the base name
int GetBaseName(char *path, char *ret, int size)
{
if (path == NULL || ret == NULL){
return -1;
}
int len = strlen(path);
char *pos = strrchr(path, '/') + 1;
if( size > ( len - ( pos - path ) + 1 )){
snprintf(ret, len - (pos - path) + 1, pos);
return 0;
}else{
return -1;
}
}
// Get the directory of 'path'
// e.g. path is "/dev/sda", then ret is "/dev"
// constraint: 'size' is the capacity of 'ret' and should be larger than the directory length
int GetBaseDir(char *path, char *ret, int size)
{
if (path == NULL || ret == NULL){
return -1;
}
int len = strlen(path);
char *pos = strrchr(path, '/') + 1;
if( size > ( pos - path ) ){
snprintf(ret, (pos - path), path);
return 0;
}else {
return -1;
}
}
// Judge whether the device is a partition by checking if there is a 'partition' file under the sysfs path
int IsPartition(char *sysfsPath)
{
if (sysfsPath == NULL){
return -1;
}
char tmp[200] = "";
snprintf(tmp, strlen(sysfsPath)+1, sysfsPath);
strcat(tmp, "/partition");
if (access(tmp, F_OK) == 0)
{
return 0;
}
else
{
return -1;
}
}
// Get the main block device path of 'devicePath'.
// e.g. 'devicePath' is /dev/sda3, then 'ret' is '/dev/sda'
// e.g. 'devicePath' is /dev/sda, then 'ret' is '/dev/sda'
// constraint: 'size' is the capacity of 'ret' and should be larger than the directory length
int GetMainBlockDevicePath(char *devicePath, char *ret, int size){
if (devicePath == NULL || ret == NULL){
return -1;
}
struct stat buffer;
int status;
status = stat(devicePath, &buffer);
if( status < 0 || !S_ISBLK(buffer.st_mode)){
printf("This is not a block device node\n");
return -1;
}
char base[200] = "";
if (GetBaseName(devicePath, base, 200) < 0){
return -1;
}
// get sysfs class linkage
char sysClassBlockPath[200] = "";
sprintf(sysClassBlockPath, "/sys/class/block/");
strcat(sysClassBlockPath, base);
// skip non-block device.
if (access(sysClassBlockPath, F_OK) != 0){
printf("Not a block\n");
return -1;
}
// get the relative path from the linkage
char sysRelativeDevicePath[200] = "";
readlink(sysClassBlockPath, sysRelativeDevicePath, 200 * sizeof(char));
// Get sysfs full device path
char dir[200] = "";
if (GetBaseDir(sysClassBlockPath, dir, 200) <0){
return -1;
}
char sysDevicePath[200] = "";
sprintf(sysDevicePath, dir);
strcat(sysDevicePath, "/");
strcat(sysDevicePath, sysRelativeDevicePath);
// get the main block device path
char mainBlockPath[200] = "";
if(IsPartition(sysDevicePath) == 0){
char parentDirectoryPath[200] = "";
if( GetBaseDir(sysDevicePath, parentDirectoryPath, 200) < 0){
return -1;
}
strcat(mainBlockPath,"/dev/");
if(GetBaseName(parentDirectoryPath, mainBlockPath+5, 195) <0 ){
return -1;
}
}else{
strcat(mainBlockPath, "/dev/");
strcat(mainBlockPath, base);
}
if(strlen(mainBlockPath)<size){
sprintf(ret,mainBlockPath);
}else{
return -1;
}
return 0;
}
int main(int argc, char **argv)
{
printf("Device: %s\n", argv[1]);
char mainBlockPath[200] = "";
if(GetMainBlockDevicePath(argv[1],mainBlockPath,200) == 0){
printf("Get main block path %s\n", mainBlockPath);
}else{
printf("Get main block path failed\n", mainBlockPath);
}
return 0;
}
I have tested it in my system. Here is the block list.
# ll /sys/class/block/
lrwxrwxrwx. 1 root root 0 11月 22 10:47 dm-0 -> ../../devices/virtual/block/dm-0
lrwxrwxrwx. 1 root root 0 11月 22 10:47 dm-1 -> ../../devices/virtual/block/dm-1
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda1 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda1
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda2 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda2
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sda3 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:0/block/sda/sda3
lrwxrwxrwx. 1 root root 0 11月 22 10:47 sr0 -> ../../devices/pci0000:00/0000:00:02.0/0000:04:00.0/virtio2/host0/target0:0:0/0:0:0:1/block/sr0
Run with:
# ./main /dev/sda
Device: /dev/sda
Get main block path /dev/sda
# ./main /dev/sda1
Device: /dev/sda1
Get main block path /dev/sda
# ./main /dev/dm-0
Device: /dev/dm-0
Get main block path /dev/dm-0
# ./main /dev/sr0
Device: /dev/sr0
Get main block path /dev/sr0
# ./main /dev/sdb
Device: /dev/sdb
This is not a block device node
Get main block path failed
# ./main /dev/sdb1
Device: /dev/sdb1
This is not a block device node
Get main block path failed
If the device path is not a regular block device node or some bad things happened, then the function GetMainBlockDevicePath
will return -1.
@poettering @yuwata
Thank you for the suggestion, but please use sd-device. Maybe, using sd_device_new_from_devname()
and sd_device_get_parent()
makes the code simpler.
Then, please submit as a PR.
okay, this is just a sample for other block command to refer to.
I will submit a PR right away.
Closed by #25499.
systemd version the issue has been seen with
251
Used distribution
centos stream 9
Linux kernel version used
5.14.0-163.el9.x86_64
CPU architectures issue was seen on
x86_64
Component
systemd-udevd
Expected behaviour you didn't see
@poettering
As illustrated in https://systemd.io/BLOCK_DEVICE_LOCKING/ :
Specifically, whenever systemd-udevd starts processing a block device it takes a LOCK_SH|LOCK_NB lock using [flock(2)](http://man7.org/linux/man-pages/man2/flock.2.html) on the main block device (i.e. never on any partition block device, but on the device the partition belongs to).
However, some mkfs like tools are more likely to manipulate a partition block device and they will take the BSD lock on the partition rather than the main block device, for example https://github.com/util-linux/util-linux/issues/1842 https://github.com/util-linux/util-linux/issues/921 https://github.com/tytso/e2fsprogs/issues/30
As far as I have seen, systemd-udevd take the BSD lock on the main block device to avoid negative race effect on superblock or partition table probing. But it also seems reasonable to just lock the partition block device rather than its main block devict for the currently manipulating commands.
I was wondering whether systemd-udevd can give a solution for partition lock detection to support such mkfs like tools. For example, let systemd-udevd take lock on the partition block device as well as the main block device, or even let systemd-udevd try to take lock on all the partition of the main block device.
Unexpected behaviour you saw
systemd-udevd and other block device tools are inconsistent on which device to take the BSD lock.
Steps to reproduce the problem
NA
Additional program output to the terminal or log subsystem illustrating the issue