storaged-project / udisks

The UDisks project provides a daemon, tools and libraries to access and manipulate disks, storage devices and technologies.
https://storaged.org/doc/udisks2-api/latest/

Uniquely malformed MBR containing NTFS PBS causes udev spam and constant reads #1207

Open jarrodsfarrell opened 12 months ago

jarrodsfarrell commented 12 months ago

This issue is missing the samples needed to reproduce it, as they were inadvertently destroyed. If you came here from a search, please comment here and supply samples before modifying any partitions, as doing so may destroy the conditions that cause this bug.


/dev/sdb1 used to be an NTFS filesystem holding games back when Windows was the dominant OS on this desktop, and later became a BTRFS filesystem by lazily pointing mkfs.btrfs at it. Importantly, the games were migrated over. It has been set up this way for almost a year.

Around this month I noticed my HDD light was constantly illuminated, and checking iotop I saw udisksd writing to the disk at a constant 6-7 M/s. Checking udevadm monitor showed a stream of UDEV and KERNEL change events:

UDEV  [106382.274587] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1 (block)
UDEV  [106382.279426] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb (block)
KERNEL[106382.281577] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb (block)
KERNEL[106382.282234] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1 (block)
UDEV  [106382.287861] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1 (block)
UDEV  [106382.293801] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb (block)
UDEV  [106382.301564] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1 (block)
[and so on...]
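
To put a number on the event rate, the monitor output can be tallied per device. This is a minimal sketch that assumes the line format shown above; the function name `summarize_events` is made up for illustration and is not part of udev or udisks.

```python
import re
from collections import Counter

# Matches lines in the format printed above, e.g.
# "UDEV  [106382.274587] change   /devices/.../block/sdb/sdb1 (block)"
EVENT_RE = re.compile(
    r"^(?P<source>UDEV|KERNEL)\s*\[(?P<ts>\d+\.\d+)\]\s+"
    r"(?P<action>\w+)\s+(?P<devpath>\S+)\s+\((?P<subsystem>[^)]+)\)$"
)


def summarize_events(lines):
    """Count (source, action, device) triples and return the time span covered."""
    counts = Counter()
    timestamps = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match is None:
            continue  # skip headers, blank lines and anything unrecognized
        device = match["devpath"].rsplit("/", 1)[-1]  # last path component
        counts[(match["source"], match["action"], device)] += 1
        timestamps.append(float(match["ts"]))
    span = max(timestamps) - min(timestamps) if timestamps else 0.0
    return counts, span


sample = [
    "UDEV  [106382.274587] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1 (block)",
    "KERNEL[106382.281577] change   /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb (block)",
]
counts, span = summarize_events(sample)
for key, count in counts.most_common():
    print(key, count)
print(f"span: {span:.6f} s")
```

Feeding it a saved capture of the full monitor output would show how many change events each of sdb and sdb1 received per second.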

I tried some other debugging steps I found online, but the one that ended up showing something unusual was strace. Scanning through the log, \353R\220NTFS came up frequently in the data returned by read calls. Searching for that led me to an unrelated post about someone having issues with NTFS, and looking up the NTFS structure then led me to its Partition Boot Sector (PBS). This is when I discovered this disk was still MBR. Using gdisk to discard the MBR and recreate the partition table resolved the issue, and the disk is no longer constantly read.
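
For anyone who hits something similar, the same check can be done on a dump of sector 0 without strace. The sketch below relies only on well-known on-disk layouts: the escaped string \353R\220NTFS is the bytes EB 52 90 followed by the NTFS OEM ID at offset 3, and an MBR keeps four 16-byte partition entries at offset 446 with a 55 AA signature at offset 510. The function name `describe_sector0` and the synthetic demo sector are assumptions for illustration, not data from the affected disk.

```python
import struct

NTFS_OEM_ID = b"NTFS    "      # 8-byte OEM ID at offset 3 of an NTFS boot sector
BOOT_SIGNATURE = b"\x55\xaa"   # bytes 510-511 of an MBR (an NTFS PBS ends the same way)


def describe_sector0(sector: bytes) -> dict:
    """Report whether the first 512 bytes look like an NTFS PBS, an MBR, or both."""
    if len(sector) < 512:
        raise ValueError("need at least 512 bytes")
    partitions = []
    for index in range(4):
        start = 446 + 16 * index
        # Entry layout: status, 3 CHS bytes, type, 3 CHS bytes, start LBA, sector count
        status, ptype, lba_start, num_sectors = struct.unpack(
            "<B3xB3xII", sector[start:start + 16]
        )
        if ptype != 0:  # type 0x00 marks an unused slot
            partitions.append(
                {"slot": index + 1, "type": hex(ptype),
                 "lba_start": lba_start, "sectors": num_sectors}
            )
    return {
        "has_boot_signature": sector[510:512] == BOOT_SIGNATURE,
        "looks_like_ntfs_pbs": sector[3:11] == NTFS_OEM_ID,
        "partitions": partitions,
    }


# Synthetic sector (hypothetical): NTFS jump + OEM ID combined with one MBR entry.
demo = bytearray(512)
demo[0:3] = b"\xeb\x52\x90"
demo[3:11] = NTFS_OEM_ID
demo[446:462] = struct.pack("<B3xB3xII", 0x00, 0x83, 2048, 1953517568)
demo[510:512] = BOOT_SIGNATURE
print(describe_sector0(bytes(demo)))
```

To inspect a real backup, pass it the first 512 bytes of the file, for example the data read via `gzip.open(path, "rb").read(512)` for a gzipped dump.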

Before discarding the MBR, I made a backup of the first 2048 bytes, sdb.2048-bytes.bin.gz. Running parted against the disk before fixing it produced an interesting result, included for the absurdity.

[nix-shell:/dev]# parted /dev/sdb print
Model: ATA Samsung SSD 860 (scsi)
Disk /dev/sdb: 1000GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags: 

Number  Start  End     Size    File system  Flags
 1      0.00B  1000GB  1000GB  ntfs

Reminder: /dev/sdb1 is actually a btrfs file system.

Extra

/etc/fstab declaration:

/dev/disk/by-uuid/XXXX /media/Games btrfs noatime,compress=zstd 0 0

Disk information:

Model Family:     Samsung based SSDs
Device Model:     Samsung SSD 860 EVO 1TB
Serial Number:    ----
LU WWN Device Id: 5 002538 ec0bdad6b
Firmware Version: RVT04B6Q
User Capacity:    1,000,204,886,016 bytes [1.00 TB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Form Factor:      2.5 inches
TRIM Command:     Available, deterministic, zeroed
Device is:        In smartctl database 7.3/5387
ATA Version is:   ACS-4 T13/BSR INCITS 529 revision 5
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is:    Sat Oct 21 15:33:53 2023 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

/etc/os-release:

BUG_REPORT_URL="https://github.com/NixOS/nixpkgs/issues"
BUILD_ID="23.11pre536534.ca012a02bf83"
DOCUMENTATION_URL="https://nixos.org/learn.html"
HOME_URL="https://nixos.org/"
ID=nixos
LOGO="nix-snowflake"
NAME=NixOS
PRETTY_NAME="NixOS 23.11 (Tapir)"
SUPPORT_URL="https://nixos.org/community.html"
VERSION="23.11 (Tapir)"
VERSION_CODENAME=tapir
VERSION_ID="23.11"

jarrodsfarrell commented 12 months ago

strace.log.gz

tbzatek commented 12 months ago

Please provide udevadm info for /dev/sdb and /dev/sdb1 (with the broken MBR). Initial probing is done by udev, and udisks only consumes most of that info. Anything related in dmesg? Anything printed to stdout or stderr by udisksd?

jarrodsfarrell commented 12 months ago

@tbzatek As mentioned, I did make a backup of the first 2048 bytes, so for posterity I tried writing it to a small file attached as a loop device and could not recreate the issue that way. I also tried it on an old USB flash drive I had lying around, with the same result.

The issue is not happening right now, after I tried to return to the previous setup as closely as I could. These are the actions I took:

For now I'm going to leave the system in that setup and see if the issue crops up again over time, since even with a broken MBR the system works fine. I'll report back if it starts happening again.

jarrodsfarrell commented 12 months ago

Here's the udevadm info as requested anyway, in case there's anything of interest in it. I just realized it might still be useful.

/dev/sdb:

P: /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb
M: sdb
U: block
T: disk
D: b 8:16
N: sdb
L: 0
S: disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S599NZFNB22849E
S: disk/by-path/pci-0000:01:00.1-ata-2
S: disk/by-path/pci-0000:01:00.1-ata-2.0
S: disk/by-diskseq/3
S: disk/by-id/wwn-0x5002538ec0bdad6b
Q: 3
E: DEVPATH=/devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb
E: DEVNAME=/dev/sdb
E: DEVTYPE=disk
E: DISKSEQ=3
E: MAJOR=8
E: MINOR=16
E: SUBSYSTEM=block
E: USEC_INITIALIZED=3471770
E: PATH=/nix/store/yjisihkg87ycnpj5db42s4z9xlaxrqy0-udev-path/bin:/nix/store/yjisihkg87ycnpj5db42s4z9xlaxrqy0-udev-path/sbin
E: ID_ATA=1
E: ID_TYPE=disk
E: ID_BUS=ata
E: ID_MODEL=Samsung_SSD_860_EVO_1TB
E: ID_MODEL_ENC=Samsung\x20SSD\x20860\x20EVO\x201TB\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=RVT04B6Q
E: ID_SERIAL=Samsung_SSD_860_EVO_1TB_S599NZFNB22849E
E: ID_SERIAL_SHORT=S599NZFNB22849E
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=4
E: ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=8
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_ROTATION_RATE_RPM=0
E: ID_WWN=0x5002538ec0bdad6b
E: ID_WWN_WITH_EXTENSION=0x5002538ec0bdad6b
E: ID_PATH=pci-0000:01:00.1-ata-2.0
E: ID_PATH_TAG=pci-0000_01_00_1-ata-2_0
E: ID_PATH_ATA_COMPAT=pci-0000:01:00.1-ata-2
E: ID_PART_TABLE_UUID=22a22938
E: ID_PART_TABLE_TYPE=dos
E: DEVLINKS=/dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S599NZFNB22849E /dev/disk/by-path/pci-0000:01:00.1-ata-2 /dev/disk/by-path/pci-0000:01:00.1-ata-2.0 /dev/disk/by-diskseq/3 /dev/disk/by-id/wwn-0x5002538ec0bdad6b
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

/dev/sdb1:

P: /devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
M: sdb1
R: 1
U: block
T: partition
D: b 8:17
N: sdb1
L: 0
S: disk/by-diskseq/3-part1
S: disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S599NZFNB22849E-part1
S: disk/by-id/wwn-0x5002538ec0bdad6b-part1
S: disk/by-path/pci-0000:01:00.1-ata-2-part1
S: disk/by-partuuid/22a22938-01
S: disk/by-uuid/9fa48288-ca5a-4300-a0cc-283f5d36265a
S: disk/by-path/pci-0000:01:00.1-ata-2.0-part1
S: disk/by-label/Games
Q: 3
E: DEVPATH=/devices/pci0000:00/0000:00:01.2/0000:01:00.1/ata2/host1/target1:0:0/1:0:0:0/block/sdb/sdb1
E: DEVNAME=/dev/sdb1
E: DEVTYPE=partition
E: DISKSEQ=3
E: PARTN=1
E: MAJOR=8
E: MINOR=17
E: SUBSYSTEM=block
E: USEC_INITIALIZED=3471792
E: PATH=/nix/store/yjisihkg87ycnpj5db42s4z9xlaxrqy0-udev-path/bin:/nix/store/yjisihkg87ycnpj5db42s4z9xlaxrqy0-udev-path/sbin
E: ID_ATA=1
E: ID_TYPE=disk
E: ID_BUS=ata
E: ID_MODEL=Samsung_SSD_860_EVO_1TB
E: ID_MODEL_ENC=Samsung\x20SSD\x20860\x20EVO\x201TB\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20\x20
E: ID_REVISION=RVT04B6Q
E: ID_SERIAL=Samsung_SSD_860_EVO_1TB_S599NZFNB22849E
E: ID_SERIAL_SHORT=S599NZFNB22849E
E: ID_ATA_WRITE_CACHE=1
E: ID_ATA_WRITE_CACHE_ENABLED=1
E: ID_ATA_FEATURE_SET_HPA=1
E: ID_ATA_FEATURE_SET_HPA_ENABLED=1
E: ID_ATA_FEATURE_SET_PM=1
E: ID_ATA_FEATURE_SET_PM_ENABLED=1
E: ID_ATA_FEATURE_SET_SECURITY=1
E: ID_ATA_FEATURE_SET_SECURITY_ENABLED=0
E: ID_ATA_FEATURE_SET_SECURITY_ERASE_UNIT_MIN=4
E: ID_ATA_FEATURE_SET_SECURITY_ENHANCED_ERASE_UNIT_MIN=8
E: ID_ATA_FEATURE_SET_SECURITY_FROZEN=1
E: ID_ATA_FEATURE_SET_SMART=1
E: ID_ATA_FEATURE_SET_SMART_ENABLED=1
E: ID_ATA_DOWNLOAD_MICROCODE=1
E: ID_ATA_SATA=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN2=1
E: ID_ATA_SATA_SIGNAL_RATE_GEN1=1
E: ID_ATA_ROTATION_RATE_RPM=0
E: ID_WWN=0x5002538ec0bdad6b
E: ID_WWN_WITH_EXTENSION=0x5002538ec0bdad6b
E: ID_PATH=pci-0000:01:00.1-ata-2.0
E: ID_PATH_TAG=pci-0000_01_00_1-ata-2_0
E: ID_PATH_ATA_COMPAT=pci-0000:01:00.1-ata-2
E: ID_PART_TABLE_UUID=22a22938
E: ID_PART_TABLE_TYPE=dos
E: ID_FS_LABEL=Games
E: ID_FS_LABEL_ENC=Games
E: ID_FS_UUID=9fa48288-ca5a-4300-a0cc-283f5d36265a
E: ID_FS_UUID_ENC=9fa48288-ca5a-4300-a0cc-283f5d36265a
E: ID_FS_UUID_SUB=652facac-6256-40f7-bd03-cbc1dd020066
E: ID_FS_UUID_SUB_ENC=652facac-6256-40f7-bd03-cbc1dd020066
E: ID_FS_BLOCKSIZE=4096
E: ID_FS_LASTBLOCK=244189696
E: ID_FS_SIZE=1000200994816
E: ID_FS_TYPE=btrfs
E: ID_FS_USAGE=filesystem
E: ID_PART_ENTRY_SCHEME=dos
E: ID_PART_ENTRY_UUID=22a22938-01
E: ID_PART_ENTRY_TYPE=0x83
E: ID_PART_ENTRY_NUMBER=1
E: ID_PART_ENTRY_OFFSET=2048
E: ID_PART_ENTRY_SIZE=1953517568
E: ID_PART_ENTRY_DISK=8:16
E: ID_BTRFS_READY=1
E: DEVLINKS=/dev/disk/by-diskseq/3-part1 /dev/disk/by-id/ata-Samsung_SSD_860_EVO_1TB_S599NZFNB22849E-part1 /dev/disk/by-id/wwn-0x5002538ec0bdad6b-part1 /dev/disk/by-path/pci-0000:01:00.1-ata-2-part1 /dev/disk/by-partuuid/22a22938-01 /dev/disk/by-uuid/9fa48288-ca5a-4300-a0cc-283f5d36265a /dev/disk/by-path/pci-0000:01:00.1-ata-2.0-part1 /dev/disk/by-label/Games
E: TAGS=:systemd:
E: CURRENT_TAGS=:systemd:

tbzatek commented 12 months ago

Thanks, the udevadm dumps look fine (e.g. the ID_FS_TYPE=btrfs). FYI, GPT is located at both the start and the end of the block device, so simply backing up the leading 2 kB is not enough. The MBR partition table may have contained a protective partition (even with bogus boundaries, just to indicate there's something on the disk). This may have been messed up in a number of different ways, so having a working reproducer is crucial.
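
To illustrate the point about GPT living at both ends of the device, here is a rough sketch of where the structures sit on a disk of the size reported above (1,000,204,886,016 bytes, 512-byte sectors). It assumes the common layout of 128 entries of 128 bytes each, with the primary entry array starting at LBA 2; a real disk records the actual values in its GPT header, and nothing here confirms that this particular disk ever carried a GPT. The function names are made up for illustration.

```python
def gpt_layout(disk_bytes, sector_size=512, num_entries=128, entry_size=128):
    """Return (first_lba, last_lba) for each GPT structure in the usual layout."""
    total_sectors = disk_bytes // sector_size
    last_lba = total_sectors - 1
    # Sectors needed for the partition entry array, rounded up
    entry_sectors = (num_entries * entry_size + sector_size - 1) // sector_size
    return {
        "protective_mbr": (0, 0),
        "primary_header": (1, 1),
        "primary_entries": (2, 1 + entry_sectors),  # assumed start at LBA 2
        "backup_entries": (last_lba - entry_sectors, last_lba - 1),
        "backup_header": (last_lba, last_lba),
    }


def structures_in_backup(layout, backup_bytes, sector_size=512):
    """For a backup of the leading bytes, report which structures it fully contains."""
    last_saved_lba = backup_bytes // sector_size - 1
    return {name: last <= last_saved_lba for name, (first, last) in layout.items()}


layout = gpt_layout(1_000_204_886_016)
for name, (first, last) in layout.items():
    print(f"{name:16s} LBA {first}..{last}")
print(structures_in_backup(layout, 2048))
```

Under these assumptions a 2048-byte backup covers only LBAs 0 to 3: the protective MBR and the primary header, but not the full primary entry array or either backup structure at the end of the disk.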

jarrodsfarrell commented 12 months ago

@tbzatek Ah, I didn't know that. Good to know, and something I'll probably have to look into myself. I'll see if I can instead recreate the events that led to this situation by using Windows to format an NTFS disk and then lazily turning the created partition into BTRFS.

jarrodsfarrell commented 12 months ago

So I went into Windows and tried to recreate the issue with a spare SSD, using HxD to look at the results, but none of the approaches available in Disk Management placed the NTFS PBS at the MBR. The testing process was mostly: initialize the disk, add NTFS, look at it in HxD, then manually zero out the first handful of sectors (much to Windows' annoyance) before trying something else. In all cases it would create an MBR stub and/or the GPT with nothing unusual; no NTFS PBS.

Back on the Linux side I also tried purposefully formatting the whole disk as NTFS (mkfs.ntfs --quick --force /dev/sda) and then creating a new MBR with a partition (fdisk /dev/sda), which kept the PBS, but udevadm monitor displayed normal remove/add/change events and fdisk -l correctly reported the disk as dos instead of loop like we saw. The same happened when using gdisk to create a GPT partition, which also kept the PBS. I also did some silly things like creating an MBR on a partition, but nothing wanted to humor partitions within partitions.
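
For experiments that should not touch a real disk, a throwaway image with a hand-built sector 0 is an alternative. The sketch below is only a guess at the general shape (NTFS jump bytes and OEM ID sharing sector 0 with one MBR partition entry); it does not reproduce the destroyed layout, it omits the rest of a real NTFS boot sector, and the helper names and the 64 MiB size are made up for illustration.

```python
import os
import struct
import tempfile


def build_sector0(lba_start: int, num_sectors: int, ptype: int = 0x83) -> bytes:
    """Build a 512-byte sector with NTFS PBS markers plus one MBR partition entry."""
    sector = bytearray(512)
    sector[0:3] = b"\xeb\x52\x90"   # the \353R\220 bytes seen in the strace log
    sector[3:11] = b"NTFS    "      # NTFS OEM ID
    # MBR entry 1: status, 3 CHS bytes (zeroed), type, 3 CHS bytes, start LBA, size
    sector[446:462] = struct.pack("<B3xB3xII", 0x00, ptype, lba_start, num_sectors)
    sector[510:512] = b"\x55\xaa"   # boot signature
    return bytes(sector)


def write_test_image(path: str, size_bytes: int) -> None:
    """Create a sparse image whose only non-zero content is the sector built above."""
    total_sectors = size_bytes // 512
    with open(path, "wb") as image:
        image.truncate(size_bytes)  # sparse: no data blocks are allocated yet
        image.seek(0)
        image.write(build_sector0(2048, total_sectors - 2048))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "hybrid-test.img")
    write_test_image(path, 64 * 1024 * 1024)
    print(path, os.path.getsize(path))
```

The resulting file can be attached as a loop device and watched with udevadm monitor, with the caveat from above that a loop device did not show the problem when I tried it with the real 2048-byte backup.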

Mildly disappointed that I destroyed the only known case of a malformed MBR disk that causes udev spam. Oh well, at least if someone ever encounters a situation like this, they can take some solace that they weren't the only one, and this issue will be here for them to supply their working (broken?) case of udisksd misbehaving.

So at this point it would seem this issue is stalled.