storaged-project / udisks

The UDisks project provides a daemon, tools and libraries to access and manipulate disks, storage devices and technologies.
https://storaged.org/doc/udisks2-api/latest/

[Regression: 2.8.4->2.9.0] udisks reports wrong SMART status for an HDD that is put into standby state before udisks2.service starts #778

Open vedgy opened 4 years ago

vedgy commented 4 years ago

I have a custom systemd hdparm-on-boot.service enabled:

ExecStart=/usr/bin/hdparm -Y /dev/disk/by-id/ata-SAMSUNG_HD...
WantedBy=multi-user.target

The service starts and finishes before the udisks2 service starts. Before the update from udisks 2.8.4 to 2.9.0, gnome-disk-utility correctly displayed the disk's SMART and power mode statuses. Now it doesn't: https://gitlab.gnome.org/GNOME/gnome-disk-utility/-/issues/172. udisksctl dump reports SmartEnabled=false; SmartSupported=false for this HDD.

When I boot with hdparm-on-boot.service disabled, udisksctl dump reports SmartEnabled=true; SmartSupported=true and gnome-disk-utility works correctly (as before the udisks 2.9.0 update). Putting the HDD into standby mode with the same hdparm command after udisks2.service starts does not affect the udisksctl dump output or the GNOME Disks display (they remain correct).

EDIT: restarting udisks2.service while the HDD is in the standby state (sudo systemctl restart udisks2) also does not affect the correctness of the udisksctl dump output or the GNOME Disks display.
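For anyone comparing the two boots, the SmartSupported/SmartEnabled values can be pulled out of a saved dump with a short script. This is a minimal sketch. It assumes the usual indentation of udisksctl dump output (object paths at column 0, interfaces indented by two spaces, properties by four). The object path in the sample is hypothetical, and parse_dump and smart_flags are illustrative names that are not part of UDisks.

```python
ATA_IFACE = "org.freedesktop.UDisks2.Drive.Ata"

# Sample shaped like `udisksctl dump` output; the object path is hypothetical.
SAMPLE = """\
/org/freedesktop/UDisks2/drives/EXAMPLE_DRIVE:
  org.freedesktop.UDisks2.Drive:
    Removable:                  false
  org.freedesktop.UDisks2.Drive.Ata:
    SmartEnabled:                               false
    SmartSelftestStatus:
    SmartSupported:                             false
"""


def parse_dump(text):
    """Return {object_path: {interface: {property: value}}}.

    Assumes the indentation layout described above; lines indented deeper
    than four spaces (continuations of multi-line values) are ignored.
    """
    objects = {}
    obj = iface = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip(" "))
        if indent == 0 and stripped.endswith(":"):
            obj = objects.setdefault(stripped[:-1], {})
            iface = None
        elif indent == 2 and obj is not None and stripped.endswith(":"):
            iface = obj.setdefault(stripped[:-1], {})
        elif indent == 4 and iface is not None and ":" in stripped:
            key, _, value = stripped.partition(":")
            iface[key.strip()] = value.strip()
    return objects


def smart_flags(objects):
    """Map object path -> (SmartSupported, SmartEnabled) for ATA drives."""
    flags = {}
    for path, ifaces in objects.items():
        ata = ifaces.get(ATA_IFACE)
        if ata is not None:
            flags[path] = (
                ata.get("SmartSupported") == "true",
                ata.get("SmartEnabled") == "true",
            )
    return flags


for path, (supported, enabled) in smart_flags(parse_dump(SAMPLE)).items():
    print(f"{path}: SmartSupported={supported} SmartEnabled={enabled}")
```

To run it against a real system, the text of udisksctl dump (saved to a file, for instance) can be passed to parse_dump in place of SAMPLE. A drive that is missing from the result has no Drive.Ata interface at all.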

tbzatek commented 4 years ago

Are there any messages in dmesg related to the drive once it's been put into sleep? UDisks issues ATA IDENTIFY DEVICE ECh upon startup and that might fail when the disk is offline. Can you try putting your disk to sleep using hdparm -y instead?

Please also check your system log for messages like this:

(udisksd:10168): udisks-CRITICAL **: 18:42:56.425: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sdi': Unexpected sense data returned:
0000: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
0010: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00    ................
 (g-io-error-quark, 0)

If a sleeping disk causes the situation above, there will also be no org.freedesktop.UDisks2.Drive.Ata interface on the drive object; you should notice that in udisksctl dump. Please check.
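A quick way to search a log for the probing error quoted above is a regular expression over its lines. This is a minimal sketch. The pattern is derived only from the single sample message in this comment, so other udisks versions may word the message differently. find_probe_errors is an illustrative name.

```python
import re

# Pattern built from the one sample message quoted in this thread.
PROBE_ERROR = re.compile(
    r"udisks-CRITICAL \*\*: .*Error probing device: "
    r"Error sending ATA command (?P<command>.+?) to '(?P<device>[^']+)'"
)


def find_probe_errors(lines):
    """Return (device, ATA command) pairs for each probing error found."""
    hits = []
    for line in lines:
        match = PROBE_ERROR.search(line)
        if match:
            hits.append((match.group("device"), match.group("command")))
    return hits


SAMPLE_LOG = [
    "(udisksd:10168): udisks-CRITICAL **: 18:42:56.425: Error probing device: "
    "Error sending ATA command IDENTIFY DEVICE to '/dev/sdi': "
    "Unexpected sense data returned:",
    "kernel: ata4.00: configured for UDMA/133",
]

print(find_probe_errors(SAMPLE_LOG))
```

Real input could come from the output of journalctl -b -u udisks2.service, split into lines.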

This might be caused by 5c9fc423fe2857107a1845b7048acd5f0c3e0433, but that doesn't fully explain the regression.

vedgy commented 4 years ago

There are no error messages from udisks in journalctl.

Messages unique to the disk that is put to sleep

  1. These messages appear on resume from hibernation, but before the issuing of the hdparm sleep command:

    kernel: ata4: softreset failed (device not ready)
    kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    kernel: ata4.00: configured for UDMA/133
  2. These messages appear right after udisksd starts (several seconds after the issuing of the hdparm sleep command):

    kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    kernel: ata4.00: configured for UDMA/133
  3. These messages appear when the system is suspended (soon after systemd-sleep[<number>]: Suspending system...):

    kernel: ata4: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
    kernel: ata4.00: configured for UDMA/133
    kernel: ata4.00: retrying FLUSH 0xea Emask 0x0
    kernel: sd 3:0:0:0: [sdc] Stopping disk

The org.freedesktop.UDisks2.Drive.Ata interface is present. I extracted the SmartEnabled=false; SmartSupported=false from its section in udisksctl dump. The complete section for this disk:

  org.freedesktop.UDisks2.Drive.Ata:
    AamEnabled:                                 false
    AamSupported:                               false
    AamVendorRecommendedValue:                  0
    ApmEnabled:                                 false
    ApmSupported:                               false
    PmEnabled:                                  false
    PmSupported:                                false
    ReadLookaheadEnabled:                       false
    ReadLookaheadSupported:                     false
    SecurityEnhancedEraseUnitMinutes:           0
    SecurityEraseUnitMinutes:                   0
    SecurityFrozen:                             false
    SmartEnabled:                               false
    SmartFailing:                               false
    SmartNumAttributesFailedInThePast:          -1
    SmartNumAttributesFailing:                  -1
    SmartNumBadSectors:                         1
    SmartPowerOnSeconds:                        0
    SmartSelftestPercentRemaining:              -1
    SmartSelftestStatus:
    SmartSupported:                             false
    SmartTemperature:                           0.0
    SmartUpdated:                               0
    WriteCacheEnabled:                          false
    WriteCacheSupported:                        false

I modified my hdparm sleep service: replaced -Y with -y. And the issue is gone! gnome-disk-utility works correctly and udisksctl dump reports SmartEnabled=true; SmartSupported=true just as when the disk is active. With -y the kernel: ata4... messages also do not appear in the journal after udisksd starts. That is, only -Y triggers this udisks bug.

5c9fc423fe2857107a1845b7048acd5f0c3e0433 seems to be present in udisks 2.8.4 too, so it couldn't have caused the regression in 2.9.0 by itself.

tbzatek commented 4 years ago

Thanks for checking. You should not be seeing any of the kernel libata messages. ATA disk power states are part of regular use, and transitioning between power states should not generate any messages. hdparm -Y essentially puts the disk into an offline mode that requires a (SATA) link reset to wake it up. In such a state the drive may refuse to process some ATA commands, such as those UDisks sends.

We would have to debug this in detail to find out which commands failed and what can be done about it. For example, UDisks could behave better when initial probing fails and perhaps try to reprobe the disk after some period or on another uevent. There's more: some interfaces do not show up when initial probing fails partially, and they never show up again within the device lifecycle. I did some testing using a SATA-to-USB converter that exhibits some more issues.
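The "reprobe after some period" idea could look roughly like the sketch below. It is not UDisks code (the daemon is written in C) and does not reflect any actual fix. ProbeError, probe_with_retry and flaky_probe are hypothetical names, and only the time-based retry is modelled, not the uevent-triggered one.

```python
import time


class ProbeError(Exception):
    """Hypothetical error raised when a drive does not answer a probe."""


def probe_with_retry(probe, attempts=3, delay=1.0, sleep=time.sleep):
    """Call probe() until it succeeds or attempts run out.

    The wait between attempts doubles each time. The last failure is
    re-raised so the caller can still report it.
    """
    for attempt in range(1, attempts + 1):
        try:
            return probe()
        except ProbeError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


# Demo: a stand-in probe that fails twice (drive still offline), then succeeds.
calls = {"n": 0}


def flaky_probe():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ProbeError("drive not ready")
    return {"SmartSupported": True, "SmartEnabled": True}


waits = []  # record the delays instead of actually sleeping
result = probe_with_retry(flaky_probe, attempts=5, delay=0.5, sleep=waits.append)
print(result, waits)
```

Passing sleep as a parameter keeps the demo instant; a real caller would leave the default time.sleep or schedule the retry on its event loop instead.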

Leaving this ticket open for the time being, we have a reproducer.

tbzatek commented 1 month ago

There have been a number of changes in how SMART data is retrieved and reported. Can you retest with the 2.10.90 release?