storaged-project / udisks

The UDisks project provides a daemon, tools and libraries to access and manipulate disks, storage devices and technologies.
https://storaged.org/doc/udisks2-api/latest/
Other
343 stars 143 forks source link

Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0 #732

Closed cmurf closed 1 week ago

cmurf commented 4 years ago

udisks2-2.8.4-4.fc32.x86_64 systemd-udev-245~rc1-2.fc32.x86_64 both: kernel-5.6.0-0.rc2.git0.1.fc32.x86_64 kernel-5.5.3-200.fc31.x86_64

[ 6.313699] vm.local udisksd[768]: Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0': Unexpected sense data returned: 0000: 70 00 05 00 00 00 00 0a 00 58 00 01 21 04 00 00 p........X..!... 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ (g-io-error-quark, 0)

This is in a qemu-kvm guest.

tbzatek commented 4 years ago

Yes, this is known, it's a QEMU issue in the way the CD-ROM device emulation is done. We can't do much here, other than making the warning silent.

paulmenzel commented 4 years ago

Probably unrelated, but for the record, there is a regression in Linux 5.6-rc1 still not fixed in Linus’ master (5.6-rc3). The patch fixes this.

bsdice commented 2 months ago

Yo, hit this today upgrading to 6.6.44 kernel, running udisks2 2.10.1-4 on Arch.

udisksd[5518]: Error probing device: Error sending ATA command IDENTIFY PACKET DEVICE to '/dev/sr0': Unexpected sense data returned: 0000: f0 00 01 00 50 00 02 0a 00 00 00 02 00 1d 00 00 ....P........... 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ (g-io-error-quark, 0)

Possibly related to this, unverified: https://lore.kernel.org/stable/20240730151654.600327197@linuxfoundation.org/

sushi2503 commented 2 months ago

Same error here since kernel upgrade to 6.10.2.arch1-2 on Arch.

paulmenzel commented 2 months ago

@bsdice, @sushi2503, sounds like a Linux kernel regression. Please bisect.

sushi2503 commented 2 months ago

@paulmenzel : For me to. Sorry I haven't the knowledge to bisect, hope bsdice has it...

bsdice commented 2 months ago

I compiled 6.6.44 with the following three patches reverted (old dog making an educated guess) and that appears to have fixed the error:

ata-libata-scsi-do-not-overwrite-valid-sense-data-when-ck_cond-1.txt ata-libata-scsi-fix-offsets-for-the-fixed-format-sense-data.txt ata-libata-scsi-honor-the-d_sense-bit-for-ck_cond-1-and-no-error.txt

@paulmenzel Enough info to work with? I am not sure if there are interdependencies with the patches so I reverted all three.

paulmenzel commented 2 months ago

Nice. But as you can see this is the UDisks’ project. It’d be great if you could report this to the commit authors and maintainers.

bsdice commented 2 months ago

@paulmenzel Thank you so much for reply, I will just cc the author.

Can you say if this is UDisks mis-interpreting kernel's changed sense data, or if kernel is wrongly sending malformed sense data? Because the patch comments claim to have rectified previously wrong data: "Correct the ATA PASS-THROUGH fixed format sense data offsets to conform to SPC-6 and SAT-5 specifications."

paulmenzel commented 2 months ago

Can you say if this is UDisks mis-interpreting kernel's changed sense data, or if kernel is wrongly sending malformed sense data?

Sorry I haven’t looked closely, and do not know. (I am not a UDisk person.) I just know, that it wouldn’t matter as the Linux kernel has a no regression policy.

[…]

istenburg commented 2 months ago

Arch linux Kernel 6.10.3-arch1-2 udisks2 2.10.1-4, also receive this message Aug 09 10:26:46 I-NET udisksd[1038]: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sda': Unexpected sense data returned: 0000: f0 00 01 00 50 00 01 0a 00 00 00 00 00 1d 00 00 ....P........... 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ (g-io-error-quark, 0) Aug 09 10:26:46 I-NET udisksd[1038]: Error probing device: Error sending ATA command IDENTIFY DEVICE to '/dev/sdb': Unexpected sense data returned: 0000: f0 00 01 00 50 00 00 0a 00 00 00 00 00 1d 00 00 ....P........... 0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ (g-io-error-quark, 0) Is this a hardware error?

bsdice commented 2 months ago

Most likely not a hardware error, but a kernel or UDisks regression. I reverted three libata patches and bug went away. I wrote to patch author and ata subsystem maintainer at kernel.org but no response so far. Might be due to vacation time on northern hemisphere. Possibility 1 is that the kernel patches are wrong, 2 that kernel now emits correct sense data which were previously slightly wrong or format garbled, but UDisks cannot process it now correctly. Don't have time to follow this rabbit hole down, maintainers have to do it.

sushi2503 commented 2 months ago

@bsdice : Maybe it would be better to open a kernel bug report ?

bsdice commented 2 months ago

I don't want to do it myself. From what I have seen you sub to -stable and reply to GregKH for the specific patch mail that there is a regression. Patches will be dropped if plausible. This puts pressure on patch writer or subsystem maintainers to clarify or fix it.

istenburg commented 2 months ago

So you can probably rule out a kernel bug. I tested the lts-kernel, same problem...

sushi2503 commented 2 months ago

Most importantly is that this get fixed.

paulmenzel commented 2 months ago

So you can probably rule out a kernel bug.

Why? @bsdice wrote that reverting some patches fixed the issue for them.

I tested the lts-kernel, same problem...

Patches are backported to LTS series too. What Linux kernel do you mean exactly? The 6.10.3 from your earlier comment?

bsdice commented 2 months ago

@istenburg No you can't rule that out, try a kernel from 3 months ago like 6.6.40. The stable kernel branches all got backported the three patches. That is why you see it now on Arch which is rolling linux package with latest kernel + patches and linux-lts package which is on 6.6 branch + even more stable patches.

bsdice commented 2 months ago

https://lore.kernel.org/stable/df43ed14-9762-4193-990a-daec1a320288@heusel.eu/T/#mc60ff08e7c20d45faf9588e42048b5f01acaa0a3

Also broken: hddtemp, hdparm, probably more. This will be fun to watch, because the kernel is now acting properly, but applications are now misbehaving. UDisks will need a code update...

Edit: After thinking about it for a bit, best way forward probably is to revert the kernel changes, notify software maintainers of code using that API and give them 3 or 6 or 12 months to fix the use of the API. Then re-introduce the kernel change. hdparm will probably be fixed first, so people just have to copy or emulate the changes.

sushi2503 commented 2 months ago

Yep, I have the same error with my external backup /dev/sda disk but began with my sr0.

Fun, as you say. Some codes are made with the elbows!

Let see the machinery...

christian-heusel commented 2 months ago

After @bsdice has commented on the Arch Linux Forum thread I also found this issue and forwarded it to the LKML thread aswell 😊

This is the message: https://lore.kernel.org/all/e206181e-d9d7-421b-af14-2a70a7f83006@heusel.eu

tbzatek commented 2 months ago

Sorry I haven’t looked closely, and do not know. (I am not a UDisk person.) I just know, that it wouldn’t matter as the Linux kernel has a no regression policy.

I wouldn't necessarily play on the no-regression-policy note in this case, it's not the same scenario like breaking syscall API. I.e. the following statement from upstream

TL;DR: it is very hard to say that we have introduced a regression, because this crap has basically been broken in one way or another since it was introduced...

may be actually correct.

christian-heusel commented 2 months ago

Well yes and no, especially since it breaks a ton of other tools aswell 😅

tbzatek commented 2 months ago

This is in a qemu-kvm guest.

qemu is known to have incomplete or incompatible implementation of ATA emulation, both for disk drives and optical drives.

The issue discussed recently is a separate one. Suggesting to continue in #1305 (not sure if github allows to split existing discussions).

tbzatek commented 2 months ago

Well yes and no, especially since it breaks a ton of other tools aswell 😅

Any reasonable CI would catch this. I agree that this kind of change is better to coordinate with userspace prior to pushing to stable kernel branches, but it happened in the past a couple of times.

UDisks needs to be fixed either way, this particular code has been sitting there for over a decade with no recent developer having actual hands-on experience with ATA commands.

bsdice commented 2 months ago

Niklas came up with a patch to bridge kernel correctness and a little ignorant user space software: https://lore.kernel.org/linux-ide/20240812151517.1162241-2-cassel@kernel.org/

Nine days from report to tentative fix, underappreciated agility.

yan12125 commented 1 month ago

From the commit message in https://lore.kernel.org/linux-ide/20240812151517.1162241-2-cassel@kernel.org/, it seems something in udisks should be changed?

A lot of user space programs incorrectly assume that the sense data is in descriptor format, without checking the RESPONSE CODE field of the returned sense data (to see which format the sense data is in).

paulmenzel commented 1 month ago

Sorry I haven’t looked closely, and do not know. (I am not a UDisk person.) I just know, that it wouldn’t matter as the Linux kernel has a no regression policy.

I wouldn't necessarily play on the no-regression-policy note in this case, it's not the same scenario like breaking syscall API. I.e. the following statement from upstream

TL;DR: it is very hard to say that we have introduced a regression, because this crap has basically been broken in one way or another since it was introduced...

may be actually correct.

That does not matter. Linux never breaks userspace, so people can be sure they can always update the Linux kernel without problems.

sushi2503 commented 1 month ago

6.10.4, 6.10.5, still the same.

christian-heusel commented 1 month ago

Yes, the reverts have just been added to the stable series :) https://git.kernel.org/pub/scm/linux/kernel/git/stable/stable-queue.git/tree/queue-6.10/revert-ata-libata-scsi-honor-the-d_sense-bit-for-ck_cond-1-and-no-error.patch

sushi2503 commented 1 month ago

Nice, soon fixed.

sushi2503 commented 1 month ago

Since kernel 6.10.6 I don't see the errors anymore, sounds like fixed.

tri-llionaire commented 1 month ago

6.10.6 works for me, on 6.10.7 it's broken again. I had to boot back into 6.10.6 to mount the drive.

sushi2503 commented 1 month ago

Here on 6.10.7 and 6.10.8 it's working fine.

tbzatek commented 1 month ago

So I have the affected kernel 6.10.4 here and can reproduce the problem. As a source of inspiration I took a look at udev's id_ata and to my surprise found out the code was originally written by the same author. Only that udev has been maintained properly over the years, while nobody touched udisksata.c since 2012.

Important fixes from udev are https://github.com/systemd/systemd/pull/13654 and https://github.com/systemd/systemd/pull/24923.

Anecdotally this concern more or less applies to udisks as well: https://github.com/systemd/systemd/issues/2362#issuecomment-178079214 and https://github.com/systemd/systemd/pull/2363#issuecomment-173274244

Posted #1311 with a quick fix, please test. We don't have variety of hardware to test this on.

tbzatek commented 1 month ago

Experimenting with my SATA drives and qemu ATA device, it turned out a SG v3 code path is taken. This udev change has effectively taken sense code checking out of the v3 path while udisks still maintains it and fails on qemu devices: https://github.com/systemd/systemd/commit/66026fca95c892059b7dc4f33596f63741519eb8

I don't think that change is right.