openzfsonwindows / openzfs


Pool creation on physical drive fails with unclear message if drive sensors are being accessed #325

Open · lanato opened this issue 1 year ago

lanato commented 1 year ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Windows |
| Distribution Version | 10 |
| Kernel Version | Build 19045 |
| Architecture | x64 |
| OpenZFS Version | zfswin-2.2.0rc6 / zfs-kmod-zfswin-2.2.0rc6 |

Describe the problem you're observing

Trying to create a ZFS storage pool on a physical hard drive fails on 2.2.0rc6. It's likely the same issue as initially observed in the now-closed #294 on 2.2.0rc5. Creating a file-based pool works fine.

Describe how to reproduce the problem

The operation fails with the same message regardless of parameters tried (mirror/no mirror, ashift/no ashift).

> zpool.exe create tank \\.\PHYSICALDRIVE0  
working on dev '#1048576#10000820862976#\\.\PHYSICALDRIVE0'
correcting path: '//./#1048576#10000820862976#//./PHYSICALDRIVE0'
cannot create 'tank': invalid argument for this pool operation

Interestingly, the error message changes if the hard drive is offlined in Disk Management before attempting the operation:

> zpool.exe create tank \\.\PHYSICALDRIVE0
cannot label 'PHYSICALDRIVE0': try using parted(8) and then provide a specific slice: -2

I suspect Windows simply does not support this operation on an offlined drive.

Include any warning/errors/backtraces from the system logs

No relevant system logs identified.

EchterAgo commented 1 year ago

Did you try cleaning your partition table?

See https://github.com/openzfsonwindows/openzfs/issues/294#issuecomment-1765105324

lanato commented 1 year ago

> Did you try cleaning your partition table?
>
> See #294 (comment)

Hi! Yeah, that didn't have any impact. However, watching Disk Management during the operation shows that zpool.exe does succeed at part of the process, namely partitioning the disk.

C:\WINDOWS\system32>diskpart

Microsoft DiskPart version 10.0.19041.3570

Copyright (C) Microsoft Corporation.
On computer: ARDOS

DISKPART> select disk 0

Disk 0 is now the selected disk.

DISKPART> clean

DiskPart succeeded in cleaning the disk.

DISKPART> exit

Leaving DiskPart...

[screenshot: Disk Management after cleaning]

C:\WINDOWS\system32>zpool.exe create tank \\.\PHYSICALDRIVE0
working on dev '#1048576#10000820862976#\\.\PHYSICALDRIVE0'
correcting path: '//./#1048576#10000820862976#//./PHYSICALDRIVE0'
cannot create 'tank': invalid argument for this pool operation

[screenshot: Disk Management after zpool create]

EchterAgo commented 1 year ago

Yea, that does look the same as on my drives with the 8MB partition at the end. You could try to dump the debug print buffer: https://openzfsonosx.org/wiki/Windows_BSOD

@derritter88 did you find what the cause of #294 was?

derritter88 commented 1 year ago

My "problem" was that, it seems, I didn't use the zpool command properly.

derritter88 commented 1 year ago

In general I only use zpool create PHYSICALDRIVEx

EchterAgo commented 1 year ago

> In general I only use zpool create PHYSICALDRIVEx

Oh, I do that too, never thought that would be the reason. We should open an issue for that.

lanato commented 1 year ago

I've tried the following aliases without significant differences in output; they all result in the same "invalid argument for this pool operation" error (see the probe sketch after the list):

\\.\PHYSICALDRIVE0
PHYSICALDRIVE0
\\?\PHYSICALDRIVE0
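
For reference, a minimal user-mode probe (not zpool code; assumes \\.\PHYSICALDRIVE0 and an elevated shell) that tries each alias with CreateFileW and prints the Win32 error, so bad-path failures can be told apart from sharing conflicts:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        const wchar_t *aliases[] = {
            L"\\\\.\\PHYSICALDRIVE0",   /* Win32 device namespace */
            L"PHYSICALDRIVE0",          /* bare name; zpool expands this itself */
            L"\\\\?\\PHYSICALDRIVE0",   /* verbatim-path form */
        };

        for (int i = 0; i < 3; i++) {
            HANDLE h = CreateFileW(aliases[i], GENERIC_READ | GENERIC_WRITE,
                FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);

            if (h == INVALID_HANDLE_VALUE) {
                /* 2/3 = file/path not found (bad name);
                 * 32 = ERROR_SHARING_VIOLATION (another open handle);
                 * 5 = access denied (shell not elevated). */
                wprintf(L"%ls: error %lu\n", aliases[i], GetLastError());
            } else {
                wprintf(L"%ls: opened OK\n", aliases[i]);
                CloseHandle(h);
            }
        }
        return 0;
    }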

> Yea, that does look the same as on my drives with the 8MB partition at the end. You could try to dump the debug print buffer: https://openzfsonosx.org/wiki/Windows_BSOD

I'll try to wrap my head around WinDbg and get you a dump.

derritter88 commented 1 year ago

@lanato I scrolled through my PowerShell history to find the command I used two days ago: zpool create -o ashift=12 Serien PHYSICALDRIVE4

This worked for me.

lundman commented 1 year ago

And you are using a "run as Administrator" shell/cmd/powershell?

derritter88 commented 1 year ago

Yes I always do that.

lanato commented 1 year ago

> And you are using a "run as Administrator" shell/cmd/powershell?

Of course

> Yea, that does look the same as on my drives with the 8MB partition at the end. You could try to dump the debug print buffer: https://openzfsonosx.org/wiki/Windows_BSOD

I've done some more testing to generate additional data that might be of interest, using a USB-to-SATA cable (USB Attached SCSI) and VirtualBox on the same computer, with the official Windows 11 evaluation image (Build 22621.2428). Process as follows:

  1. Start the system and clean the disk with diskpart
  2. Run "zpool create tank PHYSICALDRIVEn"
  3. Finish if successful, otherwise force a crash to generate a dump

| Environment | Connection | Success | Logs |
| --- | --- | --- | --- |
| Native (W10) | Direct SATA | No | cbuf.internal.txt |
| Native (W10) | SATA over USB | No | cbuf.usb.txt |
| VirtualBox (W11) | Direct SATA | Yes | |
| VirtualBox (W11) | SATA over USB | Yes | |

I am able to import the pools created in VirtualBox in the native environment afterwards.

lanato commented 1 year ago

Fiddling with VirtualBox for a while to get the physical disk working gave me an idea for one more thing to test, which turned out to be the solution: I have a system monitor running in the background that observes, among other things, system temperatures. As it turns out, the drive has a temperature sensor that this software automatically starts observing, and that ends up keeping the drive "busy" in a way that blocks exactly the relabeling step.

Killing the monitoring service allows the pool to be created in the native environment as expected. So this might not really be an issue with OpenZFS itself, but rather a lack of communication of errors from behind the curtain.

lundman commented 1 year ago

> Killing the monitoring service allows the pool to be created in the native environment as expected

OK that is interesting, most likely the monitoring software has it open in sharing mode of some kind, and we just try to open the disk in exclusive mode, and fail. I'll see if I can find some equivalent software to reproduce it and see if we can co-exist.

lanato commented 1 year ago

> > Killing the monitoring service allows the pool to be created in the native environment as expected
>
> OK that is interesting, most likely the monitoring software has it open in sharing mode of some kind, and we just try to open the disk in exclusive mode, and fail. I'll see if I can find some equivalent software to reproduce it and see if we can co-exist.

If you'd like to investigate in more detail, the software in question is Fan Control. It is closed source, but appears to use the open-source project LibreHardwareMonitor for interfacing with sensors.

lundman commented 1 year ago

Aha ok: cannot create 'BOOM': invalid argument for this pool operation

lundman commented 1 year ago

    ntstatus = ZwCreateFile(&dvd->vd_lh,
        /* DesiredAccess: read-only pools skip GENERIC_WRITE */
        spa_mode(spa) == SPA_MODE_READ ? GENERIC_READ | SYNCHRONIZE :
        GENERIC_READ | GENERIC_WRITE | SYNCHRONIZE,
        &ObjectAttributes,
        &iostatus,
        0,                              /* AllocationSize: none */
        FILE_ATTRIBUTE_NORMAL,
        /* FILE_SHARE_WRITE | */ FILE_SHARE_READ,   /* ShareAccess */
        FILE_OPEN,                      /* open the existing device only */
        FILE_SYNCHRONOUS_IO_NONALERT |
        (spa_mode(spa) == SPA_MODE_READ ? 0 :
        FILE_NO_INTERMEDIATE_BUFFERING),
        NULL,
        0);

returns ntstatus == 0xc000003a.

STATUS_OBJECT_PATH_NOT_FOUND? ok, hmm, that's confusing

"\??\#1048576#21464350720#\\?\PHYSICALDRIVE1"

ah ok, so it's not the kernel; we do in fact send the wrong name
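
(Side note for readers: the #1048576#21464350720# prefix looks like the start-offset/size encoding zpool prepends for whole-disk vdevs, which here reached the kernel unstripped, hence the path-not-found error. A standalone sketch of how such a name splits, purely illustrative and not the port's actual parser:)

    #include <stdio.h>
    #include <stdlib.h>

    /* Illustrative only: split "#<start>#<size>#<path>" into its parts.
     * The real parsing lives inside the OpenZFS Windows port. */
    static int split_devname(const char *name, unsigned long long *start,
        unsigned long long *size, const char **path)
    {
        char *end;

        if (name[0] != '#')
            return (-1);
        *start = strtoull(name + 1, &end, 10);
        if (*end != '#')
            return (-1);
        *size = strtoull(end + 1, &end, 10);
        if (*end != '#')
            return (-1);
        *path = end + 1;
        return (0);
    }

    int main(void)
    {
        unsigned long long start, size;
        const char *path;

        if (split_devname("#1048576#21464350720#\\\\?\\PHYSICALDRIVE1",
            &start, &size, &path) == 0)
            printf("start=%llu size=%llu path=%s\n", start, size, path);
        return (0);
    }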

lundman commented 1 year ago

OK, so normally we pass just FILE_SHARE_READ, as we are willing to share READ access to this disk with others. Your FanControl has opened the disk with FILE_SHARE_READ | FILE_SHARE_WRITE, which is odd. To open the disk while it holds that handle, we must then also open with FILE_SHARE_READ | FILE_SHARE_WRITE.

But we are taking the whole disk for ZFS, so we don't want to share WRITE access with other things.

Why does FanControl want to WRITE to the disk it is monitoring?
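
To make the sharing rule concrete, here is a small self-contained demo (my own sketch against an ordinary file, not driver code): once a first handle holds write access, a second open succeeds only if it also declares FILE_SHARE_WRITE:

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        /* First opener stands in for the monitoring tool: read/write
         * access with generous sharing (READ | WRITE). */
        HANDLE mon = CreateFileW(L"demo.bin", GENERIC_READ | GENERIC_WRITE,
            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, CREATE_ALWAYS, 0, NULL);

        /* Second opener stands in for zpool: write access, but only
         * willing to share READ. This fails with ERROR_SHARING_VIOLATION,
         * because refusing to share WRITE conflicts with mon's existing
         * write access. */
        HANDLE zfs1 = CreateFileW(L"demo.bin", GENERIC_READ | GENERIC_WRITE,
            FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);
        printf("share READ only:  %s (err %lu)\n",
            zfs1 == INVALID_HANDLE_VALUE ? "failed" : "ok", GetLastError());

        /* Adding FILE_SHARE_WRITE makes the two opens compatible. */
        HANDLE zfs2 = CreateFileW(L"demo.bin", GENERIC_READ | GENERIC_WRITE,
            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, NULL);
        printf("share READ|WRITE: %s\n",
            zfs2 == INVALID_HANDLE_VALUE ? "failed" : "ok");

        if (zfs2 != INVALID_HANDLE_VALUE)
            CloseHandle(zfs2);
        if (zfs1 != INVALID_HANDLE_VALUE)
            CloseHandle(zfs1);
        if (mon != INVALID_HANDLE_VALUE)
            CloseHandle(mon);
        return 0;
    }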

lundman commented 1 year ago

https://github.com/LibreHardwareMonitor/LibreHardwareMonitor/blob/master/LibreHardwareMonitorLib/Hardware/Storage/WindowsSmart.cs#L23

Could be they are forced to, to be allowed to send queries to it? Or was it just muscle memory.

We can certainly add FILE_SHARE_WRITE - just seems risky.
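
One possible compromise (just a sketch of the idea, not a committed change; `access` and `create_options` stand for the expressions in the snippet above): keep the conservative share mode and only fall back to FILE_SHARE_WRITE when the first open fails with STATUS_SHARING_VIOLATION, logging the fact:

    /* Hypothetical fallback around the ZwCreateFile call shown earlier. */
    ULONG share = FILE_SHARE_READ;

    ntstatus = ZwCreateFile(&dvd->vd_lh, access, &ObjectAttributes,
        &iostatus, 0, FILE_ATTRIBUTE_NORMAL, share, FILE_OPEN,
        create_options, NULL, 0);

    if (ntstatus == STATUS_SHARING_VIOLATION) {
        /* Another holder (e.g. a SMART/temperature monitor) has the
         * disk open for write; co-exist rather than fail, but say so. */
        dprintf("%s: disk busy, retrying with FILE_SHARE_WRITE\n",
            __func__);
        share |= FILE_SHARE_WRITE;
        ntstatus = ZwCreateFile(&dvd->vd_lh, access, &ObjectAttributes,
            &iostatus, 0, FILE_ATTRIBUTE_NORMAL, share, FILE_OPEN,
            create_options, NULL, 0);
    }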

sskras commented 11 months ago

Maybe the shorter way would be to ask about that on their discussion board? https://github.com/LibreHardwareMonitor/LibreHardwareMonitor/discussions

lanato commented 11 months ago

Having experimented a bit more now, here are two thoughts I'd like to offer:

  1. Using the ZFS pool after creation (import, mount, read/write) appears to work even while the monitoring software is running
  2. There are likely many other tools out there that could cause similar conflicts

Given that pool creation is (ideally) a one-time action, I think it's reasonable to ask the user to temporarily disable interfering software rather than trying to change the behavior of other projects. However, it would be helpful if the zpool create command could detect and point out such access conflicts.
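
For illustration, a user-land pre-flight check along these lines (a hypothetical helper, not current zpool code) could turn the generic failure into an actionable message:

    #include <windows.h>
    #include <stdio.h>

    /* Hypothetical pre-flight check for zpool create: try the same kind
     * of open the driver will attempt and translate a sharing violation
     * into a hint the user can act on. */
    static int check_disk_available(const wchar_t *path)
    {
        HANDLE h = CreateFileW(path, GENERIC_READ | GENERIC_WRITE,
            FILE_SHARE_READ, NULL, OPEN_EXISTING, 0, NULL);

        if (h != INVALID_HANDLE_VALUE) {
            CloseHandle(h);
            return 0;
        }
        if (GetLastError() == ERROR_SHARING_VIOLATION) {
            fwprintf(stderr, L"cannot label '%ls': device is held open "
                L"by another program (hardware monitor?); close it and "
                L"retry\n", path);
            return -1;
        }
        fwprintf(stderr, L"cannot open '%ls': error %lu\n", path,
            GetLastError());
        return -1;
    }

    int main(void)
    {
        return check_disk_available(L"\\\\.\\PHYSICALDRIVE0") ? 1 : 0;
    }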