openzfsonwindows / openzfs

OpenZFS on Linux and FreeBSD
https://openzfs.github.io/openzfs-docs

Under low RAM `zpool create` might end up half mounted #380

Open sskras opened 4 months ago

sskras commented 4 months ago

System information

Type                  Version/Name
Distribution Name     Microsoft Windows
Distribution Version  10 (21H2)
Kernel Version        10.0.19044.3086
Architecture          amd64
OpenZFS Version       2.2.3rc4

Describe the problem you're observing

I created a virtual physical drive using OSFMount. If I create a new ZFS pool on it, I then fail to either export or destroy that pool.

Describe how to reproduce the problem

The steps:
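In short (condensed from the full MSYS2 transcript below):

```shell
$ dd if=/dev/zero of=./openzfs-sample-pool.img bs=1M count=100
$ sudo OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
$ sudo zpool create sample-pool PhysicalDrive2
$ sudo zpool export sample-pool
cannot export 'sample-pool': pool is busy
```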

Include any warning/errors/backtraces from the system logs

I use preinstalled MSYS2 and gsudo for my operations.
(I also trim the first line of the MSYS2 prompt, except for the first occurrence, for a better signal-to-noise ratio.)

```shell
saukrs@DESKTOP-O7JE7JE MSYS ~
$ PATH=$PATH:/D/Program\ Files/OpenZFS\ On\ Windows

$ PATH=$PATH:/D/Program\ Files/OSFMount

$ cd /D/Downloads

$ dd if=/dev/zero of=./openzfs-sample-pool.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0569853 s, 1.8 GB/s

$ sudo OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
Creating device...
OK
Setting disk attributes...
Done.

$ sudo OSFMount.com -l
[logical disks]

[physical disks]
\\.\PhysicalDrive2

$ sudo zpool import
path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 2
    gpt 0: type 846cd8d830 off 0x4400 len 0xffbc00
    gpt 1: type 846cd8d830 off 0x1000000 len 0xe8dfc00000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 5
    gpt 0: type 846cd8d830 off 0x100000 len 0x6400000
    gpt 1: type 846cd8d830 off 0x6500000 len 0x1000000
    gpt 2: type 846cd8d830 off 0x7500000 len 0x3b7760a600
    gpt 3: type 846cd8d830 off 0x3b7ec00000 len 0x1f900000
    gpt 4: type 846cd8d830 off 0x3b9e500000 len 0x38d2600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
    part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
    part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
no pools available to import

$ sudo zpool create sample-pool PhysicalDrive2
Expanded path to '\\?\PhysicalDrive2'
working on dev '#1048576#94371840#\\?\PhysicalDrive2'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#94371840#\\?\PhysicalDrive2'

$ sudo zpool status
  pool: sample-pool
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        sample-pool       ONLINE       0     0     0
          physicaldrive2  ONLINE       0     0     0

errors: No known data errors

$ sudo zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
sample-pool    80M   108K  79.9M        -         -     4%     0%  1.00x    ONLINE  -

$ sudo zpool export sample-pool
cannot export 'sample-pool': pool is busy

$ sudo zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
sample-pool   112K  39.9M  24.5K  /sample-pool

$ sudo zfs mount

$ sudo zpool destroy sample-pool
cannot destroy 'sample-pool': pool is busy

$ sudo zpool destroy -f sample-pool
cannot destroy 'sample-pool': pool is busy

$ sudo zfs version
zfswin-2.2.3rc4
zfs-kmod-zfswin-2.2.3rc4

$ pwd
/D/Downloads

$ cd /D/Program\ Files/OSFMount

$ powershell '(Get-Item OSFMount.exe).VersionInfo'

ProductVersion FileVersion FileName
-------------- ----------- --------
3.1.1003.0     3.1.1003.0  D:\Program Files\OSFMount\OSFMount.exe
```

OSFMount is v3.1 (1003):

![image](https://github.com/openzfsonwindows/openzfs/assets/7887758/abe0a8f0-c566-4513-a97d-4b2f5d465859)
lundman commented 4 months ago

Quite curious. I usually attach .vhd files as devices, so there's no reason it shouldn't work with OSFMount. I'll give it a try and see where the error is returned from.

lundman commented 4 months ago

Tested, and it works just fine here. It is more likely you have something open in the pool, for example if you "cd" to it. Unless it is the fact that you didn't get a driveletter for the pool, but mounted it under C:?

sskras commented 4 months ago

Thanks a lot for testing. I retried it, and the issue remains on my laptop.

As you can see, zfs mount outputs nothing (zero characters):

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zpool status
  pool: sample-pool
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        sample-pool       ONLINE       0     0     0
          physicaldrive2  ONLINE       0     0     0

errors: No known data errors

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zpool list
NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
sample-pool    80M   105K  79.9M        -         -     3%     0%  1.00x    ONLINE  -

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs mount

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs mount | wc
      0       0       0

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zpool export sample-pool
cannot export 'sample-pool': pool is busy

Unless it is the fact that you didnt get a driveletter for the pool, but mounted it in C: ?

Well, it's not here either. At least I don't see it:

$ ll /C
total 8991898
drwxr-xr-x 1 saukrs None          0 Jan 23  2022 '$Recycle.Bin'
drwxr-xr-x 1 saukrs None          0 Apr 11 15:39 '$WinREAgent'
drwxr-xr-x 1 saukrs None          0 May 21 08:12  backups
-rw-r--r-- 1 saukrs None        186 Sep 19  2022  crisp-uninstall.bat
-rw-r--r-- 1 saukrs None       8192 May  3 13:30  DumpStack.log
-rw-r--r-- 1 saukrs None       8192 May 19 15:34  DumpStack.log.tmp
lrwxrwxrwx 1 saukrs None         31 Sep  4  2023  etc -> /c/Windows/System32/drivers/etc
-rw-r--r-- 1 saukrs None        481 Oct 10  2023  GNUmakefile
-rw-r--r-- 1 saukrs None 3184181248 May 19 15:34  hiberfil.sys
drwxr-xr-x 1 saukrs None          0 Jan 24 00:54  midipix
drwxr-xr-x 1 saukrs None          0 Nov 13  2023  MININT
drwxr-xr-x 1 saukrs None          0 May 17 01:01  msys64
drwxr-xr-x 1 saukrs None          0 Apr 29  2023  mt86plus
-rw-r--r-- 1 saukrs None 6006591488 May 21 03:47  pagefile.sys
drwxr-xr-x 1 saukrs None          0 Apr 29  2023  plop
drwxr-xr-x 1 saukrs None          0 May  3 14:45 'Program Files'
drwxr-xr-x 1 saukrs None          0 Jan 12 15:00 'Program Files (x86)'
drwxr-xr-x 1 saukrs None          0 May 18 15:08  ProgramData
drwxr-xr-x 1 saukrs None          0 Sep 15  2023  Recovery
drwxr-xr-x 1 saukrs None          0 Nov 16  2023  Restored
-rw-r--r-- 1 saukrs None   16777216 May 19 15:34  swapfile.sys
drwxr-xr-x 1 saukrs None          0 Apr 29  2023  SWSetup
-rw-r--r-- 1 saukrs None       1024 Nov 16  2023  SYSTAG.BIN
drwxr-xr-x 1 saukrs None          0 May 19 10:52 'System Volume Information'
drwxr-xr-x 1 saukrs None          0 May  7  2023  system.sav
drwxr-xr-x 1 saukrs None          0 Dec 23 20:49  Users
drwxr-xr-x 1 saukrs None          0 May  3 13:38  Windows

PS. Just moved some junk from the C:/ dir to C:/backups so the output is clearer.

lundman commented 4 months ago

Hmm, can you zfs set driveletter=Z sample-pool and then unmount/mount or export/import? It should have given you a driveletter by default, so it is confusing that it didn't.

sskras commented 4 months ago

OK, I did, but it doesn't seem to be getting a driveletter:

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs set driveletter=Z sample-pool

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs mount

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ fsutil fsinfo drives

Drives: B:\ C:\ D:\

Now if I try unmount + mount, the latter fails (while the former does nothing, I guess):

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs unmount -a

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zfs mount -av
cannot mount 'sample-pool': Unknown error

Exporting fails with the same output:

saukrs@DESKTOP-O7JE7JE MINGW64 /D/Downloads
$ sudo zpool export -a
cannot export 'sample-pool': pool is busy

Can I enable some debug output without running WinDBG or other complex stuff?

PS. I have Ext2Fsd and WinBtrfs installed and running in parallel. Maybe one of them interferes somehow? Or maybe it's just an older version of the NT kernel behaving differently? See the check below.
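If it helps, here is how I could check which of those drivers are actually installed (a sketch with stock Windows tooling; whether they interfere at all is pure speculation on my part):

```shell
# list installed kernel drivers and filter for the likely suspects (elevated prompt)
$ driverquery /fo csv | grep -i -e ext2 -e btrfs -e zfs
```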

lundman commented 4 months ago

Hmm can you reboot and import again? It has lost its mind there

sskras commented 4 months ago

I did today's run in a freshly rebooted OS session. Sure, will do it again (at least to free the image file :)

So there are no built-in debugging facilities that could help?

Putting `... get all` outputs just in case...
* zpool:

```sh
$ sudo zpool get all
NAME         PROPERTY                       VALUE                          SOURCE
sample-pool  size                           80M                            -
sample-pool  capacity                       0%                             -
sample-pool  altroot                        -                              default
sample-pool  health                         ONLINE                         -
sample-pool  guid                           7331206367092621922            -
sample-pool  version                        -                              default
sample-pool  bootfs                         -                              default
sample-pool  delegation                     on                             default
sample-pool  autoreplace                    off                            default
sample-pool  cachefile                      -                              default
sample-pool  failmode                       wait                           default
sample-pool  listsnapshots                  off                            default
sample-pool  autoexpand                     off                            default
sample-pool  dedupratio                     1.00x                          -
sample-pool  free                           79.9M                          -
sample-pool  allocated                      128K                           -
sample-pool  readonly                       off                            -
sample-pool  ashift                         0                              default
sample-pool  comment                        -                              default
sample-pool  expandsize                     -                              -
sample-pool  freeing                        0                              -
sample-pool  fragmentation                  5%                             -
sample-pool  leaked                         0                              -
sample-pool  multihost                      off                            default
sample-pool  checkpoint                     -                              -
sample-pool  load_guid                      3759889056826721622            -
sample-pool  autotrim                       off                            default
sample-pool  compatibility                  off                            default
sample-pool  bcloneused                     0                              -
sample-pool  bclonesaved                    0                              -
sample-pool  bcloneratio                    1.00x                          -
sample-pool  feature@async_destroy          enabled                        local
sample-pool  feature@empty_bpobj            enabled                        local
sample-pool  feature@lz4_compress           active                         local
sample-pool  feature@multi_vdev_crash_dump  enabled                        local
sample-pool  feature@spacemap_histogram     active                         local
sample-pool  feature@enabled_txg            active                         local
sample-pool  feature@hole_birth             active                         local
sample-pool  feature@extensible_dataset     active                         local
sample-pool  feature@embedded_data          active                         local
sample-pool  feature@bookmarks              enabled                        local
sample-pool  feature@filesystem_limits      enabled                        local
sample-pool  feature@large_blocks           enabled                        local
sample-pool  feature@large_dnode            enabled                        local
sample-pool  feature@sha512                 enabled                        local
sample-pool  feature@skein                  enabled                        local
sample-pool  feature@edonr                  enabled                        local
sample-pool  feature@userobj_accounting     active                         local
sample-pool  feature@encryption             enabled                        local
sample-pool  feature@project_quota          active                         local
sample-pool  feature@device_removal         enabled                        local
sample-pool  feature@obsolete_counts        enabled                        local
sample-pool  feature@zpool_checkpoint       enabled                        local
sample-pool  feature@spacemap_v2            active                         local
sample-pool  feature@allocation_classes     enabled                        local
sample-pool  feature@resilver_defer         enabled                        local
sample-pool  feature@bookmark_v2            enabled                        local
sample-pool  feature@redaction_bookmarks    enabled                        local
sample-pool  feature@redacted_datasets      enabled                        local
sample-pool  feature@bookmark_written       enabled                        local
sample-pool  feature@log_spacemap           active                         local
sample-pool  feature@livelist               enabled                        local
sample-pool  feature@device_rebuild         enabled                        local
sample-pool  feature@zstd_compress          enabled                        local
sample-pool  feature@draid                  enabled                        local
sample-pool  feature@zilsaxattr             enabled                        local
sample-pool  feature@head_errlog            active                         local
sample-pool  feature@blake3                 enabled                        local
sample-pool  feature@block_cloning          enabled                        local
sample-pool  feature@vdev_zaps_v2           active                         local
sample-pool  feature@redaction_list_spill   enabled                        local
sample-pool  feature@raidz_expansion        enabled                        local
```

* zfs:

```sh
$ sudo zfs get all
NAME         PROPERTY              VALUE                  SOURCE
sample-pool  type                  filesystem             -
sample-pool  creation              Tue May 21 08:03 2024  -
sample-pool  used                  128K                   -
sample-pool  available             39.9M                  -
sample-pool  referenced            25.5K                  -
sample-pool  compressratio         1.00x                  -
sample-pool  mounted               no                     -
sample-pool  quota                 none                   default
sample-pool  reservation           none                   default
sample-pool  recordsize            128K                   default
sample-pool  mountpoint            /sample-pool           default
sample-pool  sharenfs              off                    default
sample-pool  checksum              on                     default
sample-pool  compression           on                     default
sample-pool  atime                 on                     default
sample-pool  devices               on                     default
sample-pool  exec                  on                     default
sample-pool  setuid                on                     default
sample-pool  readonly              off                    default
sample-pool  zoned                 off                    default
sample-pool  snapdir               hidden                 default
sample-pool  aclmode               discard                default
sample-pool  aclinherit            restricted             default
sample-pool  createtxg             1                      -
sample-pool  canmount              on                     default
sample-pool  xattr                 on                     default
sample-pool  copies                1                      default
sample-pool  version               5                      -
sample-pool  utf8only              off                    -
sample-pool  normalization         none                   -
sample-pool  casesensitivity       sensitive              -
sample-pool  vscan                 off                    default
sample-pool  nbmand                off                    default
sample-pool  sharesmb              off                    default
sample-pool  refquota              none                   default
sample-pool  refreservation        none                   default
sample-pool  guid                  6295728974672549137    -
sample-pool  primarycache          all                    default
sample-pool  secondarycache        all                    default
sample-pool  usedbysnapshots       0B                     -
sample-pool  usedbydataset         25.5K                  -
sample-pool  usedbychildren        102K                   -
sample-pool  usedbyrefreservation  0B                     -
sample-pool  logbias               latency                default
sample-pool  objsetid              54                     -
sample-pool  dedup                 off                    default
sample-pool  mlslabel              none                   default
sample-pool  sync                  standard               default
sample-pool  dnodesize             legacy                 default
sample-pool  refcompressratio      1.00x                  -
sample-pool  written               25.5K                  -
sample-pool  logicalused           47K                    -
sample-pool  logicalreferenced     13K                    -
sample-pool  volmode               default                default
sample-pool  filesystem_limit      none                   default
sample-pool  snapshot_limit        none                   default
sample-pool  filesystem_count      none                   default
sample-pool  snapshot_count        none                   default
sample-pool  snapdev               hidden                 default
sample-pool  acltype               nfsv4                  default
sample-pool  context               none                   default
sample-pool  fscontext             none                   default
sample-pool  defcontext            none                   default
sample-pool  rootcontext           none                   default
sample-pool  relatime              on                     default
sample-pool  redundant_metadata    all                    default
sample-pool  overlay               on                     default
sample-pool  encryption            off                    default
sample-pool  keylocation           none                   default
sample-pool  keyformat             none                   default
sample-pool  pbkdf2iters           0                      default
sample-pool  special_small_blocks  0                      default
sample-pool  prefetch              all                    default
sample-pool  com.apple.mimic       off                    default
sample-pool  driveletter           Z                      local
```
lundman commented 4 months ago

There's a bunch of debugging available, but it does require a fair bit of setup.

sskras commented 4 months ago

One more thought. @lundman, when it succeeds on your machine, does the disk have zero OS-native partitions too?

saukrs@DESKTOP-O7JE7JE MSYS /D/_BACKUPS/src
$ powershell 'Get-WmiObject -Class Win32_DiskDrive | select Name,Model,SerialNumber,InterfaceType,Partitions,Size | format-table'

Name               Model                             SerialNumber         InterfaceType Partitions          Size
----               -----                             ------------         ------------- ----------          ----
\\.\PHYSICALDRIVE0 Samsung SSD 870 EVO 1TB           S626NF0R201463X      IDE                    1 1000202273280
\\.\PHYSICALDRIVE1 Samsung SSD 980 500GB             0025_38D3_32A0_5462. SCSI                   4  500105249280
\\.\PHYSICALDRIVE2 PassMark osfdisk SCSI Disk Device                      SCSI                   0      98703360
lundman commented 4 months ago

Partitions are created the ZFS way, which is all upstream stuff. It will not work right until zfs mount says it's mounted on a driveletter.

sskras commented 4 months ago

Fair, but I was worried that maybe OpenZFS needs a native OS partition for the pool to work successfully on Windows. So I asked whether in your successful test you also get zero Windows partitions, like me.

sskras commented 4 months ago

PS. Forgot to respond to your statement:

It is more likely you have something open in the pool, for example if you "cd" to it.

Well, I have no idea how I could cd into it as long as it isn't mounted...

lundman commented 4 months ago

Can you also show zfs mount after the reboot and import?

sskras commented 4 months ago

Yes, just rebooted and recreated the virtual disk:

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
Creating device...
OK
Setting disk attributes...
Done.

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo OSFMount.com -l
[logical disks]

[physical disks]
\\.\PhysicalDrive2

Because PhysicalDrive2 vanishes on reboot, it's fair that I got no pools imported on boot:

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo zpool status
no pools available

Now zpool import sees the disk, but recognizes no pool inside it:

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo zpool import
path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 2
    gpt 0: type c34b6fd500 off 0x4400 len 0xffbc00
    gpt 1: type c34b6fd500 off 0x1000000 len 0xe8dfc00000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 5
    gpt 0: type c34b6fd500 off 0x100000 len 0x6400000
    gpt 1: type c34b6fd500 off 0x6500000 len 0x1000000
    gpt 2: type c34b6fd500 off 0x7500000 len 0x3b7760a600
    gpt 3: type c34b6fd500 off 0x3b7ec00000 len 0x1f900000
    gpt 4: type c34b6fd500 off 0x3b9e500000 len 0x38d2600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
    part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
    part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
EFI read OK, max partitions 128
no pools available to import

Should I repeat the creation of sample-pool?

lundman commented 4 months ago

That's not right; why wouldn't it find the pool to import? Did it even write anything to the file, hmm.

sskras commented 4 months ago

Did it even write anything to the file hmm

It did write at least some bits. I zeroed out the image file and computed the original checksum:

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo OSFMount.com -d -m 2

$ sudo OSFMount.com -l
[logical disks]

[physical disks]

$ dd if=/dev/zero of=./openzfs-sample-pool.img bs=1M count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB, 100 MiB) copied, 0.0432512 s, 2.4 GB/s

$ sha1sum ./openzfs-sample-pool.img | tee openzfs-sample-pool.sha1
2c2ceccb5ec5574f791d45b63c940cff20550f9a *./openzfs-sample-pool.img

And recreated the PhysicalDrive2:

$ sudo OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
Creating device...
OK
Setting disk attributes...
Done.

$ sudo OSFMount.com -l
[logical disks]

[physical disks]
\\.\PhysicalDrive2

No changes after import scanning:

$ sudo zpool import
path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 2
    gpt 0: type 9f594fd600 off 0x4400 len 0xffbc00
    gpt 1: type 9f594fd600 off 0x1000000 len 0xe8dfc00000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 5
    gpt 0: type 9f594fd600 off 0x100000 len 0x6400000
    gpt 1: type 9f594fd600 off 0x6500000 len 0x1000000
    gpt 2: type 9f594fd600 off 0x7500000 len 0x3b7760a600
    gpt 3: type 9f594fd600 off 0x3b7ec00000 len 0x1f900000
    gpt 4: type 9f594fd600 off 0x3b9e500000 len 0x38d2600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
    part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
    part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
no pools available to import

$ sha1sum -c openzfs-sample-pool.sha1
./openzfs-sample-pool.img: OK

So I recreated the testing pool:

$ sudo zpool create sample-pool PhysicalDrive2
Expanded path to '\\?\PhysicalDrive2'
working on dev '#1048576#94371840#\\?\PhysicalDrive2'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#94371840#\\?\PhysicalDrive2'

$ sudo zpool status -v
  pool: sample-pool
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        sample-pool       ONLINE       0     0     0
          physicaldrive2  ONLINE       0     0     0

errors: No known data errors

Now the image file changed:

$ sha1sum -c openzfs-sample-pool.sha1
./openzfs-sample-pool.img: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match

$ file openzfs-sample-pool.img
openzfs-sample-pool.img: DOS/MBR boot sector; partition 1 : ID=0xee, start-CHS (0x3ff,255,63), end-CHS (0x3ff,255,63), startsector 1, 204799 sectors, extended partition table (last)

$ fdisk -l openzfs-sample-pool.img
Disk openzfs-sample-pool.img: 100 MiB, 104857600 bytes, 204800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: D08A4243-5817-EF11-BDD4-5FEB5555ACD2

Device                    Start    End Sectors Size Type
openzfs-sample-pool.img1   2048 186367  184320  90M Solaris /usr & Apple ZFS
openzfs-sample-pool.img9 186368 202751   16384   8M Solaris reserved 1

Looks like a partition table was written into it.
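As a crude extra check that a vdev label landed inside that partition, the printable strings at the partition offset (1 MiB, matching the '#1048576#...' in the create output above) can be scanned with standard tools; a sketch:

```shell
# ZFS label 0 sits at the start of the partition, 1 MiB into the image;
# its nvlist stores the pool name in plain ASCII
$ dd if=openzfs-sample-pool.img bs=1M skip=1 count=1 2>/dev/null | strings | grep -m1 sample-pool
```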

And sure enough, the sample-pool filesystem was also created:

$ sudo zfs list
NAME          USED  AVAIL  REFER  MOUNTPOINT
sample-pool   107K  39.9M  24.5K  /sample-pool

Some properties:

$ sudo zfs get all | grep -i -e mount -e drive
sample-pool  mounted               no                     -
sample-pool  mountpoint            /sample-pool           default
sample-pool  canmount              on                     default
sample-pool  driveletter           -                      default

driveletter is not set, which would be fine for my test case.

Now if I try to export the pool, it fails:

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo zpool export sample-pool
cannot export 'sample-pool': pool is busy

saukrs@DESKTOP-O7JE7JE MSYS /D/Downloads
$ sudo zpool export -af
cannot export 'sample-pool': pool is busy

Just as originally reported.

lundman commented 4 months ago

Yeah, you can't export it: because the mounting failed, it is half mounted, so it doesn't know to unmount.

sskras commented 4 months ago

If the exact same commands (including OSFMount.com and the image size) work on your machine, I would like to know your Windows version. Especially if it's w10.

sskras commented 4 months ago

PS. I tried the same on Linux Debian 12 using losetup. It works as expected, and /sample-pool gets automounted. Naturally, the image file also receives some changes:

root@omn:~# losetup -l
NAME       SIZELIMIT OFFSET AUTOCLEAR RO BACK-FILE                     DIO LOG-SEC
/dev/loop0         0      0         0  0 /root/openzfs-sample-pool.img   0     512

root@omn:~# sha1sum -c openzfs-sample-pool.sha1
./openzfs-sample-pool.img: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match

But no partition table is written into it:

root@omn:~# fdisk -l /dev/loop0
Disk /dev/loop0: 100 MiB, 104857600 bytes, 204800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

root@omn:~# gdisk -l /dev/loop0
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: not present

Creating new GPT entries in memory.
Disk /dev/loop0: 204800 sectors, 100.0 MiB
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): 648FF4E6-B762-47C1-97F6-D19295B4E178
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 204766
Partitions will be aligned on 2048-sector boundaries
Total free space is 204733 sectors (100.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name

... even after destroying the loop device:

root@omn:~# losetup -d /dev/loop0
root@omn:~# losetup -l
root@omn:~# fdisk -l openzfs-sample-pool.img
Disk openzfs-sample-pool.img: 100 MiB, 104857600 bytes, 204800 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
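For reference, the whole Linux-side sequence was essentially this (a condensed sketch; prompts dropped):

```shell
# attach the image to the first free loop device, then exercise the pool
losetup -f --show /root/openzfs-sample-pool.img   # prints /dev/loop0
zpool create sample-pool /dev/loop0               # automounts /sample-pool
zfs mount                                         # lists sample-pool on /sample-pool
zpool export sample-pool
losetup -d /dev/loop0
```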

OK, I rechecked the Windows-created image on Linux using gdisk. The GPT table was created with partition codes BF01 and BF07:

saukrs@omn:~$ gdisk -l /tmp/openzfs-sample-pool.img
GPT fdisk (gdisk) version 1.0.9

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.
Disk /tmp/openzfs-sample-pool.img: 204800 sectors, 100.0 MiB
Sector size (logical): 512 bytes
Disk identifier (GUID): D08A4243-5817-EF11-BDD4-5FEB5555ACD2
Partition table holds up to 9 entries
Main partition table begins at sector 2 and ends at sector 4
First usable sector is 34, last usable sector is 204766
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048          186367   90.0 MiB    BF01  zfs-000069a000004ddb
   9          186368          202751   8.0 MiB     BF07

I also compared zpool get all sample-pool outputs from both OSes:

$ diff -ub zpool-get-all-sample-pool.windows10.txt zpool-get-all-sample-pool.linux.txt
--- zpool-get-all-sample-pool.windows10.txt     2024-05-21 14:09:14.126551200 +0300
+++ zpool-get-all-sample-pool.linux.txt 2024-05-21 14:08:48.046716800 +0300
@@ -3,7 +3,7 @@
 sample-pool  capacity                       0%                             -
 sample-pool  altroot                        -                              default
 sample-pool  health                         ONLINE                         -
-sample-pool  guid                           7996677649176440267            -
+sample-pool  guid                           8533994822764781258            -
 sample-pool  version                        -                              default
 sample-pool  bootfs                         -                              default
 sample-pool  delegation                     on                             default
@@ -14,7 +14,7 @@
 sample-pool  autoexpand                     off                            default
 sample-pool  dedupratio                     1.00x                          -
 sample-pool  free                           79.9M                          -
-sample-pool  allocated                      116K                           -
+sample-pool  allocated                      110K                           -
 sample-pool  readonly                       off                            -
 sample-pool  ashift                         0                              default
 sample-pool  comment                        -                              default
@@ -24,7 +24,7 @@
 sample-pool  leaked                         0                              -
 sample-pool  multihost                      off                            default
 sample-pool  checkpoint                     -                              -
-sample-pool  load_guid                      4038385019138522589            -
+sample-pool  load_guid                      13219985917342397409           -
 sample-pool  autotrim                       off                            default
 sample-pool  compatibility                  off                            default
 sample-pool  bcloneused                     0                              -
@@ -69,5 +69,3 @@
 sample-pool  feature@blake3                 enabled                        local
 sample-pool  feature@block_cloning          enabled                        local
 sample-pool  feature@vdev_zaps_v2           active                         local
-sample-pool  feature@redaction_list_spill   enabled                        local
-sample-pool  feature@raidz_expansion        enabled                        local

... and compared zfs get all sample-pool outputs too:

$ diff -ub zfs-get-all-sample-pool.windows10.txt zfs-get-all-sample-pool.linux.txt
--- zfs-get-all-sample-pool.windows10.txt       2024-05-21 14:18:52.495245400 +0300
+++ zfs-get-all-sample-pool.linux.txt   2024-05-21 14:15:13.013732300 +0300
@@ -1,11 +1,11 @@
 NAME         PROPERTY              VALUE                  SOURCE
 sample-pool  type                  filesystem             -
-sample-pool  creation              Tue May 21 12:55 2024  -
-sample-pool  used                  116K                   -
+sample-pool  creation              Tue May 21 14:06 2024  -
+sample-pool  used                  110K                   -
 sample-pool  available             39.9M                  -
-sample-pool  referenced            24.5K                  -
+sample-pool  referenced            24K                    -
 sample-pool  compressratio         1.00x                  -
-sample-pool  mounted               no                     -
+sample-pool  mounted               yes                    -
 sample-pool  quota                 none                   default
 sample-pool  reservation           none                   default
 sample-pool  recordsize            128K                   default
@@ -35,12 +35,12 @@
 sample-pool  sharesmb              off                    default
 sample-pool  refquota              none                   default
 sample-pool  refreservation        none                   default
-sample-pool  guid                  8106054128323860416    -
+sample-pool  guid                  10601684526528754965   -
 sample-pool  primarycache          all                    default
 sample-pool  secondarycache        all                    default
 sample-pool  usedbysnapshots       0B                     -
-sample-pool  usedbydataset         24.5K                  -
-sample-pool  usedbychildren        91.5K                  -
+sample-pool  usedbydataset         24K                    -
+sample-pool  usedbychildren        85.5K                  -
 sample-pool  usedbyrefreservation  0B                     -
 sample-pool  logbias               latency                default
 sample-pool  objsetid              54                     -
@@ -49,16 +49,16 @@
 sample-pool  sync                  standard               default
 sample-pool  dnodesize             legacy                 default
 sample-pool  refcompressratio      1.00x                  -
-sample-pool  written               24.5K                  -
-sample-pool  logicalused           43K                    -
-sample-pool  logicalreferenced     12.5K                  -
+sample-pool  written               24K                    -
+sample-pool  logicalused           40.5K                  -
+sample-pool  logicalreferenced     12K                    -
 sample-pool  volmode               default                default
 sample-pool  filesystem_limit      none                   default
 sample-pool  snapshot_limit        none                   default
 sample-pool  filesystem_count      none                   default
 sample-pool  snapshot_count        none                   default
 sample-pool  snapdev               hidden                 default
-sample-pool  acltype               nfsv4                  default
+sample-pool  acltype               off                    default
 sample-pool  context               none                   default
 sample-pool  fscontext             none                   default
 sample-pool  defcontext            none                   default
@@ -71,6 +71,3 @@
 sample-pool  keyformat             none                   default
 sample-pool  pbkdf2iters           0                      default
 sample-pool  special_small_blocks  0                      default
-sample-pool  prefetch              all                    default
-sample-pool  com.apple.mimic       off                    default
-sample-pool  driveletter           -                      default

Nothing special.

I wonder what makes OpenZFS behave differently between Linux and Windows with regard to GPT creation.

Then again, maybe that difference is not related to the original issue.

lundman commented 4 months ago

I'd be more interested to see whether it makes a difference if you run the commands from cmd/powershell started as Administrator. I can try MSYS tomorrow as well.

sskras commented 4 months ago

Oh, I forgot that MSYS2 + gsudo might mangle some things along the way (like paths). Thanks for the reminder.

Yes, it works indeed:

Microsoft Windows [Version 10.0.19044.3086]
(c) Microsoft Corporation. All rights reserved.

C:\Windows\system32>set PROMPT=$P$G

C:\Windows\system32> PATH=%PATH%;D:\Program Files\OSFMount;C:\Program Files\OpenZFS On Windows

C:\Windows\system32> D:

D:\> cd D:\Downloads

D:\Downloads> OSFMount.com -l
[logical disks]

[physical disks]

D:\Downloads> dir openzfs-sample-pool-2.img
 Volume in drive D is New Volume
 Volume Serial Number is F8C2-CC01

 Directory of D:\Downloads

21/05/2024  14:59       104,857,600 openzfs-sample-pool-2.img
               1 File(s)    104,857,600 bytes
               0 Dir(s)     323,813,376 bytes free

D:\Downloads> OSFMount.com -a -t file -f openzfs-sample-pool-2.img -o rw,physical
Creating device...
OK
Setting disk attributes...
Done.

D:\Downloads> OSFMount.com -l
[logical disks]

[physical disks]
\\.\PhysicalDrive2

D:\Downloads> zpool import
path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 2
    gpt 0: type a90d6fd4b0 off 0x4400 len 0xffbc00
    gpt 1: type a90d6fd4b0 off 0x1000000 len 0xe8dfc00000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 5
    gpt 0: type a90d6fd4b0 off 0x100000 len 0x6400000
    gpt 1: type a90d6fd4b0 off 0x6500000 len 0x1000000
    gpt 2: type a90d6fd4b0 off 0x7500000 len 0x3b7760a600
    gpt 3: type a90d6fd4b0 off 0x3b7ec00000 len 0x1f900000
    gpt 4: type a90d6fd4b0 off 0x3b9e500000 len 0x38d2600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
    part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
    part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 0
asking libefi to read label
no pools available to import

D:\Downloads> zpool create sample-pool-2 PhysicalDrive2
Expanded path to '\\?\PhysicalDrive2'
working on dev '#1048576#94371840#\\?\PhysicalDrive2'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#94371840#\\?\PhysicalDrive2'

D:\Downloads> zpool status -v
  pool: sample-pool-2
 state: ONLINE
config:

        NAME              STATE     READ WRITE CKSUM
        sample-pool-2     ONLINE       0     0     0
          physicaldrive2  ONLINE       0     0     0

errors: No known data errors

D:\Downloads> zfs list
NAME            USED  AVAIL  REFER  MOUNTPOINT
sample-pool-2   116K  39.9M  27.5K  /sample-pool-2

D:\Downloads> zfs mount
sample-pool-2                   E:/

D:\Downloads> zpool export sample-pool-2
zunmount(sample-pool-2,E:/) running
zunmount(sample-pool-2,E:/) returns 0

D:\Downloads> zpool status
no pools available

D:\Downloads> 

What a waste of time! Now I need to find the offending difference (who changes what in the MSYS2 scenario). Thanks again!

lundman commented 4 months ago

It is possible it won't let you write near the start of a device? To protect partitions?

sskras commented 4 months ago

I am postponing the research. This issue doesn't belong to @openzfsonwindows. Let's close it (at least for a while).

sskras commented 4 months ago

OK, I might take my words back.

Out of ~10 tries in elevated CMD, one failed with pool is busy. Out of ~6 tries in elevated MSYS2, only one succeeded.

Looks like I am hitting some race condition.

I tried to make output formats similar in both environments. When pool creation succeeded in both of them, I compared the outputs.

The last two differences in the output (not in the prompts) seem strange to me:

-/D/Downloads> zpool create sample-pool PhysicalDrive2
+D:\Downloads> zpool create sample-pool PhysicalDrive2
 Expanded path to '\\?\PhysicalDrive2'
 working on dev '#1048576#94371840#\\?\PhysicalDrive2'
 setting path here '/dev/physicaldrive2'
 setting physpath here '#1048576#94371840#\\?\PhysicalDrive2'

-/D/Downloads> zpool status
+D:\Downloads> zpool status
   pool: sample-pool
  state: ONLINE
 config:

         NAME              STATE     READ WRITE CKSUM
         sample-pool       ONLINE       0     0     0
           physicaldrive2  ONLINE       0     0     0

 errors: No known data errors

-/D/Downloads> zfs list
+D:\Downloads> zfs list
 NAME          USED  AVAIL  REFER  MOUNTPOINT
-sample-pool   117K  39.9M  28.5K  /sample-pool
+sample-pool  1022K  39.0M  27.5K  /sample-pool

-/D/Downloads> zfs mount
+D:\Downloads> zfs mount
 sample-pool                     E:/

-/D/Downloads> zpool list
+D:\Downloads> zpool list
 NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
-sample-pool    80M   117K  79.9M        -         -     3%     0%  1.00x    ONLINE  -
+sample-pool    80M  1022K  79.0M        -         -     3%     1%  1.00x    ONLINE  -

Why would the filesystem use 8.7x more space (1022K) when created in the elevated CMD (the green color) compared to the elevated MSYS2 (117K)?

@lundman, any ideas?

lundman commented 4 months ago

If it does not get mounted, i.e. the E: line, then it will not get Explorer/Windows doing its thing with a newly mounted filesystem. It gets pretty busy when mounted. All tests that do not show E: from zfs mount are not relevant. It will definitely say busy after that. The bug is in failing to mount it.

sskras commented 4 months ago

If it does not get mounted, i.e. the E: line, then it will not get Explorer/Windows doing its thing with a newly mounted filesystem. It gets pretty busy when mounted.

Fair enough. But as you can see, it does get mounted. In both of my runs (mounting from CMD and from MSYS2) the E: mountpoint was created.

Still, the usage of the newly created filesystem went from 1022K (green, the CMD run) down to 117K (red, the MSYS2 run). (Please excuse me if I wasn't clear enough in the comment above.)

I thought this also might have to do with the possible race condition that I just noticed.

lundman commented 4 months ago

Ah, I assumed that just the one line with E: indicated the first run failed, but you presented both in coloured diff format.

Then yeah, it does seem a bit like a race bug.

lundman commented 4 months ago

Is it iffy from zpool create only, or also from zpool import?

sskras commented 4 months ago

If you are asking about the last two runs, then no – zpool import outputs are almost the same.

Full unified diff of outputs from MSYS2 and CMD
```diff
$ diff -ub10 ZFS-on-{MSYS2,CMD}-v4.txt
--- ZFS-on-MSYS2-v4.txt 2024-05-22 00:07:46.000866100 +0300
+++ ZFS-on-CMD-v4.txt   2024-05-22 00:07:15.993522200 +0300
@@ -1,118 +1,126 @@
-saukrs@DESKTOP-O7JE7JE MSYS ~
-# PS1='\n\[\e[33m\]\w\[\e[0m\]\[\e[1m\]>\[\e[0m\] '
+C:\Windows\system32>PROMPT=$P$G
 
-~> cd /D/Downloads
+C:\Windows\system32> cd D:\Downloads && D:
 
-/D/Downloads> PATH=$PATH:/C/Program\ Files/OpenZFS\ On\ Windows:/D/Program\ Files/OSFMount
+D:\Downloads> PATH=%PATH%;C:\Program Files\OpenZFS On Windows;D:\Program Files\OSFMount
 
-/D/Downloads> OSFMount.com -l
+D:\Downloads> PATH=%PATH%;C:\msys64\usr\bin
+
+D:\Downloads> OSFMount.com -l
 [logical disks]
 
 [physical disks]
 
-/D/Downloads> dd if=/dev/zero of=./openzfs-sample-pool.img bs=1M count=100
+D:\Downloads> dd if=/dev/zero of=./openzfs-sample-pool.img bs=1M count=100
 100+0 records in
 100+0 records out
-104857600 bytes (105 MB, 100 MiB) copied, 0.0391247 s, 2.7 GB/s
+104857600 bytes (105 MB, 100 MiB) copied, 0.0414715 s, 2.5 GB/s
 
-/D/Downloads> dir openzfs-sample-pool.img
+D:\Downloads> dir /b openzfs-sample-pool.img
 openzfs-sample-pool.img
 
-/D/Downloads> ll openzfs-sample-pool.img
--rw-r--r-- 1 saukrs None 104857600 May 21 23:59 openzfs-sample-pool.img
+D:\Downloads> dir openzfs-sample-pool.img
+ Volume in drive D is New Volume
+ Volume Serial Number is F8C2-CC01
+
+ Directory of D:\Downloads
+
+21/05/2024  23:52       104,857,600 openzfs-sample-pool.img
+               1 File(s)    104,857,600 bytes
+               0 Dir(s)     427,872,256 bytes free
 
-/D/Downloads> OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
+D:\Downloads> OSFMount.com -a -t file -f openzfs-sample-pool.img -o rw,physical
 Creating device...
 OK
 Setting disk attributes...
 Done.
 
-/D/Downloads> OSFMount.com -l
+D:\Downloads> OSFMount.com -l
 [logical disks]
 
 [physical disks]
 \\.\PhysicalDrive2
 
-/D/Downloads> zpool import
+D:\Downloads> zpool import
 path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
  and '\\?\PhysicalDrive0'
 read partitions ok 2
-    gpt 0: type 76256fd7a0 off 0x4400 len 0xffbc00
-    gpt 1: type 76256fd7a0 off 0x1000000 len 0xe8dfc00000
+    gpt 0: type f9775adb80 off 0x4400 len 0xffbc00
+    gpt 1: type f9775adb80 off 0x1000000 len 0xe8dfc00000
 asking libefi to read label
 EFI read OK, max partitions 128
     part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
     part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
 path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
  and '\\?\PhysicalDrive1'
 read partitions ok 5
-    gpt 0: type 76256fd7a0 off 0x100000 len 0x6400000
-    gpt 1: type 76256fd7a0 off 0x6500000 len 0x1000000
-    gpt 2: type 76256fd7a0 off 0x7500000 len 0x3b7760a600
-    gpt 3: type 76256fd7a0 off 0x3b7ec00000 len 0x1f900000
-    gpt 4: type 76256fd7a0 off 0x3b9e500000 len 0x38d2600000
+    gpt 0: type f9775adb80 off 0x100000 len 0x6400000
+    gpt 1: type f9775adb80 off 0x6500000 len 0x1000000
+    gpt 2: type f9775adb80 off 0x7500000 len 0x3b7760a600
+    gpt 3: type f9775adb80 off 0x3b7ec00000 len 0x1f900000
+    gpt 4: type f9775adb80 off 0x3b9e500000 len 0x38d2600000
 asking libefi to read label
 EFI read OK, max partitions 128
     part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
     part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
     part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
     part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
 path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
  and '\\?\PhysicalDrive2'
 read partitions ok 0
 asking libefi to read label
 no pools available to import
 
-/D/Downloads> zpool status
+D:\Downloads> zpool status
 no pools available
 
-/D/Downloads> zpool create sample-pool PhysicalDrive2
+D:\Downloads> zpool create sample-pool PhysicalDrive2
 Expanded path to '\\?\PhysicalDrive2'
 working on dev '#1048576#94371840#\\?\PhysicalDrive2'
 setting path here '/dev/physicaldrive2'
 setting physpath here '#1048576#94371840#\\?\PhysicalDrive2'
 
-/D/Downloads> zpool status
+D:\Downloads> zpool status
   pool: sample-pool
  state: ONLINE
 config:
 
         NAME              STATE     READ WRITE CKSUM
         sample-pool       ONLINE       0     0     0
           physicaldrive2  ONLINE       0     0     0
 
 errors: No known data errors
 
-/D/Downloads> zfs list
+D:\Downloads> zfs list
 NAME          USED  AVAIL  REFER  MOUNTPOINT
-sample-pool   117K  39.9M  28.5K  /sample-pool
+sample-pool  1022K  39.0M  27.5K  /sample-pool
 
-/D/Downloads> zfs mount
+D:\Downloads> zfs mount
 sample-pool                     E:/
 
-/D/Downloads> zpool list
+D:\Downloads> zpool list
 NAME          SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP    HEALTH  ALTROOT
-sample-pool    80M   117K  79.9M        -         -     3%     0%  1.00x    ONLINE  -
+sample-pool    80M  1022K  79.0M        -         -     3%     1%  1.00x    ONLINE  -
 
-/D/Downloads> zpool export sample-pool
+D:\Downloads> zpool export sample-pool
 zunmount(sample-pool,E:/) running
 zunmount(sample-pool,E:/) returns 0
 
-/D/Downloads> zfs version
+D:\Downloads> zfs version
 zfswin-2.2.3rc4
 zfs-kmod-zfswin-2.2.3rc4
 
-/D/Downloads> zpool status
+D:\Downloads> zpool status
 no pools available
 
-/D/Downloads> OSFMount.com -d -m 2
+D:\Downloads> OSFMount.com -d -m 2
 
-/D/Downloads> OSFMount.com -l
+D:\Downloads> OSFMount.com -l
 [logical disks]
 
 [physical disks]
 
-/D/Downloads> _
+D:\Downloads> _
```

Attaching the output files: ZFS-on-CMD-v4.txt ZFS-on-MSYS2-v4.txt

sskras commented 4 months ago

I tried running zpool import sample-pool && zfs mount && zpool export sample-pool in a loop.

After a random number of iterations (24, then 4, then 56, now 26) I see zpool import sample-pool crashing. E.g.:

saukrs@DESKTOP-O7JE7JE MSYS ~
# time (unset N; set -e; for((;;)); do ((N=N+1)); echo -e "\nIteration $N:\n"; set -x; zpool import sample-pool && zfs mount && zpool export sample-pool; [[ $? != 0 ]] && break; set +x; done)
  ...
Iteration 56:

+ zpool import sample-pool
path '\\?\scsi#disk&ven_samsung&prod_ssd_870_evo_1tb#4&f78d1d6&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive0'
read partitions ok 2
    gpt 0: type 2ad0d2d460 off 0x4400 len 0xffbc00
    gpt 1: type 2ad0d2d460 off 0x1000000 len 0xe8dfc00000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 22:    len 7fde:    tag: 10    name: 'Microsoft reserved partition'
    part 1:  offset 8000:    len 746fe000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_nvme&prod_samsung_ssd_980#5&2eacfa49&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive1'
read partitions ok 5
    gpt 0: type 2ad0d2d460 off 0x100000 len 0x6400000
    gpt 1: type 2ad0d2d460 off 0x6500000 len 0x1000000
    gpt 2: type 2ad0d2d460 off 0x7500000 len 0x3b7760a600
    gpt 3: type 2ad0d2d460 off 0x3b7ec00000 len 0x1f900000
    gpt 4: type 2ad0d2d460 off 0x3b9e500000 len 0x38d2600000
asking libefi to read label
EFI read OK, max partitions 128
    part 0:  offset 800:    len 32000:    tag: c    name: 'EFI system partition'
    part 1:  offset 32800:    len 8000:    tag: 10    name: 'Microsoft reserved partition'
    part 2:  offset 3a800:    len 1dbbb053:    tag: 11    name: 'Basic data partition'
    part 4:  offset 1dcf2800:    len 1c693000:    tag: 11    name: 'Basic data partition'
path '\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
 and '\\?\PhysicalDrive2'
read partitions ok 2
    gpt 0: type 2ad0d2d460 off 0x100000 len 0x5a00000
    gpt 1: type 2ad0d2d460 off 0x5b00000 len 0x800000
asking libefi to read label
EFI read OK, max partitions 9
    part 0:  offset 800:    len 2d000:    tag: 4    name: 'zfs-0000663a0000742a'
    part 8:  offset 2d800:    len 4000:    tag: b    name: ''
working on dev '#1048576#94371840#\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
setting path here '/dev/physicaldrive2'
setting physpath here '#1048576#94371840#\\?\scsi#disk&ven_passmark&prod_osfdisk#1&2afd7d61&0&000000#{53f56307-b6bf-11d0-94f2-00a0c91efb8b}'
+ [[ 139 != 0 ]]
+ break

real    1m23.270s
user    0m0.908s
sys     0m3.812s

A Segmentation fault message was printed after the setting physpath here ... line, roughly once per four runs.

If I knew Windows Batch syntax well enough, I would run the loop there too :)
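For anyone who knows it better, an equivalent CMD loop might look roughly like this (a sketch, untested):

```bat
@echo off
setlocal EnableDelayedExpansion
set N=0
:loop
set /a N+=1
echo.
echo Iteration !N!:
zpool import sample-pool
if errorlevel 1 goto done
zfs mount
if errorlevel 1 goto done
zpool export sample-pool
if errorlevel 1 goto done
goto loop
:done
echo Stopped at iteration !N!
```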

sskras commented 4 months ago

Oh well, maybe I am just too low on free RAM.

lundman commented 4 months ago

maybe add some sleeps, so it doesn't run too fast :)

sskras commented 4 months ago

Looks like it was due to memory shortage.

I've run zpool create sample-pool PhysicalDrive2 && zfs mount && zpool destroy sample-pool in a loop in Bash on the elevated MSYS2. After 113 iterations everything is still fine.

So my guess is that under low-RAM conditions something breaks silently, either in kernel mode (openzfs.sys) or in userland (zpool.exe or whatever).
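For completeness, the create/destroy loop had the same shape as the import/export one above:

```shell
$ time (unset N; for((;;)); do ((N=N+1)); echo -e "\nIteration $N:\n"; \
    zpool create sample-pool PhysicalDrive2 && zfs mount && zpool destroy sample-pool; \
    [[ $? != 0 ]] && break; done)
```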

lundman commented 4 months ago

Yeah, so - import might be ok, but zpool create sometimes doesn't mount correctly?

sskras commented 4 months ago

Yeah, so - import might be ok, but zpool create sometimes doesn't mount correctly?

I ran over 100 cycles of import+export and then as many cycles of create+destroy. Everything went like a breeze now that I have killed Chrome with its 5276 open tabs:

image

... which was pushing RAM consumption to nearly 97%, leaving 241,124K free out of 7,773,880K total:

image

I would like to stress test both zpool create ... and zpool import ... failures under such conditions when I have some spare energy.

lundman commented 4 months ago

Ah ok wicked. Sounds a little less urgent at least. - excellent work.

sskras commented 4 months ago

Building the project on my own would help, I guess. At least setting up the debugging tools.

Reopening for the future.

sskras commented 4 months ago

I guess this tool could be used instead of tediously running Chrome with a gazillion tabs:

Testlimit v5.24 by Mark Russinovich, Sysinternals (published November 17, 2016).
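A sketch of how it might simulate the low-RAM condition (run elevated; the flags are my guess from the docs, check testlimit -? before trusting them):

```shell
# leak and touch memory in 1 MB chunks, ~6000 times, to push RAM usage
# toward the ~97% that reproduced the failures
$ ./Testlimit64.exe -d 1 -c 6000
```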