openzfsonosx / openzfs

OpenZFS on Linux, FreeBSD, and macOS. This is where development for macOS happens.
https://github.com/openzfs/zfs/wiki

apfs on zvol is considered "mounted from a disk image" from the point of view of macOS #85

Open JMoVS opened 3 years ago

JMoVS commented 3 years ago

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | macOS Big Sur |
| Distribution Version | 11.4 |
| Linux Kernel | |
| Architecture | M1 ARM |
| ZFS Version | 2.1.0 rc1 |
| SPL Version | |

Describe the problem you're observing

zvol-based volumes for Time Machine aren't showing up as a target because, according to `log stream --source --predicate 'subsystem == "com.apple.TimeMachine"' --style compact --info --debug`, they are mounted from a disk image.

Describe how to reproduce the problem

1. Create a zvol, e.g. `zfs create -s -V 1G pool/foo`
2. Use Disk Utility to format the zvol as APFS (either with or without a partition table)

Include any warning/errors/backtraces from the system logs

JMoVS commented 3 years ago

I think the relevant line in the Disk Utility GUI is "Connection Type", which is "USB" for other disks and "Image" in the zvol case.

JMoVS commented 3 years ago

On the CLI, in `diskutil info`, it might be related to the protocol type: USB vs. Disk Image.

JMoVS commented 3 years ago

it also shows up as "solid state disk", which in my case it clearly isn't - it's on a 100% HDD pool. So that could also be adjusted.
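
For reference, the "solid state" label in Disk Utility appears to come from the Medium Type entry of the device characteristics dictionary defined in IOStorageDeviceCharacteristics.h. A hypothetical sketch of reporting "Rotational" instead, assuming a per-zvol flag existed (`is_rotational` and `zvol_device_characteristics` are invented names, not anything in zvolIO.cpp today):

```cpp
// Hypothetical sketch, not the current zvolIO.cpp code: report the zvol's
// Medium Type from a flag we would have to plumb through ourselves.
// Constants come from IOKit/storage/IOStorageDeviceCharacteristics.h.
#include <IOKit/storage/IOStorageDeviceCharacteristics.h>
#include <libkern/c++/OSDictionary.h>
#include <libkern/c++/OSString.h>

static OSDictionary *
zvol_device_characteristics(bool is_rotational)	/* invented helper and flag */
{
	OSDictionary *dict = OSDictionary::withCapacity(1);
	if (dict == NULL)
		return (NULL);

	OSString *medium = OSString::withCString(is_rotational ?
	    kIOPropertyMediumTypeRotationalKey :	/* "Rotational" */
	    kIOPropertyMediumTypeSolidStateKey);	/* "Solid State" */
	if (medium != NULL) {
		dict->setObject(kIOPropertyMediumTypeKey, medium);
		medium->release();
	}
	/* Caller would attach this under kIOPropertyDeviceCharacteristicsKey. */
	return (dict);
}
```
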

JMoVS commented 3 years ago

I think we have an answer to the code comment here: https://github.com/openzfsonosx/openzfs/blob/4696d5527d0a9a18dad4a9c3cd23300903fdfdaf/module/os/macos/zfs/zvolIO.cpp#L128

We should switch this from "file" to whatever the non-file option is called. Virtual is fine (virtual, synthesized, etc.) - but I think "file" corresponds to "Image" in macOS parlance, so is this the only relevant place?

JMoVS commented 3 years ago

Looking into https://opensource.apple.com/source/IOStorageFamily/IOStorageFamily-260.100.1/IOStorageProtocolCharacteristics.h.auto.html

I see the following options for us to change to:

~According to Apple:~ I think their documentation/code comments are messed up...

`kIOPropertyInterconnectFileKey`

> This key defines the value of RAM for the key
> kIOPropertyPhysicalInterconnectLocationKey. If the device is system memory
> that is being represented as a storage device, this key should be set.

As a zvol doesn't usually live on a file, it should be something else.

If one desires fancier options for the cases where the zpool is in a file, that can be changed; but in the general case, I think we should change to kIOPropertyInternalExternalKey (a rough sketch of what that could look like follows the quote below).

quoting:

> This key defines the value of Internal/External for the key
> kIOPropertyPhysicalInterconnectLocationKey. If the device is connected
> to a bus and it is indeterminate whether it is internal or external,
> this key should be set.
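
For illustration, here is a hedged sketch (not a copy of the existing zvolIO.cpp code; `zvol_protocol_characteristics` is an invented helper name) of a protocol characteristics dictionary that reports a virtual interconnect located "Internal/External" rather than "File". The constants are those defined in IOStorageProtocolCharacteristics.h, and the resulting dictionary would be attached under kIOPropertyProtocolCharacteristicsKey:

```cpp
// Hedged sketch only, assuming an IOKit kernel context; not the existing
// zvolIO.cpp code.  Constants come from IOStorageProtocolCharacteristics.h.
#include <IOKit/storage/IOStorageProtocolCharacteristics.h>
#include <libkern/c++/OSDictionary.h>
#include <libkern/c++/OSString.h>

static OSDictionary *
zvol_protocol_characteristics(void)	/* invented helper name */
{
	OSDictionary *dict = OSDictionary::withCapacity(2);
	if (dict == NULL)
		return (NULL);

	/* "Virtual Interface" rather than pretending to be USB etc. */
	OSString *type =
	    OSString::withCString(kIOPropertyPhysicalInterconnectTypeVirtual);
	/*
	 * "Internal/External" instead of "File"
	 * (kIOPropertyInterconnectFileKey), which appears to be what makes
	 * macOS treat the zvol like a disk image.
	 */
	OSString *location =
	    OSString::withCString(kIOPropertyInternalExternalKey);

	if (type != NULL) {
		dict->setObject(kIOPropertyPhysicalInterconnectTypeKey, type);
		type->release();
	}
	if (location != NULL) {
		dict->setObject(kIOPropertyPhysicalInterconnectLocationKey,
		    location);
		location->release();
	}
	return (dict);
}
```
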
rottegift commented 3 years ago

> it also shows up as "solid state disk", which in my case it clearly isn't - it's on a 100% HDD pool. So that could also be adjusted.

I don't object to making a mac-only property to control that, but I have doubts that it would make much difference.

Few filesystems that can be mounted by a Mac take real notice of whether the underlying media is rotational or solid state. APFS does, but not in ways practically anyone would want (apfs.util lets you do some extra things like defragment and you'll observe different cache numbers in apfs_stats -e). On solid state media HFS enables unmap handling and sets HFS_SSD which nerfs hfs_relocate(). The latter will in general only be relevant if you have booted from the HFS volume, or you are doing some low-level manipulation of the HFS journal. With the underlying copy-on-write, hfs_relocate() in a zvol is almost certainly counterproductive.

Probably the most relevant difference will be in whether zfs's zvol code sees unmaps/discards from xnu. We take those and dmu_free_long_range(), returning the trimmed/unmapped LBAs to the pool (ignoring snapshots), and that happens whatever the geometry or collection of devices are in the pool. (Of course, dmu_free_long_range() is a call which can also trigger autotrim, if enabled and supported on the leaf vdevs).
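
A hedged sketch of that path (the extent struct and helper name are invented for illustration; `dmu_free_long_range()` and `ZVOL_OBJ` are real ZFS symbols, but this is not the actual macOS zvol code):

```cpp
// Hedged sketch only: the shape of turning one unmap/discard extent from the
// block layer into dmu_free_long_range(), which returns those LBAs to the
// pool.  The extent struct and helper name are invented for illustration.
#include <sys/dmu.h>
#include <sys/zvol_impl.h>	/* ZVOL_OBJ: the DMU object backing the volume */

typedef struct zvol_unmap_extent {
	uint64_t offset;	/* byte offset into the volume */
	uint64_t length;	/* byte length to discard */
} zvol_unmap_extent_t;

static int
zvol_discard_extent(objset_t *os, const zvol_unmap_extent_t *ext)
{
	/* Free the range in the DMU; the freed blocks go back to the pool. */
	return (dmu_free_long_range(os, ZVOL_OBJ, ext->offset, ext->length));
}
```
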

If the filesystem(s) inside the zvol do not send unmaps, then data they no longer use will not shrink the zvol's USED property, and there will be additional read-modify-write costs if those LBAs are subsequently overwritten by the filesystem inside. Secondarily, a read (via dd or ddrescue, and in some unusual uses of asr) will return whatever non-zeros were in the no-longer-used-by-the-filesystem blocks, because they were not unmapped/discarded.

If, hypothetically, APFS had strong differences in allocation heuristics based on whether the underlying media was solid state or not, on a ZFS machine with anything other than a tiny ARC you'd probably still prefer that APFS uses its solid state heuristics to take advantage of the zio pipeline and the various write aggregation mechanisms in zfs. If, even more hypothetically, read scheduling was strongly influenced by solid-vs-rotating in APFS, you'd probably still want to pretend seeks are cheap because asymptotically they will be for any but the smallest ARCs or the least cache-friendly workloads (in which case you probably want to actually use SSDs...). Even purely sequential I/O gains some benefit from zio (aggregation, nopwriting, etc), the vdev scheduler, and ARC (zfetching): nothing that macOS can mount will think "aha, this is solid state media so I am deliberately going to break up this large write into small pieces so as to fill in the tiniest gaps scattered across some wide range of LBAs, even though I have lots of free space".

Of course, maybe you have found a workload where for APFS or JHFS+ it's great on a spinny disk, lousy on a solid state disk, and not-great on zfs. I'd sure be curious about a real example, and whether it could be replicated with e.g. https://fio.readthedocs.io/en/latest/fio_doc.html