machyve / xhyve

xhyve, a lightweight OS X virtualization solution

Cannot read disk when using ZFS volume #111

Open jeduardo opened 8 years ago

jeduardo commented 8 years ago

I'm using OpenZFS with a pool created inside a USB drive.

When trying to install Ubuntu 16.04 onto a ZFS volume, it seems that xhyve cannot read the block device. The Ubuntu installer just tells me that no disk was detected. Using an actual (non-sparse) file hosted on the HFS+ partition of the same USB drive (so no ZFS involved), everything works fine.

When creating a volume, dd'ing a running image onto it, and then pointing xhyve at the volume, the boot hangs at the initramfs prompt and the disk is never detected. When pointing xhyve at the image file on the hard drive, the disk is detected as vda and everything works perfectly.

I enabled pci_vtblk_debug in src/pci_virtio_block.c and I do get some interesting messages in both situations:

1) Block device: the first attempt to read the disk prints the following message, and eventually Linux stops waiting for the disk:

virtio-block: read/ident op, 20 bytes, 1 segs

2) Disk image: the first attempt to read the disk prints the following message, and it works fine for the whole session:

virtio-block: read/ident op, 4096 bytes, 1 segs

In situation 2 the message repeats several times, each time with 4096 bytes (sometimes more) being read from the disk.

Any ideas?
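
One possible reading of the two debug messages, offered as a sketch rather than a diagnosis: the virtio block specification defines a 20-byte device ID string (VIRTIO_BLK_ID_BYTES), so a 20-byte "read/ident" request may be the guest asking for the device ID rather than reading data. The helper below sorts log lines on that assumption. The function name `classify`, the "write" variant of the message, and the 20-byte heuristic itself are illustrative assumptions, not something confirmed from the xhyve source in this thread.

```python
import re

# Length of the virtio-blk device ID string in the virtio specification.
VIRTIO_BLK_ID_BYTES = 20

# Matches the debug lines quoted above. The "write" alternative is an
# assumption; only "read/ident" lines appear in this thread.
LINE_RE = re.compile(r"virtio-block: (read/ident|write) op, (\d+) bytes, (\d+) segs")


def classify(line):
    """Return (kind, nbytes) for one debug line, or None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op, nbytes = m.group(1), int(m.group(2))
    if op == "write":
        kind = "write"
    elif nbytes == VIRTIO_BLK_ID_BYTES:
        # Heuristic only: a 20-byte request has the size of an ID request.
        kind = "probably ident"
    else:
        kind = "read"
    return kind, nbytes


for sample in (
    "virtio-block: read/ident op, 20 bytes, 1 segs",
    "virtio-block: read/ident op, 4096 bytes, 1 segs",
):
    print(sample, "->", classify(sample))
```

If that reading holds, the block-device case never gets past the ID request to an actual data read, which would fit the guest giving up on the disk.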

yangm97 commented 7 years ago

I think that is an OpenZFS issue, not an xhyve one. See https://github.com/openzfsonosx/zfs/issues/116

jeduardo commented 7 years ago

Hmm... don't think so. From what I could understand, the issue you linked is related to the devices not being created correctly for the ZVOLs. In my case the devices are created correctly; in fact, if I format a ZVOL with a filesystem that macOS can read, I can mount it just fine on the system.

However I could verify that I'm facing the same problem if I try to read/write to a USB device as well, so the problem might be with block devices in general, and not only the block device referencing the ZVOL. Does that make sense?

aphor commented 7 years ago

There is work to be done, apparently, before ZFS volumes will work as a backing store. From src/block_if.c, line 497: perror("xhyve: raw device support unimplemented");

evansus commented 7 years ago

This is a missing feature in xhyve, not an openzfs issue. It's happening because xhyve does not support block (or char) devices. I'm working on a fix, will open a pull request and link here.
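
To check which case a given backing path falls into before handing it to xhyve, something like the following can help. It is a minimal sketch using only the standard `os.stat` call; the function name `backing_kind` is hypothetical, and nothing here reflects how xhyve itself inspects the path.

```python
import os
import stat


def backing_kind(path):
    """Return (kind, st_size) for a candidate disk backing path."""
    st = os.stat(path)
    mode = st.st_mode
    if stat.S_ISREG(mode):
        kind = "regular file"
    elif stat.S_ISBLK(mode):
        kind = "block device"
    elif stat.S_ISCHR(mode):
        kind = "character device"
    else:
        kind = "other"
    # For device nodes st_size is typically 0, so a real implementation
    # would need a device-specific size query instead of trusting it.
    return kind, st.st_size


# os.devnull is used here only because it exists on any Unix-like system.
print(backing_kind(os.devnull))
```

A disk image on HFS+ should report "regular file" with its real size, while a ZVOL or USB device node should report a block or character device, which is the case discussed here as unsupported.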

evansus commented 7 years ago

Actually, it looks like this has already been addressed, but the PR is still open: https://github.com/mist64/xhyve/pull/121

joshgoebel commented 5 years ago

https://dan.langille.org/2018/10/02/running-freebsd-on-osx-using-xhyve-a-port-of-bhyve/

I've installed FreeBSD following these instructions but can't ever get it to boot. Should I assume I'm hitting this ZFS issue?

The error I hit is always:

/boot/kernel/kernel text=0x1678a68 data=0x1cd288+0x768b40 ZFS: i/o error - all block copies unavailable

elf64_loadimage: read failed
can't load file '/boot/kernel/kernel': input/output error
Error while including /boot/menu.rc, in the line:
menu-display

Update: Seems to work with UFS, so I think the issue is indeed with ZFS or xhyve's support of ZFS.

aphor commented 5 years ago

This is probably a FreeBSD/BIOS issue.

https://groups.google.com/forum/#!topic/ml-freebsd-questions/3Smj5-m7o24

joshgoebel commented 5 years ago

Well, don't think xhyve really has much of a BIOS. I installed the same release (12) on Linode the same afternoon with full ZFS and zero issues booting. Just as another point of data.

joshgoebel commented 5 years ago

I didn't glean a lot from that until I followed the link at the very bottom and found on the next page someone who fixed it with:

# mkdir /tmp/mnt
# zpool import -R /tmp/mnt -f zroot
# cd /tmp/mnt
# mv boot boot.orig
# mkdir boot
# cd boot.orig
# cp -Rp * /tmp/mnt/boot
# zpool export zroot
# reboot

Which really makes no sense unless it's something strange to do with where the file is placed on the media... so making a copy moves it to a "more reliable" boot location.

aphor commented 5 years ago

The bootblock or EFI boot code will try to enumerate disks, using UEFI or BIOS routines. Then it will iterate through the disk, reading raw blocks from these, trying to find a bootable zfs pool and filesystem.

The installer works from the disks enumerated by kernel probes, so it's possible to install boot files on to a filesystem that the bootblock or UEFI boot code can't see. This might also skip a pool which hasn't been exported because it appears to be in use by another system.

The error in this issue suggests the bootblock or UEFI code finished scanning all the disks it could see, but didn't find a bootable pool/filesystem.

joshgoebel commented 5 years ago

The "boot code" in this case being "userboot", paired with the fbsd "firmware". It "half finds" the zfs pool. I can list files with ls and cat files with more from the mini-console... but the kernel won't boot. Yet if I boot to a working FreeBSD (the installer) I can import the ZFS pool with no issues at all. I haven't had a chance to try just copying the kernel to a new location - that would require me installing all over again from scratch.

drozdowsky commented 4 years ago

any progress? Same problem on my side

ebarriosjr commented 4 years ago

I got the same issue. Any updates on this?

aphor commented 4 years ago

There are three levels of zfs in FreeBSD as I understand it.

  1. low level BIOS bootblocks or UEFI bootstrap (read-only)
  2. the FreeBSD 2nd level bootloader zfs (read-only)
  3. the full FreeBSD kernel with ZFS drivers and utilities (pool import, rw)

Are you stuck at 1 (UEFI shell) or 2 (FreeBSD loader)?

dch commented 4 years ago

I can replicate this reliably; it's stuck at stage 2.

/boot/kernel/kernel text=0x168fdf1 data=0x1d0a68+0x768d80 ZFS: i/o error - all block copies unavailable

elf64_loadimage: read failed
can't load file '/boot/kernel/kernel': input/output error
Error while including /boot/menu.rc, in the line:
menu-display
\
Hit [Enter] to boot immediately, or any other key for command prompt.
Booting [/boot/kernel/kernel]...               
/boot/kernel/kernel text=0x168fdf1 data=0x1d0a68+0x768d80 ZFS: i/o error - all block copies unavailable

elf64_loadimage: read failed
can't load 'kernel'

OK lsdev
host devices:
    host0:   Host filesystem
disk devices:
    disk0:   Guest drive image
      disk0p1: EFI
      disk0p2: FreeBSD boot
      disk0p3: FreeBSD ZFS
zfs devices:
    zfs:zroot
OK lszfs zroot/ROOT
default
OK lszfs zroot/ROOT/default 
OK ls
/
 d  tmp
 d  usr
 d  var
 d  boot
 d  dev
    COPYRIGHT
 d  lib
 d  bin
 d  proc
 d  root
 d  net
 d  etc
 d  media
 d  rescue
    .profile
 d  mnt
    .cshrc
 l  sys
 d  sbin
 d  libexec
 l  home
    entropy
OK ls boot
boot
 d  zfs
    loader.4th
 d  defaults
    shortcuts.4th
    menu.4th
...

This used to work very reliably ~ 3+ years ago, using https://hackmd.io/2K1RyRiQQ46aG-rDH3ps2Q

The issue is repeatable with both GPT and MBR disks, and with either UEFI or BIOS startup, all with ZFS.

I will fiddle with a variety of loader versions to see if we can sneak by this problem.