xcp-ng / xcp

Entry point for issues and wiki. Also contains some scripts and sources.
https://xcp-ng.org

TrueNAS guest iSCSI connectivity issue #538

Open hammer-83 opened 2 years ago

hammer-83 commented 2 years ago

First of all, I've spent several days debugging this and trying many different things without finding the cause, so if the problem turns out to be in TrueNAS rather than XCP-NG, I apologize in advance and will file the ticket in their issue tracker.

The issue I'm having is that any time an initiator attempts to mount an iSCSI target, the TrueNAS VM loses all network connectivity. There are two ways to restore it: unplugging and replugging the VIF (xe vif-unplug followed by xe vif-plug), or rebooting the guest. There are no anomalies prior to mounting the iSCSI target: the network works, speeds are good, and iSCSI discovery works fine. But as soon as the target is mounted, the TrueNAS VM's network interface seems to have all of its routes cleared. Here are the things I tried:

I also tried to replicate the exact same setup of TrueNAS and an Ubuntu VM on an internal network in VirtualBox, where everything works fine. This is what led me to report the issue to the hypervisor project rather than to TrueNAS.

Finally, I found the following two threads online from users having exactly the same issue, with no resolution:

https://forums.lawrencesystems.com/t/dropping-connection-xcp-ng-with-free-truenas-as-guest/6720/6
https://www.truenas.com/community/threads/network-loses-connectivity-when-iscsi-target-connected.87251/

There seems to be one workaround, which is to bridge the virtual network to a physical adapter, but ideally I would like to keep this traffic internal so that: 1. it stays private, 2. it is not limited by the physical port speed.

My XCP-NG is version 8.2 with all updates applied as of this writing. I can provide additional details as needed; I'm just not sure what else might help here.

stormi commented 2 years ago

Hi! You might reach more people who could potentially help on the forum: https://xcp-ng.org/forum/. There may already be older threads about it; I haven't checked.

hammer-83 commented 2 years ago

Hi, thank you for the suggestion. I can (and will) ask, but I'm also fairly certain it is a bug. I've scoured all the potential resources on the net for the past week, including the XCP-NG forum. The only two threads on the subject I could find are the ones I listed in the original post, and their problem description is exactly the same. So I would really like a dev to take a look at the problem if possible, because at least for me it is 100% reproducible. What I do not know is whether it's my hardware, Xen, XCP-NG or TrueNAS.

High-level steps to reproduce:

  1. Start off with a fresh XCP-NG 8.2, apply latest updates (this is for simplicity, but an existing XCP-NG will also do just fine)
  2. Create a new private network
  3. Create a new VM from the TrueNAS 12.0-U7 ISO. Connect it only to the private network created in step 2. Make sure the VM has at least two virtual disks, one for the OS and another for storage.
  4. Once installed, from the VM console, configure a static IP on the TrueNAS VM
  5. Create a new VM from the latest Ubuntu Desktop ISO. Connect it to an external network for internet access and to the same private network from step 2.
  6. Once installed, set a static IP in the Ubuntu VM for the private network. Also install the open-iscsi package.
  7. From the Ubuntu VM, use the web browser to connect to TrueNAS. Configure a ZFS volume on the secondary virtual disk.
  8. Still in TrueNAS UI, activate the iSCSI service and configure the iSCSI target exposing the created ZFS volume as a block device.
  9. Open a terminal in Ubuntu VM and try to discover the iSCSI target:
    iscsiadm --mode discoverydb --type sendtargets --portal [TrueNAS IP]:3260 --discover

    At this step it should list the targets from TrueNAS; so far so good.

  10. Now try to connect to the target:
    iscsiadm --mode node --targetname <target IQN> --portal [TrueNAS IP]:3260 --login

    In dmesg, you can see that the connection succeeds and a new block device is created in /dev. But 5 seconds later, the TrueNAS console reports "no ping reply (NOP-Out) after 5 seconds: dropping connections" and the network on TrueNAS becomes inaccessible. I have to run xe vif-unplug and xe vif-plug on the TrueNAS VIF to get connectivity back. The exact same set of steps works fine in VirtualBox.
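The commands from steps 9 and 10, together with the VIF replug used for recovery, can be scripted so the reproduction is easy to repeat. This is a minimal Python sketch, assuming a Python 3 interpreter wherever each helper is run: the iscsiadm helpers belong on the Ubuntu VM, the xe helper on the XCP-NG host (dom0). The helper names, the ping-based reachability check, and any IP/IQN/UUID values you pass in are illustrative additions, not part of the original report; the command-building functions only return argument lists, so they can also just be printed and pasted.

```python
import subprocess
from typing import List

ISCSI_PORT = 3260  # iSCSI port used in steps 9 and 10


def discovery_cmd(portal_ip: str) -> List[str]:
    """Step 9: SendTargets discovery against the TrueNAS portal (Ubuntu VM)."""
    return ["iscsiadm", "--mode", "discoverydb", "--type", "sendtargets",
            "--portal", f"{portal_ip}:{ISCSI_PORT}", "--discover"]


def login_cmd(target_iqn: str, portal_ip: str) -> List[str]:
    """Step 10: log in to the target (Ubuntu VM); connectivity drops after this."""
    return ["iscsiadm", "--mode", "node", "--targetname", target_iqn,
            "--portal", f"{portal_ip}:{ISCSI_PORT}", "--login"]


def vif_replug_cmds(vif_uuid: str) -> List[List[str]]:
    """Recovery (XCP-NG host): unplug, then replug the TrueNAS VIF."""
    return [["xe", "vif-unplug", f"uuid={vif_uuid}"],
            ["xe", "vif-plug", f"uuid={vif_uuid}"]]


def is_reachable(ip: str) -> bool:
    """Hypothetical check: one ping with a 2 s timeout (Linux iputils flags)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "2", ip],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    return result.returncode == 0


def run(cmd: List[str]) -> None:
    """Run a command, raising CalledProcessError if it exits non-zero."""
    subprocess.run(cmd, check=True)
```

A typical sequence on the Ubuntu VM is run(discovery_cmd(ip)), then run(login_cmd(iqn, ip)), then waiting a few seconds before calling is_reachable(ip), since the drop is reported about 5 seconds after login. On the host, running both commands from vif_replug_cmds(uuid) in order should restore connectivity as described above.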

I'm not a hard-core expert in networking or low-level OS internals, but I do work in the industry and do my fair share of software/hardware troubleshooting. So I can perform technical steps to help debug this if necessary; I just don't know how to approach it myself. But if somebody more knowledgeable points me to where to start digging, I'm ready to do it to get to the bottom of this.

stormi commented 2 years ago

The community on the forum can help debug the issue, which would raise the likelihood of a fix if there's a bug.

niklasha commented 9 months ago

I came here because I have the exact same problem while testing TrueNAS for future use. (I won't be running it as a VM then, but during testing a VM is of course an easy route.) I can add that the problem seems to be triggered by the page-boundary check in xenvif_count_requests, in xen-netback/netback.c:

    [318644.585621] vif vif-14-2 vif14.2: Cross page boundary, txp->offset: 0, size: 8900
    [318644.585635] vif vif-14-2 vif14.2: fatal error; disabling device
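The test behind the "Cross page boundary" message can be modelled in a few lines. This is a simplified Python sketch, assuming 4 KiB Xen pages: it mirrors only the per-slot condition that xenvif_count_requests applies to the extra TX request slots it walks (a slot's offset plus size must not run past the end of one page) and ignores the function's other validation. The helper names and the log-parsing regex are illustrative, not part of netback.

```python
import re
from typing import Optional, Tuple

XEN_PAGE_SIZE = 4096  # assumption: 4 KiB Xen pages

# Matches the first netback log line quoted above.
_CROSS_PAGE_RE = re.compile(
    r"Cross page boundary, txp->offset: (\d+), size: (\d+)")


def parse_cross_page(line: str) -> Optional[Tuple[int, int]]:
    """Return (offset, size) from a netback 'Cross page boundary' line."""
    m = _CROSS_PAGE_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def slot_crosses_page(offset: int, size: int) -> bool:
    """Simplified per-slot test: the slot's data must end within one page."""
    return offset + size > XEN_PAGE_SIZE
```

For the logged slot, parse_cross_page returns (0, 8900), and 0 + 8900 is well past 4096: a single slot is carrying more than two pages' worth of data. Netback treats that as a fatal error and disables the VIF, which is consistent with the guest losing all connectivity until the VIF is replugged.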

Edit: the problem is in FreeBSD 13's netfront driver. It is fixed in FreeBSD 14, but TrueNAS CORE does not seem to be moving to 14. https://github.com/freebsd/freebsd-src/commit/dabb3db7a817f003af3f89c965ba369c67fc4910
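Conceptually, a frontend stays clear of this check by never handing netback a slot whose data spills past the end of a page: a buffer that straddles page boundaries has to be split into one slot per page-sized piece. The Python sketch below illustrates only that invariant. It is not the code from the linked FreeBSD commit, it assumes 4 KiB pages, and it treats the buffer as contiguous from a hypothetical address addr, whereas a real netfront grants each page to the backend separately.

```python
from typing import List, Tuple

XEN_PAGE_SIZE = 4096  # assumption: 4 KiB Xen pages


def split_into_slots(addr: int, length: int) -> List[Tuple[int, int]]:
    """Split `length` bytes starting at hypothetical address `addr` into
    (page_offset, size) slots so that no slot crosses a page boundary."""
    slots: List[Tuple[int, int]] = []
    while length > 0:
        offset = addr % XEN_PAGE_SIZE               # position within the current page
        chunk = min(length, XEN_PAGE_SIZE - offset)  # stop at the end of that page
        slots.append((offset, chunk))
        addr += chunk
        length -= chunk
    return slots
```

For the 8900-byte payload in the log, split_into_slots(0, 8900) yields [(0, 4096), (0, 4096), (0, 708)], three slots that each fit within a single page. A buffer starting mid-page, such as split_into_slots(4000, 200), becomes [(4000, 96), (0, 104)].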