timberland-sig / edk2

EDK II fork for Timberland-SIG POC
https://github.com/tianocore/tianocore.github.io/wiki/EDK-II

IPv6 connection issues #38

Open tbzatek opened 4 months ago

tbzatek commented 4 months ago

I'm having trouble getting discovery from (static) IPv6 working during the pre-OS phase. Tested with the current timberland_upstream-dev-full_nbft-population-fixes branch (#35). For my setup and the resulting NBFT table, please see https://github.com/linux-nvme/libnvme/pull/821.

Taking the second Discovery Descriptor URI nvme+tcp://[4321::BBBB:1]:4420/, I have no problem reaching it from Linux with networking corresponding to the HFI Descriptor records.
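
For reference, verifying reachability from Linux amounts to a discovery request with nvme-cli; a sketch, using the address and port from the URI above:

    # query the discovery controller over NVMe/TCP and IPv6
    nvme discover -t tcp -a 4321::bbbb:1 -s 4420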

This looks like a timeout: I noticed the EFI boot process getting stuck for a minute or two, with qemu eating 100% CPU, before eventually booting from the first (IPv4) boot attempt. It might be related to the lost Host address prefix reported in #37.

tbzatek commented 4 months ago

This is not limited to discovery over IPv6; I tested with a specific subsysnqn and hit the same problem. It looks like a general EFI networking stack issue. Thankfully the failed boot attempt is still recorded as an SSNS record, marked unavailable, and nvme-cli still connects fine:

Apr 26 14:30:28 localhost.localdomain kernel: nvme nvme0: new ctrl: NQN "nqn.2014-08.org.nvmexpress.boot.poc:test-target", addr 192.168.122.1:4420
Apr 26 14:30:28 localhost.localdomain kernel: nvme nvme1: new ctrl: NQN "nqn.2014-08.org.nvmexpress.boot.poc:test-target", addr [4321:0000:0000:0000:0000:0000:bbbb:0001]:4420

      {
        "index":5,
        "num_hfis":1,
        "hfis":[
          2
        ],
        "transport":"tcp",
        "traddr":"4321::bbbb:1",
        "trsvcid":"4420",
        "subsys_port_id":0,
        "nsid":0,
        "nid":"",
        "subsys_nqn":"nqn.2014-08.org.nvmexpress.boot.poc:test-target",
        "controller_id":0,
        "asqsz":0,
        "pdu_header_digest_required":0,
        "data_digest_required":0,
        "discovered":0,
        "unavailable":1
      }
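
For reference, the SSNS records above can be inspected and acted on from Linux with nvme-cli; a sketch, assuming an nvme-cli 2.x build with NBFT support (the exact flag spellings are an assumption on my part and may vary by version):

    # dump the parsed NBFT contents as JSON
    nvme show-nbft -o json
    # connect to every subsystem recorded in the NBFT
    nvme connect-all --nbft
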
trevor-cockrell commented 3 months ago

I believe this is due to a few things --

At this time, you should be able to connect via IPv6 if you configure the NIC's IPv6 settings prior to attempting a target connection: set the desired interface to manual and configure it with a valid IPv6 address and gateway, OR configure it as auto with a valid DHCPv6 server set up.
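
If you take the auto route, the DHCPv6 server on the host side can be as small as a dnsmasq snippet; a sketch, where the bridge name virbr1 and the 4321::/64 prefix are assumptions taken from this issue's setup:

    # hypothetical /etc/dnsmasq.d/virbr1-v6.conf
    interface=virbr1
    # send router advertisements so clients learn the prefix
    enable-ra
    # hand out stateful DHCPv6 leases from this range (prefix length 64)
    dhcp-range=4321::100,4321::1ff,64,12h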

I'm looking into a resolution that will allow IPv6 configuration from the nvmeof HII without needing to configure the interface outside of the nvmeof menu.

tbzatek commented 3 months ago

Thanks Trevor. I've made another attempt and configured IPv6 addresses for the second network interface first (Device Manager -> Network Device List -> IPv6 Network Configuration -> Host addresses and Route Table); however, it seems to have no effect. I still couldn't get the initiator working. I also tried with clean efivars.

trevor-cockrell commented 3 months ago

How are you setting up/providing NICs for your qemu invocation? I had a lot of trouble with qemu's IPv6 networking until I set up a TAP for qemu. I think I also had to enable IPv6 forwarding via sysctl.
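
For reference, a manual TAP plus forwarding setup on the host might look like this; a sketch, with the bridge name br0 and tap name tap0 chosen purely for illustration:

    # create a tap device and attach it to an existing bridge
    ip tuntap add dev tap0 mode tap
    ip link set tap0 master br0 up
    # let the host forward IPv6 between interfaces
    sysctl -w net.ipv6.conf.all.forwarding=1

The tap can then be handed to qemu with something like -netdev tap,id=net0,ifname=tap0,script=no,downscript=no.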

tbzatek commented 2 months ago

How are you setting up/providing NICs for your qemu invocation?

My qemu network is very simple:

--netdev bridge,id=net0,br=virbr0 --device virtio-net-pci,netdev=net0,mac=52:54:00:72:c5:ae
--netdev bridge,id=net1,br=virbr1 --device virtio-net-pci,netdev=net1,mac=52:54:00:72:c5:af

This way qemu defaults to creating tap interfaces and adding them to the target bridges. Each bridge on the host has its own address in an isolated subnet, and the kernel nvme target is bound to that address. Sysctl net.ipv6.conf.all.forwarding is set to 1, and no firewall is enabled.
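
For context, "bound to that address" means an nvmet TCP port configured over configfs; a sketch, where the port number 2 is arbitrary and the address and NQN are taken from earlier in this issue:

    # nvmet-tcp pulls in the core nvmet module as a dependency
    modprobe nvmet-tcp
    cd /sys/kernel/config/nvmet
    mkdir ports/2
    echo tcp          > ports/2/addr_trtype
    echo ipv6         > ports/2/addr_adrfam
    echo 4321::bbbb:1 > ports/2/addr_traddr
    echo 4420         > ports/2/addr_trsvcid
    # expose the (already created) subsystem on this port
    ln -s /sys/kernel/config/nvmet/subsystems/nqn.2014-08.org.nvmexpress.boot.poc:test-target ports/2/subsystems/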

The thing is that IPv6 works fine in Linux, and even discovery from the NBFT discovery record works just fine there. Obviously the network stacks are different; it would be good to get some logs or diagnostic information from the UEFI side.
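
One way to get diagnostics from the UEFI side is a DEBUG build of OVMF with its debug console written to a file; a sketch of the extra qemu arguments, assuming the standard OVMF debugcon I/O port 0x402:

    # append to the existing qemu invocation
    -debugcon file:ovmf_debug.log -global isa-debugcon.iobase=0x402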

The /0 prefix size issue for the second HFI record still persists:

    "hfi":[
...
      {
        "index":2,
        "transport":"tcp",
        "pcidev":"0:0:4.0",
        "mac_addr":"52:54:00:72:c5:af",
        "vlan":0,
        "ip_origin":1,
        "ipaddr":"4321::bbbb:2",
        "subnet_mask_prefix":0,
        "gateway_ipaddr":"::",
        "route_metric":0,
        "primary_dns_ipaddr":"::",
        "secondary_dns_ipaddr":"::",
        "dhcp_server_ipaddr":"",
        "this_hfi_is_default_route":1,
        "dhcp_override":0
      }
    ],
tbzatek commented 2 months ago

Hmmm, after lots of (other) testing, this looks like an issue on the Linux kernel target side (kernel 6.8.1). After resetting (clearing and setting up) the Linux target, UEFI connections are immediate and successful. It is after a guest VM reboot that the timeouts are observed; the same issue occurs when powering off the VM and starting it again. The nvmet keepalive timeout is set to 5 seconds and I can see the old connections expiring, but still no luck even after waiting a while.
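
For reference, "clearing and setting up" here can be done with nvmetcli; a sketch, assuming the target is managed by nvmetcli and a saved configuration exists at /etc/nvmet/config.json (path assumed):

    # tear down all nvmet ports, subsystems and namespaces
    nvmetcli clear
    # re-create the target from the saved configuration
    nvmetcli restore /etc/nvmet/config.json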

Needs to be retested against some other NVMe/TCP target.

Also tested kernel 6.9.6; no difference.