xapi-project / xcp-networkd

The XCP networking daemon
Other
14 stars 42 forks source link

CA-351826 recognise link/infiniband MAC address #185

Closed lindig closed 3 years ago

lindig commented 3 years ago

xcp-networkd fails to recognise the MAC address of an infiniband NIC because it only looks for link/ether and misses link/infiniband. If the former fails, look for link/infiniband, too.

This code could be made more general by using regular expressions for search rather than strings. However, this would create more changes which could complicate backports.

This is in response to https://github.com/xapi-project/xcp-networkd/issues/184

Signed-off-by: Christian Lindig christian.lindig@citrix.com

lindig commented 3 years ago

I am not too happy with this PR:

robhoes commented 3 years ago

I'm not certain yet if we really want to have PIFs for Infiniband devices (I'm not familiar enough with them). @Oleszkiewicz, do you have an opinion on this?

One option is to treat them the same as ethernet devices, and have them as PIFs, which will happen with the current patch. But then we need to know that nothing goes wrong due to the different format of the the MAC address, as well as other operations that can be performed on PIFs. This would need testing.

The safer option is to raise an appropriate error in Interface.get_mac if there is no ethernet address. This error needs to be caught, and the device excluded, at least in this function, which is used as part of PIF.scan: https://github.com/xapi-project/xen-api/blob/master/ocaml/xapi/xapi_pif.ml#L366.

lindig commented 3 years ago

We have no official support for Infiniband but we also should not crash when we encounter them while scanning for NICs. I agree with @robhoes that the current patch is probably not good enough as it accepts the card like it were an ethernet card.

Oleszkiewicz commented 3 years ago

Actually from my point of view :

1) Being able to make the IB card available to guests via SR-IOV is a really usefull thing, I guess we need a pif for that?

2) We will not be able to assign IB card to a guest in any other way than SR-IOV as it will not work with ETH bridges, last Mellanox drivers also dropped support for a virtual ETH device over IPoIB.

I guess it is not a bad thing for the hypervisor to be able to see the state of the interface too. Is it possible to see it as a PIF however disallow adding the interface to a guest system in the way other than SR-IOV? Passing the device via SR-IOV should work well as it is done on PCIe level rather than networking stack level.

Oleszkiewicz commented 3 years ago

The reasons for allowing SR-IOV based IB in guests are mostly: 1) Allow to run an SDS node in a guest without noticeable performance loss (Storage passed through on PCIe directly or via SR-IOV if hardware allows it, networking via SR-IOV. 2) Allow to run a guest with super fast access to shared storage without hypervisor overhead - for example using NVMeOF, the additional latency vs bare metal directly attached NVMe is a single digit in microseconds.

Actually guest access to an SDS node hosted locally on the same machine could be faster via SR-IOV/RDMA than going through the hypervisor.

robhoes commented 3 years ago

It looks extending support for Infiniband devices to the SR-IOV functionality in the API, and to block such PIFs from being used in other ways, is not trivial. At least for now, we should raise a better error here, and handle it by excluding the devices in PIF.scan.

Oleszkiewicz commented 3 years ago

One option is to treat them the same as ethernet devices, and have them as PIFs, which will happen with the current patch. But then we need to know that nothing goes wrong due to the different format of the the MAC address, as well as other operations that can be performed on PIFs. This would need testing.

At least in old versions (pre 8.0) with IB interfaces discovered as PIFs - this did not create any immediate problems, I guess trying to add one to a guest might fail - did not try this - however this would not be a major problem even if it is not not "disallowed to try", the operation should just fail I guess as we wouldn't be able to add interface to the bridge. The potential benfits though - especially with SR-IOV available, are quite huge.

Oleszkiewicz commented 3 years ago

It looks extending support for Infiniband devices to the SR-IOV functionality in the API, and to block such PIFs from being used in other ways, is not trivial. At least for now, we should raise a better error here, and handle it by excluding the devices in PIF.scan.

Is anything special needed for IB to work with SR-IOV? It is passed through on the PCIe level isn't it?

Oleszkiewicz commented 3 years ago

Actually if I get a package for 8.2 with the current patch I would be very eager to test it, verify if SR-IOV works properly without further changes (I guess it should) and if we don't have any hard problems/crashes when trying to add IB PIF to a guest.

robhoes commented 3 years ago

Is anything special needed for IB to work with SR-IOV? It is passed through on the PCIe level isn't it?

It depends how those devices are configured for SR-IOV. If the drivers expose the same sysfs nodes as ethernet devices do, then it may work. Otherwise, there is always the option to manually pass the PCI devices through to the guest using other_config:pci=<0000:xx:xx.x> on the VM.

lindig commented 3 years ago

I am building the current patch on CH 8.2 and would be happy to share the package if you would like to test it. If we want to ignore IB devices, I believe we need to do this in

As information is gathered piecemeal and we would corrupt our data structures if we fail to obtain the MAC address of a device that we otherwise already added.

Oleszkiewicz commented 3 years ago

Is anything special needed for IB to work with SR-IOV? It is passed through on the PCIe level isn't it?

It depends how those devices are configured for SR-IOV. If the drivers expose the same sysfs nodes as ethernet devices do, then it may work. Otherwise, there is always the option to manually pass the PCI devices through to the guest using other_config:pci=<0000:xx:xx.x> on the VM.

it seems that almost all the nodes that are present on an eth device are present in an ib too

/sys/class/net/ifname/qos is missing in IB

Oleszkiewicz commented 3 years ago

I am building the current patch on CH 8.2 and would be happy to share the package if you would like to test it. If we want to ignore IB devices, I believe we need to do this in

As information is gathered piecemeal and we would corrupt our data structures if we fail to obtain the MAC address of a device that we otherwise already added.

I would be super happy to test it. Even if we may pass VFs to guests manually I believe that the existing SR-IOV infrastructure tracks the number of available VFs, and makes it generally easier and less error-prone to provision.

Oleszkiewicz commented 3 years ago

I am building the current patch on CH 8.2 and would be happy to share the package if you would like to test it. If we want to ignore IB devices, I believe we need to do this in

As information is gathered piecemeal and we would corrupt our data structures if we fail to obtain the MAC address of a device that we otherwise already added.

How can I get the rpm when you build it?

lindig commented 3 years ago

Send me an email at christian.lindig@citrix.com

Oleszkiewicz commented 3 years ago

First good news - on current patch interface detection works OK and I can see ib0 interface, no errors here. I will reconfigure the server for SR-IOV now and test with that.

lindig commented 3 years ago

We can ask the open source community for help here to explore a good compromise as we are unlikely to do more than fixing the problem by skipping over IB devices in the PIF scan.

Oleszkiewicz commented 3 years ago

Ok, second run of testing done the results are as follows:

SR-IOV network creation on IB NIC = works SR-IOV network creation on ETH (while IB port present) = works

IB interface is shown as "disconnected", same for IB SR-IOV network

now the problems: It is not possible to start a guest with an ETH SR-IOV network assigned (on a dual port card with one port working on IB, another on ETH) as I get the following error:

"Failed","Starting VM 'CentOS 8 SRIOV-Test' Internal error: xenopsd internal error: (Failure "Failed to configure network SR-IOV VF (pci:0000:03:00.1) with mac=76:7c:4c:e9:f8:4b vlan=none rate=none: Failed to set VF MAC for: ib0")

This happens BOTH If I have IB PIF in the system and if I do xe pif-forget so I don't have an IB PIF, yet the driver will push both ports to the guest and we try to set ETH mac on IB interface which is not supported of course.

Actually looking at this I tend to agree with @robhoes that maybe just ignoring IB interface and not adding a PIF for it would be a good solution, yet we should also not try to assign it a MAC if a VF with an IB port is present in the guest..

lindig commented 3 years ago

The code in https://github.com/xapi-project/xcp-networkd/blob/master/lib/network_utils.ml#L187 scans /sys/class/net/*/device/driver and rejects only devices that have "xen-backend" in the path. I think this is too permissive. I wonder about the strategy such that we exclude infiniband devices.

Oleszkiewicz commented 3 years ago

The code in https://github.com/xapi-project/xcp-networkd/blob/master/lib/network_utils.ml#L187 scans /sys/class/net/*/device/driver and rejects only devices that have "xen-backend" in the path. I think this is too permissive. I wonder about the strategy such that we exclude infiniband devices.

would this cause xen not trying to assign a MAC address to IB interface when adding an ETH SR-IOV network to the host on a dual port card?

lindig commented 3 years ago

I have updated the code with a new method to reliably skip over Infiniband devices but still need to test this.

Oleszkiewicz commented 3 years ago

I have updated the code with a new method to reliably skip over Infiniband devices but still need to test this.

If you provide me with 8.2 package I can test it quite rapidly I believe. (especially the use case that failed with previous patch)

Oleszkiewicz commented 3 years ago

This should skip over the device entirely - no PIF entry will be made for it.

OK, but the SR-IOV on a dualport card failed on assigning mac to IB interface even if there was no PIF entry for it (when I have issued pif-forget on the ib0)

lindig commented 3 years ago

I've send you an RPM with this patch to try.

Oleszkiewicz commented 3 years ago

I've send you an RPM with this patch to try.

The scan completes properly - ib interface is not found, ethernet is found properly.

Starting a guest with SR-IOV network on ETH port of dualport card fails with:

"Failed","Starting VM 'CentOS 8 SRIOV-Test' Internal error: xenopsd internal error: (Failure "Failed to configure network SR-IOV VF (pci:0000:03:00.1) with mac=76:7c:4c:e9:f8:4b vlan=none rate=none: Failed to set VF MAC for: ib0") Time: 00:00:09","Fenrir.krupnicza.dc.xpand.pl","Feb 23, 2021 3:32 PM"

so we still try to assign a MAC address to an IB port on the VF that is assigned to the guest... The NIC is dualport and the VF always includes both ports of the NIC, we should not touch the IB port if present when starting the guest.

lindig commented 3 years ago

I'm not sure that I will get to the bottom to this without having access to such cards.

Oleszkiewicz commented 3 years ago

I'm not sure that I will get to the bottom to this without having access to such cards.

if required I can give you full acceess to the testing server I test it with for som time. Yet I guess the thing is to avoid trying to touch / update non eth interfaces on guest starting?

Oleszkiewicz commented 3 years ago

one more thought, does the current code actually take dualport cards into consideration for SR-IOV ?

At least with Mellanox cards, when I assign a VF to a guest it will always give the guest both ports of the card (with the same PCI id) Even when both ports are ETH we should actually set different MACs on them. I can check if it works properly later, however it would be great if we could first solve the IB/mac assignment problem as reconfiguring the server to eth/eth and back to ib/eth takes quite a time :)

lindig commented 3 years ago

I am planning to update this PR but not in a fundamental way and therefore it won't make IB work but just avoid crashing during the scan. If this is important, the open source community might be able to help.

Oleszkiewicz commented 3 years ago

I am planning to update this PR but not in a fundamental way and therefore it won't make IB work but just avoid crashing during the scan. If this is important, the open source community might be able to help.

If we avoid crash during starting a VM with an SR-IOV dualport card in (IB,ETH) configuration this should be OK I guess. The crash is avoided during the scan with both your patches I have tested (the first one that added IB as a PIF and the second one that just ignores IB during scan).

The last patch I have tested doesn't allow to start a VM with SR-IOV ETH port when the second port is IB. Can you possibly ignore IB / non-ethernet when starting SR-IOV enabled VM, especially by not trying to assign an ETH MAC address to non-ETH ports ?

Without it - SR-IOV is still broken for ETH when IB port is merely present.

Oleszkiewicz commented 3 years ago

actually if we can avoid the crash on starting a VM (not only the crash at pif-scan) the IB can be "made to work" manually without xen knowing about it, and starting from that maybe the open source patch improving the "quality of life" and making it visible could be viable. However I believe that also commercial / Citrix version should not crash with a non-eth port just being present :).

Oleszkiewicz commented 3 years ago

should I test the last set of patches? Would be happy to see if it ignores IB properly if I get an 8.2 rpm.

Oleszkiewicz commented 3 years ago

The last patch still crashes on starting a VM with SR-IOV Ethernet network wen IB interface is present in the system,

It still tries to assign MAC address to IB interface on VM creation, I guess you do not check if the interface is ETH in one more place:

"Failed","Starting VM 'CentOS 8 SRIOV-Test' Internal error: xenopsd internal error: (Failure "Failed to configure network SR-IOV VF (pci:0000:03:00.1) with mac=76:7c:4c:e9:f8:4b vlan=none rate=none: Failed to set VF MAC for: ib0") Time: 00:00:09","Fenrir.krupnicza.dc.xpand.pl","Mar 1, 2021 6:38 PM"

There is no IB PIF in the system.

the (pci:0000:03:00.1) is an SR-IOV VF on a dualport card with P1=IB and P2=ETH

What is expected: Ignoring IB port, setting MAC on ETH port.

lindig commented 3 years ago

We don't have the capacity or hardware to investigate this further as it is an unsupported configuration, I am sorry. Crucially, we also lack the hardware to test this. If there is enough interest, https://github.com/xcp-ng might be able to help.

Oleszkiewicz commented 3 years ago

Sad, as there is one last problem to solve just by ignoring non-eth interfaces on macc assignment while starting a vm with sr-iov network assigned, and I guess you know the codebase the best here. As mentioned before I could also do the testing and/or give full remote sccess to the hardware config that is affected.

I understand that adding code to fully/correctly support ib is out of the question here, but I hoped that we could come to a solution together to at least ignore non-eth ports well to avoid this and potentially other problems.

stormi commented 3 years ago

We have not planned resources on fixing the remaining issue either, but I'd like to know what would remain to be done (approximately). Is it likely, as @Oleszkiewicz suggests, that we'd "just" have to find another place where to ignore IB interfaces, by not trying to assign a MAC address to the IB interface? Or is it likely to be more complicated than that?

Oleszkiewicz commented 3 years ago

I assume it is just that, we don’t want IB support at this point - just proper ignoring IB when it is present.

Currently when we start a VM with SRIOV interface that includes both ETH (available as a PIF) and IB (ignored in detection, not visible as a PIF) interfaces xen tries to assign ETH MAC address to IB interface and fails -> VM does not start at all.

If we ignore IB properly here – we should be able to start a VM in this case.

From: Samuel VERSCHELDE @.> Sent: Thursday, April 8, 2021 6:52 PM To: xapi-project/xcp-networkd @.> Cc: Piotr Oleszkiewicz @.>; Mention @.> Subject: Re: [xapi-project/xcp-networkd] CA-351826 recognise link/infiniband MAC address (#185)

We have not planned resources on fixing the remaining issue either, but I'd like to know what would remain to be done (approximately). Is it likely, as @Oleszkiewiczhttps://github.com/Oleszkiewicz suggests, that we'd "just" have to find another place where to ignore IB interfaces, by not trying to assign a MAC address to the IB interface? Or is it likely to be more complicated than that?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHubhttps://github.com/xapi-project/xcp-networkd/pull/185#issuecomment-815980775, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AJSLSVBIA6UUZPKDYJTBZJDTHXNJJANCNFSM4XQ5EHJA.