Open abhijitshintre11 opened 5 years ago
Hi @abhijitshintre11, interesting that you made it work. Is the Infiniband EDR in "ethernet mode"? As for your question, I think this is a new feature for xCAT that needs code changes. We will put it into our plan for the following sprints and update the status in this ticket so that you can track it.
We made several provision-over-IB attempts before, but they failed because of technical issues:
1) Infiniband EDR is not enabled in petitboot, so IB cannot interact with the deploy server over PXE.
2) Infiniband EDR is not enabled in the initrd by default, so the rootimg tarball cannot be downloaded over IB.
I have several questions for you on this:
We would much appreciate it if you could provide the steps or a doc for this, so that it can become a new feature or reference case in xCAT. Thanks.
Hello @immarvin, Here are the answers to your questions
Below are the steps.
Management node s/w configuration:
Node configuration:
Steps to be performed on the Management Node
Install Mellanox OFED and Configure IPoIB on the management node.
Define ib0 as your dhcpinterface
Generate the diskless image with the following parameters:
a. # genimage -i ib0 -n mlx5_ib,mlx4_ib,ib_ipoib centos7.4-x86_64-netboot-compute
b. # packimage centos7.4-x86_64-netboot-compute
Define node:
Add kernel arguments to load the Infiniband drivers during netboot.
After running the nodeset command, the dhcpd.leases file is generated. Modify it to use the actual 20-byte MAC address of the IB interface by adding a line below the fixed-address parameter, as shown here:
fixed-address 162.20.1.160;
option dhcp-client-identifier= ff:00:00:00:00:00:02:00:00:02:c9:00:24:8a:07:03:00:a3:ee:2c;
Then restart the dhcpd service for the change to take effect. Please note that the dhcpd.leases file will now be rewritten with the actual 20-byte MAC address.
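The manual edit described in this step could be scripted. The following is only a minimal sketch, not something xCAT provides: the function names and the sample host entry are hypothetical, and it assumes the lease file contains a line of the form `fixed-address <ip>;` for the node. It writes the option line in the same form as posted above.

```python
import re

HEX_OCTET = re.compile(r"[0-9a-f]{2}")


def normalize_ib_address(addr, expected_len=20):
    """Return addr as lowercase colon-separated octets, or raise ValueError."""
    octets = addr.strip().lower().split(":")
    if len(octets) != expected_len or not all(HEX_OCTET.fullmatch(o) for o in octets):
        raise ValueError(f"expected {expected_len} colon-separated hex octets: {addr!r}")
    return ":".join(octets)


def add_client_identifier(lease_text, ip, client_id):
    """Insert a dhcp-client-identifier line after the node's fixed-address line.

    Hypothetical helper: it does not check whether the option is already
    present, so running it twice on the same text adds the line twice.
    """
    client_id = normalize_ib_address(client_id)
    pattern = re.compile(r"^(\s*)fixed-address\s+" + re.escape(ip) + r"\s*;\s*$")
    out = []
    for line in lease_text.splitlines():
        out.append(line)
        match = pattern.match(line)
        if match:
            # Reuse the indentation of the fixed-address line.
            out.append(f"{match.group(1)}option dhcp-client-identifier= {client_id};")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical, heavily shortened host entry; real entries have more lines.
    sample = "host node01 {\n  fixed-address 162.20.1.160;\n}\n"
    print(add_client_identifier(
        sample,
        "162.20.1.160",
        "ff:00:00:00:00:00:02:00:00:02:c9:00:24:8a:07:03:00:a3:ee:2c",
    ))
```

The dhcpd service still has to be restarted after the file is changed, as the step says.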
Boot the node with Flexboot as its first boot option.
Attached herewith configuration files for your reference.
Please let me know if this works fine in your scenario.
Thanks
THANKS A LOT!! @abhijitshintre11
One question: why is the step "Disabled UEFI mode" needed? Is it mandatory? Are UEFI and Flexboot mutually exclusive options?
FYI, see my comments in https://github.com/xcat2/xcat2-task-management/issues/573
We did boot over Infiniband using the 8-byte port GUID and the 6-byte 'fake ethernet' address that Mellanox provides.
We did our testing using UEFI boot, but my understanding is the non-UEFI mode works in the same fashion.
Our instructions for OPA install are similar: https://hpc.lenovo.com/users/documentation/el7opainstall.html
But Omni-Path uses the same hwaddr in PXE and in the OS, whereas Mellanox changes from a 6-byte to an 8-byte address between firmware and OS, which led us to sidestep the problem with static address mode.
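To make the three address forms concrete, here is a rough sketch of how they may relate. It assumes the port GUID is the last 8 bytes of the 20-byte form, and that the 6-byte 'fake ethernet' address is the GUID with its two middle bytes removed. The second assumption is a commonly described Mellanox convention, not something stated in this thread, so check it against your own hardware. The function names are hypothetical.

```python
HEX_DIGITS = set("0123456789abcdef")


def split_octets(addr):
    """Split a colon-separated hex address into lowercase two-digit octets."""
    octets = addr.strip().lower().split(":")
    for octet in octets:
        if len(octet) != 2 or not set(octet) <= HEX_DIGITS:
            raise ValueError(f"not a hex octet: {octet!r}")
    return octets


def port_guid(ipoib_id):
    """Return the 8-byte port GUID, taken as the last 8 octets of the 20-byte form."""
    octets = split_octets(ipoib_id)
    if len(octets) != 20:
        raise ValueError("expected a 20-byte identifier")
    return ":".join(octets[-8:])


def fake_ethernet_mac(guid):
    """ASSUMPTION: the 6-byte address is the GUID without its two middle octets."""
    octets = split_octets(guid)
    if len(octets) != 8:
        raise ValueError("expected an 8-byte port GUID")
    return ":".join(octets[:3] + octets[5:])


if __name__ == "__main__":
    ident = "ff:00:00:00:00:00:02:00:00:02:c9:00:24:8a:07:03:00:a3:ee:2c"
    guid = port_guid(ident)
    print(guid)
    print(fake_ethernet_mac(guid))
```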
It's not mandatory to disable UEFI mode. To make it work in UEFI, you need to burn the appropriate UEFI firmware onto the IB card.
What do you think of the strategy of using the port guid rather than the 20 byte address?
To support that, all that's needed to accommodate omnipath is: https://github.com/xcat2/xcat-core/pull/5976/files
To support EDR IB, we would need either to limit it to static addressing (which already works today without changes to dhcp.pm) or to extend dhcp.pm to emit multiple host declarations, one each for the 6-byte and 8-byte forms, to deal with the difference between PXE and the OS.
Hello, how can I define a node in xCAT with two MAC addresses, so that if it doesn't boot over Infiniband it can boot using the Ethernet MAC address?
Is there any modern solution for Infiniband booting, without having to manually edit dhcpd.leases?
I have not been using boot over IB in any of my test environments.
@samveen @kcgthb @banuchka: Are any of you booting over IB in any of your environments? If so, do you have any advice you can share with @viniciusferrao and the rest of the community?
I unfortunately don't have experience with booting nodes over IB, and always try to ensure that systems will have a simple Ethernet management network they can boot from.
Yeah, I also require an Ethernet card on my projects. But you know, I didn't specify this machine.
Well I will for now edit the DHCP file directly. 😢
Me neither. Jarrod (@jjohnson42 ) would probably be the best person to answer (he's been building xCAT since before I've been using it).
@jjohnson42 will probably say it's supported in Confluent! Kidding, but it may actually be true... 😂
Hello, I have provisioned a diskless node using Infiniband EDR. I am able to boot the node using xCAT 2.14.5, but I ran into a problem defining the MAC address. The MAC address for the Infiniband interconnect is 20 bytes long. When I run the nodeset command, it throws an error (Invalid mac address), so I have to add the MAC address to the dhcpd.leases file manually. Is there any other way in xCAT to define a MAC address longer than 6 bytes?
Thanks
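For illustration only, a length-aware check that accepts the address sizes discussed in this thread (6, 8 and 20 bytes) might look like the sketch below. This is not xCAT's actual validation code: the function name and the labels are hypothetical, and the error text merely echoes the message quoted above.

```python
HEX_DIGITS = set("0123456789abcdef")

# Address lengths mentioned in this thread, with purely descriptive labels.
KNOWN_FORMS = {
    6: "ethernet-style MAC",
    8: "port GUID",
    20: "20-byte IPoIB address",
}


def classify_hw_address(addr):
    """Return a label for a colon-separated hardware address, based on its length."""
    octets = addr.strip().lower().split(":")
    well_formed = all(len(o) == 2 and set(o) <= HEX_DIGITS for o in octets)
    if not well_formed or len(octets) not in KNOWN_FORMS:
        raise ValueError(f"Invalid mac address: {addr!r}")
    return KNOWN_FORMS[len(octets)]


if __name__ == "__main__":
    for candidate in (
        "24:8a:07:a3:ee:2c",
        "24:8a:07:03:00:a3:ee:2c",
        "ff:00:00:00:00:00:02:00:00:02:c9:00:24:8a:07:03:00:a3:ee:2c",
    ):
        print(candidate, "->", classify_hw_address(candidate))
```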