Open joachimweyl opened 5 days ago
While partition keys are analogous to VLAN IDs, I don't know how the rest of it works. For example, could someone just set up a subnet manager on their host and make changes to the IB fabric?
I'd really like to make sure that isolation is guaranteed on the IB networks before trying to do any of the hacky work in ESI.
@naved001 I'm not sure, we'll need to test that once we install the borrowed switch.
We can borrow an unmanaged EDR IB switch from UMass (Mellanox SB7790). Because it is unmanaged we'll need an external subnet manager, which we can just host on an ESI node for testing purposes.
In the meantime, here are the parameters of what I think can be done for this in ESI, as well as some considerations:
The segmentation_id that Neutron uses for VLANs is an integer field. There are two ways around this that I can think of:
create_vlan is run on all switches, so both the new InfiniBand playbooks and the existing playbooks would have to be updated to somehow ignore these commands if intended for the other network type.
Just some of my thoughts, I'd be interested in any discussion!
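On the segmentation_id point above, here is a minimal sketch of one possible mapping between an InfiniBand partition key and an integer ID. It assumes the P_Key is handled as a 16-bit hex string whose top bit marks full membership and whose low 15 bits identify the partition. The function names are hypothetical and are not part of Neutron or ESI.

```python
# Hypothetical helpers: none of these names exist in Neutron or ESI.
PKEY_MEMBERSHIP_BIT = 0x8000  # top bit of the 16-bit P_Key: full (1) vs. limited (0) membership
PKEY_BASE_MASK = 0x7FFF       # low 15 bits identify the partition


def pkey_to_segmentation_id(pkey: str) -> int:
    """Convert a P_Key hex string such as '0x8001' to an integer ID.

    The membership bit is dropped, so full and limited members of the
    same partition map to the same integer.
    """
    value = int(pkey, 16)  # accepts both '0x8001' and '8001'
    if not 0 <= value <= 0xFFFF:
        raise ValueError(f"P_Key out of 16-bit range: {pkey}")
    return value & PKEY_BASE_MASK


def segmentation_id_to_pkey(segmentation_id: int, full_member: bool = True) -> str:
    """Convert an integer ID back to a P_Key hex string.

    Assumption: 0 is treated as invalid here, and anything above
    0x7FFF cannot fit in the 15-bit partition number.
    """
    if not 0 < segmentation_id <= PKEY_BASE_MASK:
        raise ValueError(f"ID does not fit in a P_Key: {segmentation_id}")
    value = segmentation_id | (PKEY_MEMBERSHIP_BIT if full_member else 0)
    return f"0x{value:04x}"
```

With this mapping, `pkey_to_segmentation_id("0x8001")` gives the integer 1, and `segmentation_id_to_pkey(1)` gives back `"0x8001"`. Whether the integer field should carry the membership bit is a design choice this sketch leaves open.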
Motivation
We will need our H100s to be able to shift between OpenStack, OpenShift, and bare metal (BM). Currently our most promising way to do this is ESI. There are some hurdles to get past before we can do this, so we are going to track those here and discuss ways to get over them.
Completion Criteria
A plan is in place to ensure H100s can be moved between all of our offerings.
Description
Discussion
Completion dates
Desired - 2024-07-10
Required - TBD