opiproject / opi-prov-life

Provisioning, Lifecycle and Platform Management Group
Apache License 2.0

Update COORDINATION.md with Marvell review comments #216

Open sburla-marvell opened 1 year ago

sburla-marvell commented 1 year ago

Add Marvell comments and CRS details

glimchb commented 1 year ago

@ballle98 @RezaBacchus @dandaly @Gal-Zaidman please review

dandaly commented 1 year ago

Hi, I can merge the option 2 feedback into another PR:

RezaBacchus commented 1 year ago

Greetings Dan: The abstraction you proposed is highly desirable; however, OROMs run at the end of POST, which conflicts with the need for the OROM to run when the card is discovered so that it can stall UEFI until all the Bus/Dev/Fun on the card are discoverable. Rewriting UEFI to make the OROM run when the card is discovered is a heavy lift and risky because the system is not stable during discovery.

sburla-marvell commented 1 year ago

> Thanks @sburla-marvell, great comments. Do you want to merge this content or just collect feedback? Let me know.

Most of it is just feedback. I can create a separate pull request for the CRS part if needed.

glimchb commented 1 year ago

> Most of it is just feedback. I can create a separate pull request for the CRS part if needed.

Yes, please do, so I can merge the CRS part.

dandaly commented 1 year ago

> Greetings Dan: The abstraction you proposed is highly desirable; however, OROMs run at the end of POST, which conflicts with the need for the OROM to run when the card is discovered so that it can stall UEFI until all the Bus/Dev/Fun on the card are discoverable. Rewriting UEFI to make the OROM run when the card is discovered is a heavy lift and risky because the system is not stable during discovery.

Hi Reza, this OROM solution solves the coordination problem between a PCI-compatible device and the host. If the device is not PCI compatible, you need some other solution. This driver-based solution is working today and doesn't require any UEFI changes. I think we need to split Coordination between solving these two separate problems:

glimchb commented 1 year ago

> This way we don't gate IPU/DPU provisioning on requiring a new PCI, UEFI or BIOS change in order for it to work the OPI way. We can also specify the OPI way to work around specific incompatibilities, whatever they may be.

We should try to find a way that works for all vendors. If we can't, we can't... but we need to try first...

dandaly commented 1 year ago

> > This way we don't gate IPU/DPU provisioning on requiring a new PCI, UEFI or BIOS change in order for it to work the OPI way. We can also specify the OPI way to work around specific incompatibilities, whatever they may be.

> We should try to find a way that works for all vendors. If we can't, we can't... but we need to try first...

I propose we answer the OPI provisioning question within the constraints of what exists today, and see where we land. I also haven't heard from any other vendor that they can't work within the PCI spec, or that they can't work in existing servers.

glimchb commented 1 year ago

> I also haven't heard from any other vendor that they can't work within the PCI spec, or that they can't work in existing servers.

I got feedback from one DPU vendor that strongly advocates for Option 3, with OOB management via BMC, in order not to rely on PCIe timing, even though their card is PCIe compliant.

dandaly commented 1 year ago

> I got feedback from one DPU vendor that strongly advocates for Option 3, with OOB management via BMC, in order not to rely on PCIe timing, even though their card is PCIe compliant.

Can we get this rationale into the document? We need to have a clear reason why we need to create a dependency like what Option 3 proposes, since that dependency will limit adoption.