ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

prov/efa: Create unit tests to validate that the efa provider meets the LL128 protocol requirements #10146

Open jiaxiyan opened 5 days ago

jiaxiyan commented 5 days ago

Add the following unit tests:

  1. Test that a packet allocated from read_copy_pkt_pool is 128-byte aligned.
  2. Test that when in-order aligned send/recv is used, the copy method is always RDMA read.
  3. Test that the data sent by the runting read protocol always has a size that is a multiple of 128 bytes. NCCL always sends a multiple of 128 bytes for the LL128 protocol, but runting read sends only a segment of the whole message via ibv_send. The test makes sure that this segment size is also a multiple of 128.
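
For illustration only, the alignment check in item 1 and the size check in item 3 reduce to two small invariants. The sketch below is plain C and does not use the efa provider or its test harness: the helper names (`is_aligned_128`, `is_multiple_of_128`, `round_down_to_128`, `check_ll128_invariants`) are hypothetical, `aligned_alloc` stands in for a packet taken from read_copy_pkt_pool, and rounding a byte budget down to a 128 multiple is only an assumed way to produce a valid segment size. Item 2 depends on provider internals and is not covered here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Alignment and size granularity required by LL128, per the list above. */
#define LL128_ALIGNMENT 128u

/* Hypothetical helper: nonzero if ptr sits on a 128-byte boundary. */
static int is_aligned_128(const void *ptr)
{
    return ((uintptr_t)ptr % LL128_ALIGNMENT) == 0;
}

/* Hypothetical helper: nonzero if size is a multiple of 128 bytes. */
static int is_multiple_of_128(size_t size)
{
    return (size % LL128_ALIGNMENT) == 0;
}

/*
 * Hypothetical helper: largest size <= budget that is a multiple of 128.
 * This is an assumption about how a segment size could be chosen, not
 * the provider's actual logic.
 */
static size_t round_down_to_128(size_t budget)
{
    return budget - (budget % LL128_ALIGNMENT);
}

/* Stand-in for the unit tests: asserts both invariants on sample inputs. */
static void check_ll128_invariants(void)
{
    /* aligned_alloc (C11) needs a size that is a multiple of the alignment. */
    void *pkt = aligned_alloc(LL128_ALIGNMENT, 4 * LL128_ALIGNMENT);
    assert(pkt != NULL);
    assert(is_aligned_128(pkt));
    free(pkt);

    /* A 1000-byte budget leaves 896 bytes (7 * 128) for the segment. */
    size_t segment = round_down_to_128(1000);
    assert(segment == 896);
    assert(is_multiple_of_128(segment));
}
```

Calling `check_ll128_invariants()` from a test runner exercises both checks. In the real unit tests, the pointer would come from the provider's packet pool and the size from the segment actually posted by runting read.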

jiaxiyan commented 4 days ago

CI failure is real. I will post the fix next week.