Open eedalong opened 2 years ago
max_rd_atomic is a crucial QP attribute for performance: it is the number of RDMA Read and atomic operations that an RC QP can have outstanding at any time as an initiator. I still do not understand why setting this attribute larger than 1 helps so much, and we need to find out the reasons behind it.
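One plausible explanation, offered here as an assumption rather than a measured result, is pipelining: with only one Read outstanding, the QP waits a full round trip per Read, while several outstanding Reads overlap their round trips. The sketch below is a back-of-envelope model only. The function name and the numbers (4 KB reads, 3 us round trip, 100 Gb/s link) are hypothetical and are not taken from any real RNIC.

```python
def read_throughput_gbps(outstanding, read_bytes, rtt_us, link_gbps):
    """Upper bound on RDMA Read throughput of one QP in a simple pipelining model.

    Assumption: at most `outstanding` Reads are in flight per round trip,
    so the latency-bound rate is outstanding * read_bytes per RTT, and the
    link bandwidth caps the result.
    """
    bits_in_flight = outstanding * read_bytes * 8
    latency_bound_gbps = bits_in_flight / (rtt_us * 1e-6) / 1e9
    return min(latency_bound_gbps, link_gbps)


# Hypothetical numbers: 4 KB Reads, 3 us round trip, 100 Gb/s link.
for n in (1, 2, 4, 8, 16):
    print(n, round(read_throughput_gbps(n, 4096, 3.0, 100.0), 1))
```

Under this model the throughput grows linearly with the number of outstanding Reads until the link is saturated, which would be consistent with max_rd_atomic = 1 being a bottleneck for small Reads.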
RDMA Scatter/Gather is a nice way to consolidate data transfers. For example, the verbs API allows data at multiple local locations to be written into a remote buffer with a SINGLE RDMA write operation, or data in a remote buffer to be read into multiple local locations with a SINGLE RDMA read operation. This is attractive, but it seems that nobody has reported the benefits they get from this feature.
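To make the gather semantics concrete, here is a minimal pure-Python model of a single RDMA write whose payload comes from several local segments. It is not the verbs API: the `Sge` class and `rdma_write_gather` function are hypothetical names for illustration. In libibverbs the corresponding pieces are `struct ibv_sge` entries referenced by the `sg_list` and `num_sge` fields of `struct ibv_send_wr`.

```python
from dataclasses import dataclass


@dataclass
class Sge:
    """Model of one scatter/gather element: a local offset and a length."""
    addr: int
    length: int


def rdma_write_gather(local_mem, sg_list, remote_mem, remote_addr):
    """Model one RDMA write: gather local segments into a contiguous remote range."""
    offset = remote_addr
    for sge in sg_list:
        chunk = local_mem[sge.addr:sge.addr + sge.length]
        remote_mem[offset:offset + sge.length] = chunk
        offset += sge.length
    return offset - remote_addr  # total bytes written


# Three non-contiguous local segments, one logical write operation.
local_mem = bytearray(b"AAAA....BBBB....CCCC")
remote_mem = bytearray(16)
sg_list = [Sge(0, 4), Sge(8, 4), Sge(16, 4)]
written = rdma_write_gather(local_mem, sg_list, remote_mem, remote_addr=2)
print(written, bytes(remote_mem))
```

The point of the model is that the three segments land back to back in the remote buffer while only one work request is posted, instead of three separate writes.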
One possible reason is the limited RNIC SRAM, which may cause these 2 problems:
The remote and local HCAs cannot store much data, which limits SGE_NUM. But recent RNICs have much larger SRAM than earlier generations: the Mellanox CX4 RNIC has about 2MB of SRAM, which is big enough for us because a node feature in a GNN is only about 1-4KB.
Storing too much data in the HCA may leave little memory budget for the MTT/MPT, which may cause severe MTT shoot-downs and too much PCIe/DMA overhead for address translation. But this can be solved if we use large physically contiguous memory (using Linux CMA) and physical memory regions, which are a new feature in Mellanox CX5.
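The two points above can be checked with rough arithmetic. The sketch below uses the 2MB SRAM and 1-4KB feature sizes quoted above. The one-translation-entry-per-page model, the 1 GiB region size, and the function names are assumptions for illustration and do not describe the actual MTT layout of any RNIC.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def features_that_fit(sram_bytes, feature_bytes):
    """How many node features of a given size fit in the SRAM budget."""
    return sram_bytes // feature_bytes


def translation_entries(region_bytes, page_bytes):
    """Translation entries for a region, assuming one entry per page (ceiling division)."""
    return -(-region_bytes // page_bytes)


sram = 2 * MIB  # approximate CX4 SRAM size quoted above
print(features_that_fit(sram, 4 * KIB), features_that_fit(sram, 1 * KIB))

region = 1 * GIB  # hypothetical registered region
print(
    translation_entries(region, 4 * KIB),  # 4 KiB pages
    translation_entries(region, 2 * MIB),  # 2 MiB huge pages
    translation_entries(region, region),   # one physically contiguous block
)
```

Under these assumptions a few hundred to a few thousand features fit in SRAM, and larger physically contiguous mappings shrink the translation state by orders of magnitude, which is the intuition behind using CMA and physical memory regions.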