shijin-aws closed this issue 1 year ago.
Had an offline discussion with @aingerson. We agree that this is not an issue in the peer API but in the util SRX framework. The current idea for solving it is to introduce an update_mr(srx, rx_entry) function pointer in util_srx_ctx so that the owner can update an rx entry before it is passed to the peer.
I will move the discussion to https://github.com/ofiwg/libfabric/pull/8907 and close this issue.
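A minimal sketch of where the proposed update_mr hook could be called, using simplified stand-in types rather than the real libfabric structs. Only the update_mr(srx, rx_entry) shape comes from the discussion above; util_srx_ctx_sketch, srx_start_matched_entry, the demo callbacks, and the assumption that srx is the SRX recorded in the entry are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins; the real libfabric structs carry many more fields. */
struct fid_peer_srx {
	int id;
};

struct fi_peer_rx_entry {
	struct fid_peer_srx *srx; /* peer SRX this entry belongs to */
	void **desc;              /* one MR descriptor per iov entry */
	size_t count;
};

typedef int (*start_msg_fn)(struct fi_peer_rx_entry *entry);

/* Hypothetical hook: lets the owner rewrite rx_entry before the peer sees it. */
typedef int (*update_mr_fn)(struct fid_peer_srx *srx,
			    struct fi_peer_rx_entry *rx_entry);

/* Hypothetical, trimmed-down util_srx_ctx holding the proposed pointer. */
struct util_srx_ctx_sketch {
	update_mr_fn update_mr; /* NULL if the owner needs no translation */
};

/* Runs after a posted receive matches an entry from the unexpected queue:
 * give the owner a chance to fix up the entry, then hand it to start_msg. */
static int srx_start_matched_entry(struct util_srx_ctx_sketch *ctx,
				   struct fi_peer_rx_entry *entry,
				   start_msg_fn start_msg)
{
	int ret;

	if (ctx->update_mr) {
		ret = ctx->update_mr(entry->srx, entry);
		if (ret)
			return ret;
	}
	return start_msg(entry);
}

/* Demo callbacks, used only to exercise the call order above. */
static int peer_token;     /* stands in for a peer-readable descriptor */
static void *seen_by_peer; /* first desc observed by demo_start_msg */

static int demo_update_mr(struct fid_peer_srx *srx,
			  struct fi_peer_rx_entry *rx_entry)
{
	size_t i;

	(void) srx;
	for (i = 0; i < rx_entry->count; i++)
		rx_entry->desc[i] = &peer_token;
	return 0;
}

static int demo_start_msg(struct fi_peer_rx_entry *entry)
{
	seen_by_peer = entry->count ? entry->desc[0] : NULL;
	return 0;
}
```

Keeping the hook optional means providers whose MR descs are already readable by their peer can leave it NULL and skip the extra step on the receive path.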
While working on making efa onboard the util SRX framework (https://github.com/ofiwg/libfabric/pull/8907), I hit a blocker: how to convert MR descs between the owner and peer providers. Efa has an internal struct efa_mr (https://github.com/ofiwg/libfabric/blob/main/prov/efa/src/efa_mr.h#L53-L63), which has shm_mr as a member. When the application calls fi_mr_regattr on efa, efa also calls fi_mr_regattr on shm, and efa can later retrieve the shm MR desc from the desc passed in by the application via fi_mr_desc(((struct efa_mr *)desc)->shm_mr).
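The desc translation described above can be sketched with trimmed-down stand-ins for struct fid_mr and struct efa_mr (not the real definitions). The sketch assumes that the desc the application gets from efa points at the efa_mr itself, as the expression above implies; efa_desc_to_shm_desc is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for libfabric's struct fid_mr; the real one also has fid and key. */
struct fid_mr {
	void *mem_desc;
};

/* Stand-in with the same behavior as libfabric's inline fi_mr_desc(). */
static void *fi_mr_desc(struct fid_mr *mr)
{
	return mr->mem_desc;
}

/* Trimmed-down efa_mr: only the two fields this sketch needs. */
struct efa_mr {
	struct fid_mr mr_fid;  /* assumed: mr_fid.mem_desc points back here */
	struct fid_mr *shm_mr; /* MR registered with shm, or NULL */
};

/* Turn the desc the application got from efa into the one shm expects.
 * The cast applies to desc before shm_mr is dereferenced. */
static void *efa_desc_to_shm_desc(void *desc)
{
	if (!desc || !((struct efa_mr *) desc)->shm_mr)
		return NULL;
	return fi_mr_desc(((struct efa_mr *) desc)->shm_mr);
}
```

The parentheses around the cast matter: written as (struct efa_mr *)desc->shm_mr, the member access binds first and is applied to a void pointer, which does not compile.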
One challenge I have right now is that if I simply call util_srx_generic_recv in efa's fi_recv, there is no good way to do this MR desc translation before it calls start_msg. Because the application may call fi_recv with FI_ADDR_UNSPEC, efa does not know whether the incoming message will come from an intra-node or an inter-node peer, so it cannot translate the MR desc before calling util_srx_generic_recv either. Then, when util_srx_generic_recv finds a matching rx entry in the unexpected queue, it updates the rx entry with a desc that only the owner can read, while the start_msg it calls may belong to the peer, which cannot understand this desc. Before switching to util_srx_generic_recv, efa used its own generic_recv, which had an extra step to update the desc in the rx_entry before calling start_msg.
To make this MR desc update a general procedure that every provider can use through util SRX, I am wondering whether we could introduce a new function, update_msg, in the peer SRX's peer_ops (using the non-tagged path as an example).
The owner would call this function before calling start_msg in generic_recv. To fix the MR desc translation issue described above, efa could tell whether an rx_entry was queued by the owner or by the peer by reading rx_entry->srx and comparing it with the peer_srx argument of the call (which in this case should be the owner's peer_srx), and then update the desc in the rx_entry accordingly. How update_msg is implemented would be provider specific.
Another solution could be to introduce a peer MR API, similar to the peer AV and peer CQ, but I am not sure what it would look like yet.
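To make the update_msg idea concrete, here is a sketch of what an efa implementation might look like. The signature, the stand-in types, and the choice to fall back to a NULL desc when no shm MR exists are assumptions; only the rule itself (compare rx_entry->srx with the owner's peer_srx and rewrite the desc when they differ) comes from the proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for libfabric types (not the real definitions). */
struct fid_mr {
	void *mem_desc;
};

struct fid_peer_srx {
	int id;
};

struct fi_peer_rx_entry {
	struct fid_peer_srx *srx; /* SRX of the provider that queued the entry */
	void **desc;              /* one MR descriptor per iov entry */
	size_t count;
};

/* Trimmed-down efa_mr; the application's desc is assumed to point at it. */
struct efa_mr {
	struct fid_mr mr_fid;
	struct fid_mr *shm_mr; /* MR registered with shm, or NULL */
};

/* Hypothetical efa implementation of the proposed update_msg peer op.
 * peer_srx is the owner's peer_srx; an entry whose srx differs was queued
 * by the peer (shm), so each efa desc is replaced with the shm desc. */
static int efa_update_msg(struct fid_peer_srx *peer_srx,
			  struct fi_peer_rx_entry *rx_entry)
{
	struct efa_mr *mr;
	size_t i;

	if (rx_entry->srx == peer_srx)
		return 0; /* queued by the owner: efa descs are already usable */

	for (i = 0; i < rx_entry->count; i++) {
		mr = rx_entry->desc[i];
		/* Same value fi_mr_desc(mr->shm_mr) would return; NULL if none. */
		rx_entry->desc[i] = (mr && mr->shm_mr) ?
				    mr->shm_mr->mem_desc : NULL;
	}
	return 0;
}
```

The early return for owner-queued entries keeps the common inter-node path free of per-iov work; only entries that shm will process pay for the translation.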
I would appreciate your feedback on this, @aingerson @shefty.