Closed: andymalakov closed this issue 8 years ago
Hi.. we can expose the retry counters as part of the connect()/accept() calls; I should implement that anyway..
Any other specific parameters you have in mind?
Hi Patrick, thanks for quick reply.
Specifying these during connect/accept may be fine. Other parameters are MTU and QP state.
We are working on a Java implementation of the DARE algorithm (http://tiny.cc/o34pgy). This algorithm uses QP state to manage log access between the leader and followers. Just saying that control over QP state might be a useful feature as well.
I may try to help by implementing and testing this feature in a fork, but I need some guidance. What would be a consistent way of exposing this in the DiSNI API, and what is the proper way to expose the 'ibv_qp_attr' struct?
Hi Andy,
Great, I'm familiar with the DARE paper, so I'm looking forward to your implementation..
Here is a proposal for what we could do: let me first make sure all the parameters in "struct rdma_conn_param" are exposed in both the raw DiSNI verbs API and the endpoint API, for both connect() and accept(). Second, I can draft a first implementation of ibv_modify_qp that will allow you to modify certain parameters in the QP (e.g. timeout). You could then extend this draft according to your needs, or I can help as well..
That being said, DiSNI implements the new RDMA connection management API (https://linux.die.net/man/7/rdma_cm), where the QP state during connection setup is orchestrated by the communication manager. Manually changing the QP state (Init, RTR, RTS, etc.), as in the old InfiniBand stack, is not supported.
Let me know if that works for you...
It seems like RDMA_CM only allows a single QP per connection (RdmaCmId)? That looks like a serious limitation.
Yes, this is how the verbs RDMA API is defined. In 'Reliable Connected' mode, a connection is represented by a single QP endpoint. But this is not RDMA_CM specific; it also applies to IB_VERBS-style connection management. What connection semantics are you looking for? Multiple QPs per connection endpoint? Or a QP which can deliver data to different peer QPs?
At minimum I am looking for the ability to control the QP attributes described above and the QP state.
But it would be nice to support the first connection method described here: http://www.rdmamojo.com/2014/01/18/connecting-queue-pairs/
Q: What connection semantics are you looking for? A: To implement the DARE algorithm we need multiple QPs between each pair of hosts in a cluster. The current DiSNI design would require a dedicated endpoint for each QP. That would require each host to open a range of ports (one per QP) and duplicate the connection-handling logic.
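For reference, the "manual" connection method from the rdmamojo post mentioned above bypasses rdma_cm entirely: the peers exchange QP numbers, LIDs and PSNs out of band, then each side drives the QP state machine itself with ibv_modify_qp. A minimal sketch against the plain libibverbs C API (the attribute values and the out-of-band parameters are illustrative placeholders, not something DiSNI exposes today):

```c
/* Sketch only: manual RESET -> INIT -> RTR -> RTS transitions
 * with ibv_modify_qp, as used when connecting QPs without a CM.
 * dest_qp_num, dlid and the PSNs would come from an out-of-band
 * exchange; the MTU, timer and atomic values are placeholders. */
#include <infiniband/verbs.h>
#include <string.h>

int connect_qp_manually(struct ibv_qp *qp, uint32_t dest_qp_num,
                        uint16_t dlid, uint32_t rq_psn, uint32_t sq_psn,
                        uint8_t port)
{
    struct ibv_qp_attr attr;

    /* RESET -> INIT */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state        = IBV_QPS_INIT;
    attr.pkey_index      = 0;
    attr.port_num        = port;
    attr.qp_access_flags = IBV_ACCESS_REMOTE_READ | IBV_ACCESS_REMOTE_WRITE;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_PKEY_INDEX |
                      IBV_QP_PORT | IBV_QP_ACCESS_FLAGS))
        return -1;

    /* INIT -> RTR (ready to receive) */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state           = IBV_QPS_RTR;
    attr.path_mtu           = IBV_MTU_1024;
    attr.dest_qp_num        = dest_qp_num;
    attr.rq_psn             = rq_psn;
    attr.max_dest_rd_atomic = 1;
    attr.min_rnr_timer      = 12;
    attr.ah_attr.dlid       = dlid;
    attr.ah_attr.port_num   = port;
    if (ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU |
                      IBV_QP_DEST_QPN | IBV_QP_RQ_PSN |
                      IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER))
        return -1;

    /* RTR -> RTS (ready to send) */
    memset(&attr, 0, sizeof(attr));
    attr.qp_state      = IBV_QPS_RTS;
    attr.sq_psn        = sq_psn;
    attr.timeout       = 14;
    attr.retry_cnt     = 7;
    attr.rnr_retry     = 7;
    attr.max_rd_atomic = 1;
    return ibv_modify_qp(qp, &attr, IBV_QP_STATE | IBV_QP_SQ_PSN |
                         IBV_QP_TIMEOUT | IBV_QP_RETRY_CNT |
                         IBV_QP_RNR_RETRY | IBV_QP_MAX_QP_RD_ATOMIC);
}
```

This is exactly the state orchestration that rdma_cm performs internally on connect/accept, which is why the two approaches conflict.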
I just wanted to point out that RDMA_CM supports UD QPs (cf. https://linux.die.net/man/7/rdma_cm); however, I'm not sure if the necessary functionality is exposed by DiSNI.
Why would you need a range of ports? Just open a server endpoint on a single port, then accept as many client endpoints as you like. There is going to be an RdmaEndpoint object (with a dedicated QP) at the server and at the client per connection, but only a single listening port. For a "real" use case of DiSNI at scale, have a look at https://github.com/zrlio/crail/tree/master/storage-rdma.
Patrick, I see your point. It's just that a single port would require some kind of initial handshake to let the acceptor side know what kind of QP each particular client connection is trying to establish.
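One common way to do such a handshake without extra ports is rdma_cm's private_data: the connecting side puts a small tag identifying the QP's role into rdma_conn_param.private_data, and the acceptor reads it from the connection-request event before deciding how to handle the new endpoint. The payload layout and role values below are purely hypothetical, for illustration:

```c
/* Hypothetical private_data payload for the handshake described
 * above: the client tags each connection request with the role of
 * the QP it wants to establish; the server would read this tag
 * from the connection request event's private data. The struct
 * layout and role values are illustrative, not a real protocol. */
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

enum qp_role { QP_ROLE_LOG = 1, QP_ROLE_CONTROL = 2 };

/* Encode the role in network byte order into a buffer that would
 * be passed as rdma_conn_param.private_data (the usable size is
 * small, roughly 56 bytes over InfiniBand). Returns bytes used. */
static size_t encode_private_data(uint8_t *buf, enum qp_role role)
{
    uint32_t tag = htonl((uint32_t)role);
    memcpy(buf, &tag, sizeof(tag));
    return sizeof(tag);
}

/* Decode the role on the acceptor side. */
static enum qp_role decode_private_data(const uint8_t *buf)
{
    uint32_t tag;
    memcpy(&tag, buf, sizeof(tag));
    return (enum qp_role)ntohl(tag);
}
```

With a tag like this, all QPs of a host pair can be multiplexed over one listening port, and the acceptor dispatches each incoming connection to the right handling logic.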
Let us get back to the original request: how to control queue attributes and queue state.
Currently we only support one type of QP that is IbvQP.IBV_QPT_RC.
As for ibv_modify_qp, the set of parameters that cannot already be set at connection establishment time, and that also do not conflict with the CM way of establishing the connection, is limited. If at all, I can see these parameters: timeout, qp_access_flags, max_dest_rd_atomic.
If the ability to change any of these fields solves the issue for you, then let me add the method.. otherwise let's have a phone conversation to discuss your exact requirements..
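Of the parameters listed above, qp_access_flags is a good example of something that can be changed after the CM has connected the QP without disturbing its state machine, and it is also what a DARE-style design can use to revoke log access. A minimal sketch against the underlying libibverbs C API (the chosen flags are illustrative):

```c
/* Sketch only: modify the access flags of an already-connected
 * QP via ibv_modify_qp, leaving the QP state itself (and thus
 * the CM's orchestration) untouched. Here we revoke remote
 * write access while keeping remote read, as a leader/follower
 * protocol might do; the flag choice is illustrative. */
#include <infiniband/verbs.h>
#include <string.h>

int restrict_remote_access(struct ibv_qp *qp)
{
    struct ibv_qp_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.qp_access_flags = IBV_ACCESS_REMOTE_READ;
    return ibv_modify_qp(qp, &attr, IBV_QP_ACCESS_FLAGS);
}
```

A DiSNI wrapper for this would presumably take an IbvQP plus the attribute/mask pair, mirroring the C signature.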
Patrick, I understand your point about sticking to the CM approach to manage QueuePair state.
Please extend the QP creation-time parameters (MTU, retry counts, and timeouts).
Ok, should have something hopefully by the end of the week...
The retry counters set in RdmaConnParam are now respected in connect() and accept(). For the endpoint API, the default values can be changed on the group.
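For context, the retry counters that connect()/accept() now respect travel in the underlying C struct rdma_conn_param and are consumed by rdma_connect/rdma_accept. A minimal sketch against the librdmacm C API (the counter values are illustrative):

```c
/* Sketch only: passing retry counters to rdma_connect via
 * struct rdma_conn_param, which is what a RdmaConnParam setting
 * ultimately maps to. Values are illustrative; both fields are
 * 3-bit, so 7 is the maximum. */
#include <rdma/rdma_cma.h>
#include <string.h>

int connect_with_retries(struct rdma_cm_id *id)
{
    struct rdma_conn_param param;

    memset(&param, 0, sizeof(param));
    param.retry_count     = 7;  /* retries on transport errors   */
    param.rnr_retry_count = 7;  /* 7 means retry forever on RNR NAKs */
    return rdma_connect(id, &param);
}
```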
Thanks!!
Could you please expose the ibv_modify_qp method in the API?
It provides control over queue state and allows defining important parameters like retry counters.