rfaucett / libfabric

Open Fabric Interfaces

need to split RQ / WQ handling #11

Open rfaucett opened 9 years ago

rfaucett commented 9 years ago

libfabric is moving towards much more independent handling of RQs and WQs, instead of having the atomic unit be a QP. Due to its verbs roots and the ib_core driver subsystem, our driver still deals mostly in QPs. There are a couple of ways we can deal with this:

1) Leave the create_qp interface alone, and let the driver hand queues to the library in RQ/WQ pairs. The lib can then manage the individual queues itself, since the host control register configuration is done by the lib anyhow, including CQ assignments.

2) Extend the create_qp driver interface to be able to request only an RQ or a WQ.

3) Since we are now treating the elements in a VF as all-or-none, we could simply go ahead and map everything into the lib at device open time (call create_qp N times). This would require having create_cq actually allocate the resource and let the lib handle binding RQ/WQ to it.

Other thoughts on this?
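For concreteness, here is a strawman of what option 2 could look like on the command side. The names below are invented for illustration, not the current usnic ABI:

```c
/* Strawman for option 2 (illustrative names only, not the current ABI):
 * create_qp takes a flag saying which queue types the caller wants. */
#include <stdint.h>

enum create_qp_queues {
    CREATE_QP_WQ_AND_RQ = 0,   /* legacy behavior: one of each */
    CREATE_QP_WQ_ONLY,         /* e.g. backing a TX-only context */
    CREATE_QP_RQ_ONLY,         /* e.g. backing an RX-only context */
};

struct create_qp_cmd_ext {
    uint32_t queues;           /* enum create_qp_queues */
    uint32_t reserved[3];      /* room for future per-queue parameters */
};
```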

xuywang commented 9 years ago

I think 2) is probably the cleaner way to do it.

For 1), how do you envision create_qp being called when there are multiple cases in which RQ/WQ resources need to be allocated?

a) Create a normal QP with 1 RQ/WQ
b) Create an stx_ctx with 1 WQ
c) Create an rtx_ctx with 1 RQ
d) Create a QP that binds to a previously created stx_ctx
e) Create a QP that binds to a previously created rtx_ctx

Since the create_qp API always allocates one RQ/WQ pair, calling it in all these cases probably requires some careful logic to avoid wasting resources. I suspect the lib needs a view of the resources to make the right choices, but that looks like a job the kernel module is already doing.

For 3), it also requires moving the resource allocation logic into the lib, but it is probably easier to do than 1). One drawback is that usnic_status would not report correct information about QP resource usage.

For 2), I think it's easy to pass information through a create_qp parameter for all these different resource allocation scenarios. Verbs core has an API to create an SRQ now, but not one for an STQ. Using their API would avoid allocating a QP object when only an srx_ctx is created.
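For reference, here is a rough, untested illustration of how cases a)-e) above map onto libfabric calls (using shared contexts via fi_stx_context / fi_srx_context; error handling omitted), just to show which single-queue allocations each case really needs:

```c
#include <rdma/fabric.h>
#include <rdma/fi_domain.h>
#include <rdma/fi_endpoint.h>

/* Untested sketch: which queue(s) each of cases a)-e) actually needs. */
static void alloc_cases(struct fid_domain *domain, struct fi_info *info)
{
    struct fid_ep  *ep_a, *ep_d, *ep_e;
    struct fid_stx *stx_b;
    struct fid_ep  *srx_c;

    /* a) normal endpoint: needs 1 WQ + 1 RQ */
    fi_endpoint(domain, info, &ep_a, NULL);

    /* b) shared TX context: needs 1 WQ only */
    fi_stx_context(domain, info->tx_attr, &stx_b, NULL);

    /* c) shared RX context: needs 1 RQ only */
    fi_srx_context(domain, info->rx_attr, &srx_c, NULL);

    /* d) endpoint bound to the shared TX context: needs 1 RQ only */
    fi_endpoint(domain, info, &ep_d, NULL);
    fi_ep_bind(ep_d, &stx_b->fid, 0);

    /* e) endpoint bound to the shared RX context: needs 1 WQ only */
    fi_endpoint(domain, info, &ep_e, NULL);
    fi_ep_bind(ep_e, &srx_c->fid, 0);
}
```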

rfaucett commented 9 years ago

For both 1 & 3, yes, I was envisioning the lib doing complete management of queue resources. All of 1a-1e are handled by the lib maintaining pool of free WQs and free RQs, and if either an RQ or a WQ is needed, you end up getting one of each, but there's no problem with that, just means both free pools go up by one.

re: usnic_status reporting - good point - the information presented in /sys is going to need to be re-thought anyhow once QPs are no longer atomic entities.

Let's pretend for a moment we don't have to deal with ib_core - what are the atomic operations we really want? My first pass is:

And, really, the only reason for the individual alloc/dealloc calls is to allow for visibility thru /sys, yes? Do we really need any more information in /sys beyond RQ filter bindings?

rfaucett commented 9 years ago

I've started the libfabric side of this - both EP_MSG and EP_RDM use RX/TX contexts to hold their queues. Right now I wastefully create an entire QP for each instead of just a WQ for TX and RQ for RX.
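Roughly this, shown with placeholder struct names rather than the actual provider types, just to illustrate the current vs. intended queue ownership:

```c
/* Placeholder names, not the real usnic provider structs. */
struct driver_qp;   /* opaque: kernel-allocated WQ/RQ pair */
struct driver_wq;
struct driver_rq;

/* today: each context drags in a full QP and uses only half of it */
struct tx_ctx_today { struct driver_qp *qp; };   /* only the WQ is used */
struct rx_ctx_today { struct driver_qp *qp; };   /* only the RQ is used */

/* goal: each context owns just the queue it needs */
struct tx_ctx_goal { struct driver_wq *wq; };
struct rx_ctx_goal { struct driver_rq *rq; };
```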

xuywang commented 9 years ago

I will now start to look at how to implement this on the kernel module side.