ofiwg / libfabric

Open Fabric Interfaces
http://libfabric.org/

prov/bgq: Hardware accelerate RMA operations #2649

Closed pkcoff closed 7 years ago

pkcoff commented 7 years ago

The current plan for RMA hw acceleration is as follows:

1.) Switch from FI_MR_SCALABLE to FI_MR_BASIC and use the suggestion from Sean in issue #2548:

I think you could make use of mr basic if you used:

key = base virtual address - base physical address
offset = virtual address passed into read/write call - key

The key is set when registering, and the offset is calculated at the peer. The overhead is trivial. It's possible to convert the enum into an int and use it as flags. So that is an option if we want to expose apps to more possible memory registration options.

2.) Implement DPut hw acceleration.

3.) Implement RGet hw acceleration for small scale, targeting the rget injection fifo directly.

4.) Implement RGet hw acceleration for large scale, targeting the rget injection fifo indirectly by utilizing an app-agent running on the 17th core.

5.) Combine line items 3 and 4 into a hybrid RGet acceleration algorithm: a target rank within a given distance threshold will have its rget injection fifo targeted directly, and a target rank beyond that threshold will go through the app-agent. The threshold will be implemented via a bit in the address vector entry for a given target address, indicating whether it is within the distance threshold or not. In this manner we can maximize bandwidth while avoiding rget injection fifo overflow at scale.
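The hybrid selection in line item 5 can be sketched roughly as below. This is only an illustration under stated assumptions: the issue does not define the distance metric, so hop count on the BG/Q 5D torus is assumed here, and every name (struct av_entry, torus_hops, av_entry_init, select_rget_path) is hypothetical and not taken from the bgq provider.

```c
#include <stdint.h>
#include <string.h>

#define TORUS_DIMS 5 /* BG/Q network is a 5D torus */

/* Hypothetical address vector entry; not the bgq provider's layout. */
struct av_entry {
    uint8_t coords[TORUS_DIMS];
    uint8_t rget_direct; /* 1 = peer is within the distance threshold */
};

enum rget_path { RGET_DIRECT_FIFO, RGET_VIA_AGENT };

/* Assumed metric: shortest hop count per dimension, with wraparound. */
static unsigned torus_hops(const uint8_t *a, const uint8_t *b,
                           const uint8_t *size)
{
    unsigned hops = 0;
    for (int i = 0; i < TORUS_DIMS; i++) {
        unsigned d = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                 : (unsigned)(b[i] - a[i]);
        unsigned wrap = size[i] - d;
        hops += d < wrap ? d : wrap;
    }
    return hops;
}

/* Compute the threshold bit once, when the peer is added to the AV. */
static void av_entry_init(struct av_entry *e, const uint8_t *self,
                          const uint8_t *peer, const uint8_t *size,
                          unsigned threshold)
{
    memcpy(e->coords, peer, TORUS_DIMS);
    e->rget_direct = torus_hops(self, peer, size) <= threshold;
}

/* On each RGet, a single bit test picks the path. */
static enum rget_path select_rget_path(const struct av_entry *e)
{
    return e->rget_direct ? RGET_DIRECT_FIFO : RGET_VIA_AGENT;
}
```

Computing the bit once at address vector insertion keeps the per-operation cost to a single test, which seems to be the intent of storing it in the address vector.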

pkcoff commented 7 years ago

Optimistic target dates / dev schedule as follows:

line items 1-3: Jan 23 - Feb 10th
line item 4: have done by Feb 10th
line item 5: Feb 13 - Feb 17
scaling testing and debug: Feb 20 - March 3

mblockso commented 7 years ago

FYI .. mr_mode is being converted into mr bits (#2659). This should allow bgq to hardware accelerate rma operations. Need to integrate this change with mpich.

pkcoff commented 7 years ago

Line item 1 is coded and working. Sean's idea for the key seems to be working, in that I can obtain the physical address from the virtual address and the key. Line item 2 is almost working, but this additional line item is now needed:

2.5) Add physical address version for emulated models.

Not all operations will be hw accelerated (basically all atomic ops will still be emulated), and we can't have both the mr_scalable offset and the physical address key offset. So when we are in mr_basic mode, we will need to add emulated atomic models that use a physical address instead of the emulated bat offset.
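For reference, the key arithmetic from line item 1, which both the hw accelerated path and these physical address variants depend on in mr_basic mode, can be sketched as follows. The helper names are hypothetical and not part of libfabric or the bgq provider; the sketch also assumes a registered region is physically contiguous, so that one base offset is valid for the whole region.

```c
#include <stdint.h>

/*
 * Hypothetical helpers illustrating the quoted scheme.
 * Unsigned wraparound is intentional: the physical base may be
 * numerically larger than the virtual base.
 */

/* At registration: key = base virtual address - base physical address. */
static inline uint64_t mr_key_from_bases(uint64_t base_va, uint64_t base_pa)
{
    return base_va - base_pa;
}

/*
 * At the peer: offset = virtual address passed into read/write - key.
 * The result is the physical address corresponding to target_va.
 */
static inline uint64_t rma_target_paddr(uint64_t target_va, uint64_t key)
{
    return target_va - key;
}
```

Because the subtraction is modular in 64 bits, the same two lines work whether the virtual base sits above or below the physical base.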

pkcoff commented 7 years ago

Line item 2 is working, and line item 2.5 has been implemented and is working. Line items 3-5 are abandoned, as RGet using sw emulation performs adequately relative to the pami hw accelerated code. Done.