Closed by subes 1 year ago
Also try again to run the RMA put/get tests in jucx.
https://www.reflectionsofthevoid.com/2020/07/software-rdma-revisited-setting-up.html
Reflections Of The Void Software RDMA revisited setting up SoftiWARP on Ubuntu 20.04.pdf
This requires an actually connected network interface:
sudo modprobe siw
ifconfig # find a connected Ethernet or Wi-Fi interface; "lo" did not work
sudo rdma link add siw0 type siw netdev wlp112s0
rdma link # should list the new device
ifconfig # find the IP address of wlp112s0
rping -s -a 192.168.0.20 -v # server
rping -c -a 192.168.0.20 -v # client
sudo rdma link delete siw0 # run this during a test to verify that the interface is actually used; the test should then crash
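The two ifconfig lookups (which interface is connected, and which IP address it has) can also be scripted. This is a minimal sketch assuming bash and iproute2's one-line `ip -o -4 addr show` output; `list_ipv4_candidates` is a hypothetical helper name, and "has an IPv4 address" is used only as a rough stand-in for "connected".

```shell
# Hypothetical helper: reads `ip -o -4 addr show` output on stdin and prints
# "<interface> <address>" for every non-loopback IPv4 entry.
# Assumption: an interface with an IPv4 address counts as "connected";
# carrier/link state is NOT checked here.
list_ipv4_candidates() {
  awk '$3 == "inet" && $2 != "lo" {
    split($4, addr, "/")   # drop the prefix length, e.g. "/24"
    print $2, addr[1]
  }'
}
```

For example, `ip -o -4 addr show | list_ipv4_candidates` prints a line such as `wlp112s0 192.168.0.20`, giving both the `netdev` for `rdma link add` and the address for `rping -a`.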
UCX does not seem to support iWARP: https://github.com/openucx/ucx/issues/2507
There are some commits for it, but they say it has been untested since 2017? At least I cannot get it to work with SoftiWARP.
The code also seems not to support iWARP, because it only checks for InfiniBand? https://github.com/openucx/ucx/blob/eadd74f9fe5b0edc081ba1ce589fb850d6809934/src/uct/ib/base/ib_md.c
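To see how the kernel classifies the soft devices (and so what a device-type check might encounter), their sysfs attributes can be dumped; `ibv_devinfo` also prints a `transport` line per device. This is a sketch assuming bash and the standard `/sys/class/infiniband/<dev>/` layout; `show_rdma_node_types` is a hypothetical helper name, and the sysfs root is a parameter so it can be tried against a fake directory tree.

```shell
# Hypothetical helper: prints "<device> <node_type> <port-1 link_layer>",
# tab-separated, for every RDMA device under the given sysfs root.
# siw devices are expected to report node_type "4: RNIC" (iWARP), while rxe
# reports "1: CA", the same node type as an InfiniBand HCA.
show_rdma_node_types() {
  local root="${1:-/sys/class/infiniband}" dev name node_type link_layer
  for dev in "$root"/*; do
    [ -d "$dev" ] || continue            # no devices: the glob stays literal
    name="${dev##*/}"
    node_type="$(cat "$dev/node_type" 2>/dev/null)"
    link_layer="$(cat "$dev/ports/1/link_layer" 2>/dev/null)"
    printf '%s\t%s\t%s\n' "$name" "$node_type" "$link_layer"
  done
}
```

Running `show_rdma_node_types` after creating `siw0` should make it visible whether the device is presented as an RNIC rather than an InfiniBand CA.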
The alternative is rdma_rxe, i.e. Soft-RoCE (similar to UDP, though it seems to keep packet order?): https://enterprise-support.nvidia.com/s/article/howto-configure-soft-roce (outdated, though: https://github.com/linux-rdma/rdma-core/commit/0d2ff0e1502ebc63346bc9ffd37deb3c4fd0dbc9)
sudo modprobe rdma_rxe
ifconfig # find a connected Ethernet or Wi-Fi interface; "lo" did not work
sudo rdma link add rxe0 type rxe netdev wlp112s0
rdma link # should list the new device
ifconfig # find the IP address of wlp112s0
rping -s -a 192.168.0.20 -v # server
rping -c -a 192.168.0.20 -v # client
sudo rdma link delete rxe0 # run this during a test to verify that the interface is actually used; the test should then crash
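The siw and rxe steps differ only in the kernel module and the link type, so they can be wrapped in one helper. This is a minimal sketch assuming bash; `soft_rdma_up`, `soft_rdma_down`, `maybe_run` and the `DRY_RUN` switch are hypothetical names, not part of rdma-core, and the link is simply named after the driver (`siw0`, `rxe0`).

```shell
# Hypothetical helpers around modprobe and `rdma link`.
# Set DRY_RUN=1 to print the commands instead of executing them.
maybe_run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"          # dry run: only show the command line
  else
    "$@"
  fi
}

soft_rdma_up() {
  local drv="$1" netdev="$2" module
  case "$drv" in
    siw) module=siw ;;        # SoftiWARP
    rxe) module=rdma_rxe ;;   # Soft-RoCE
    *) echo "unknown driver '$drv' (expected siw or rxe)" >&2; return 1 ;;
  esac
  maybe_run sudo modprobe "$module" || return 1
  # netdev must be a connected interface; "lo" did not work
  maybe_run sudo rdma link add "${drv}0" type "$drv" netdev "$netdev" || return 1
  maybe_run rdma link       # should now list the new device
}

soft_rdma_down() {
  # Deleting the link while rping runs should make the test crash
  maybe_run sudo rdma link delete "${1}0"
}
```

A typical round trip would be `soft_rdma_up rxe wlp112s0`, then the two rping commands, then `soft_rdma_down rxe`; `DRY_RUN=1` is handy for checking the generated commands first.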
Hadronio works with Soft-RoCE; our Jucx integration requires that the listener is not closed (which is now the default).
These suggest that Soft-RoCE can improve performance on ordinary network cards as well: https://www.reflectionsofthevoid.com/2011/08/soft-roce-alternative-to-soft-iwarp.html https://www.lanl.gov/projects/national-security-education-center/information-science-technology/_assets/docs/2010-si-docs/Team_CYAN_Implementation_and_Comparison_of_RDMA_Over_Ethernet_Presentation.pdf
Soft-RoCE checklist:
RoCE hangs might be due to the unreliability of the protocol: https://github.com/zrlio/disni/issues/37#issuecomment-458055469
SoftiWARP checklist:
finished
https://github.com/zrlio/softiwarp
software-based iWARP, i.e. InfiniBand-style RDMA verbs over a reliable transport (similar to TCP/SCTP)