openucx / ucx

Unified Communication X (mailing list - https://elist.ornl.gov/mailman/listinfo/ucx-group)
http://www.openucx.org

UCX and DPDK are mutually incompatible #2478

Open raphaelcohn opened 6 years ago

raphaelcohn commented 6 years ago

In order to have HAVE_IB_EXT_ATOMICS defined, OpenUCX needs to be compiled with the Mellanox OFED (MLNX_OFED) variant of libibverbs. However, this version of libibverbs isn't compatible with DPDK, which uses the rdma-core version. One can use the rdma-core version with UCX, but then extended atomics aren't available.

My use case for UCX requires an app to use both DPDK and UCX...

shamisp commented 6 years ago

Technically, you should be able to install MOFED in one location and rdma-core (user space) in another location. I would expect that rdma-core user space is compatible with MOFED kernel space.

Then you can compile UCX against MOFED verbs and DPDK against rdma-core verbs.

Generally speaking, I'm a bit surprised that DPDK does not compile against MOFED verbs, since MOFED has to be backward compatible with rdma-core verbs.

@yosefe any comments?

shahafsh commented 6 years ago

Pasha is correct.

One can use the legacy verbs user space for UCX and rdma-core for DPDK on top of the same underlying kernel, as long as it is OFED 4.2 or above (earlier versions have some kernel bugs).

rdma-core and legacy verbs (ibv_exp...) are not backward compatible.
Currently, during OFED installation one can choose which one of them to install.

shamisp commented 6 years ago

@shahafsh Maybe MLNX can document the installation steps for a setup where rdma-core verbs and extended verbs coexist.

shamisp commented 6 years ago

@raphaelcohn does this address your request? Can we close this issue?

raphaelcohn commented 6 years ago

@shamisp @shahafsh Not really. It's not possible to statically link UCX and DPDK into one application and retain extended atomics (amongst other things). The correct solution is to fix UCX to be fully compatible, not partly compatible, with the version of libibverbs (and mlx5) present in the github version of rdma-core.

raphaelcohn commented 6 years ago

(or for that matter, dynamically)

shamisp commented 6 years ago

I do not think it is a UCX problem.

UCX can be compiled with rdma-core verbs or inbox distro drivers. It also can be compiled with Mellanox MOFED verbs.

You are asking for the HAVE_IB_EXT_ATOMICS feature, which for now is only present in MOFED verbs, and MOFED verbs are incompatible with rdma-core verbs. So this is really an incompatibility issue between two variants of verbs. I'm not sure how it can be resolved through the UCX code base... If you have any ideas, please let us know.

raphaelcohn commented 6 years ago

@shamisp I'm extremely disappointed, to say the least.

raphaelcohn commented 6 years ago

One possible route forward is to drop support for the legacy verbs ibv_exp... and implement HAVE_IB_EXT_ATOMICS on top of the non-legacy rdma-core branch. This version is what the rest of the world will be using going forward in all other products... Without this it's impossible to implement most algorithms that use atomic ops.

And whilst at it, add support for remote persistent flushes (psync)... (;-)

shamisp commented 6 years ago

Not sure if I understand the full picture here.

The "HAVE_IB_EXT_ATOMICS" API is a relatively new API that provides additional functionality, which is not present (at least at this moment) in upstream verbs. Hopefully it will be there one day...

Anyway, HAVE_IB_EXT_ATOMICS is used for masked atomics, which are used to implement swap and 32-bit AMOs. 64-bit AMOs (FADD, ADD, CSWAP) we can do with regular verbs without the extension. Potentially we can emulate all the missing AMOs in software, but it will be a bit slower than a native hardware implementation.
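A minimal sketch of the kind of software emulation mentioned above, assuming C11 `<stdatomic.h>` on local memory as a stand-in for the remote 64-bit compare-and-swap that regular verbs provide. A 32-bit fetch-and-add is performed on one half of the enclosing naturally aligned 64-bit word, retrying until the CAS succeeds. `amo32_fadd_emulated` and its `hi_half` parameter are hypothetical names for illustration, not UCX code, and which half a given 32-bit remote address maps to depends on byte order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: 32-bit fetch-and-add on one half of a naturally
 * aligned 64-bit word, built only from a 64-bit compare-and-swap.
 * hi_half selects the upper (1) or lower (0) 32 bits of the word value.
 * Returns the previous 32-bit value. */
static uint32_t amo32_fadd_emulated(_Atomic uint64_t *word, int hi_half,
                                    uint32_t add)
{
    assert(hi_half == 0 || hi_half == 1);
    const unsigned shift = hi_half ? 32u : 0u;
    const uint64_t mask = (uint64_t)UINT32_MAX << shift;
    uint64_t expected = atomic_load(word);

    for (;;) {
        uint32_t old32 = (uint32_t)((expected & mask) >> shift);
        uint32_t new32 = old32 + add; /* wraps mod 2^32, no carry into the other half */
        /* Splice the new half in; the other 32 bits are left untouched. */
        uint64_t desired = (expected & ~mask) | ((uint64_t)new32 << shift);
        if (atomic_compare_exchange_weak(word, &expected, desired))
            return old32;
        /* CAS failed: expected now holds the current word, so retry. */
    }
}
```

Over the network every failed CAS costs another round trip, which is one reason this approach would be expected to be slower than a native masked atomic.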

I'm not sure what a persistent flush is, so...

yosefe commented 6 years ago

The legacy _exp APIs are also in the process of being ported to rdma-core; the plan is to port UCX to use the upstream rdma-core APIs instead of _exp during the next 6-12 months.

raphaelcohn commented 6 years ago

@yosefe That's really good news.

@shamisp Persistent flush is a kind of memory barrier one needs to implement most persistent, non-blocking data structures. I probably shouldn't have mentioned it, as it isn't 100% relevant to this issue, but it is going to become critical for implementing remotely-persistable data structures using the newer kinds of memory now arriving. A bit more background is in https://concurrencyfreaks.blogspot.co.uk/2018/01/a-lock-free-persistent-queue.html and in the paper "Brief Announcement: Preserving Happens-before in Persistent Memory".
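To make the idea concrete, here is a minimal local sketch of a persist barrier on x86-64, assuming 64-byte cache lines and using the `_mm_clflush` and `_mm_sfence` intrinsics from `<immintrin.h>`. The payload is written back before the flag that publishes it, so recovery should never see a valid flag over a half-written payload. `persist`, `struct record` and `write_record` are hypothetical names; the sketch says nothing about remote (RDMA) persistence, on ordinary DRAM the write-back gives ordering but no durability, and the non-x86 branch is only an ordering placeholder.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <immintrin.h>
#endif

/* Assumption: 64-byte cache lines, as on current x86-64 parts. */
#define CACHE_LINE 64u

/* Hypothetical persist barrier: write back every cache line covering
 * [addr, addr + len) and fence so later stores are ordered after it. */
static void persist(const void *addr, size_t len)
{
    atomic_signal_fence(memory_order_seq_cst); /* compiler barrier only */
#if defined(__x86_64__)
    uintptr_t p = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    for (; p < (uintptr_t)addr + len; p += CACHE_LINE)
        _mm_clflush((const void *)p); /* write back and invalidate the line */
    _mm_sfence(); /* required with CLFLUSHOPT/CLWB; harmless with CLFLUSH */
#else
    (void)addr;
    (void)len;
    /* Other ISAs: orders stores but writes nothing back. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
    atomic_signal_fence(memory_order_seq_cst);
}

/* Hypothetical record: payload first, then a flag that marks it valid. */
struct record {
    uint64_t payload[4];
    uint64_t valid;
};

/* Persist-then-publish: the flag is written (and persisted) only after
 * the payload has been written back. */
static void write_record(struct record *r, const uint64_t src[4])
{
    assert(r != NULL && src != NULL);
    for (size_t i = 0; i < 4; i++)
        r->payload[i] = src[i];
    persist(r->payload, sizeof r->payload);
    r->valid = 1;
    persist(&r->valid, sizeof r->valid);
}
```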

raphaelcohn commented 6 years ago

Just checked one of our persistent structures and confirmed it needs a 64-bit swap (not CAS). Darn...

shamisp commented 6 years ago

@raphaelcohn I was not sure if you meant PMEM/NVMEM. It is a very dynamic topic. For example, the article that you mentioned is already somewhat obsolete :) x86 deprecated PCOMMIT even before it was implemented in any uarch! As for RDMA interconnects, there is no standardized way to implement RDMA PMEM semantics. All the existing solutions are proprietary and rely on uarch specifics.

raphaelcohn commented 6 years ago

Yep, same thing. I'd be happy if there was just something that'd work on 64-bit x86...