microsoft / Freeflow

High performance container overlay networks on Linux. Enabling RDMA (on both InfiniBand and RoCE) and accelerating TCP to bare metal performance. Freeflow requires zero modification on application code/binary.
MIT License
596 stars 89 forks

Why did our ib_read_bw and ib_write_bw tests succeed without FFO installed? And why can't drivers be found after we install libibverbs? #10

Open ling0329 opened 5 years ago

ling0329 commented 5 years ago

Section 4.3 of the paper, which discusses one-sided operations, identifies two problems in supporting them. The first is that the local FFR does not know the corresponding s-mem on the other side. To solve this, FreeFlow builds a central key-value store in FFO so that all FFRs can learn the mapping between a mem pointer in the application's virtual memory space and the corresponding s-mem pointer in the FFR's virtual memory space. However, our ib_read_bw and ib_write_bw tests all succeeded without FFO installed (we do not know how to install FFO).
Note that all of our ib_send_bw, ib_read_bw, and ib_write_bw tests use rdma_cm mode, because if we install libibverbs we get the warning 'no userspace device-specific driver found'. [image] So we installed only libmlx4 and librdmacm, and all tests use the standard libibverbs of rdma. If we then test in non-rdma_cm mode, the traffic does not go through the router. Have you seen this problem before? We tried to debug it and found that the function try_driver in init.c fails to find drivers when executing the following: [image] We therefore suspected driver initialization and traced the failure to the function mlx4_driver_init, defined in mlx4.c in libmlx4. We also noticed that you cut many lines from mlx4.c, which confused us. The problem we finally located is in the following code: it never reaches 'goto found', so it hits 'return NULL' early. [image] But why? Why does rdma_cm mode not hit this problem, while with libibverbs installed both modes are affected? We look forward to your answer!
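For readers following the 'goto found' / 'return NULL' description above, here is a minimal Python sketch of that kind of lookup, not the actual libmlx4 code. It assumes the driver reads `device/vendor` and `device/device` under the uverbs sysfs directory and compares them with a fixed table. The table contents and the names `HCA_TABLE`, `read_sysfs_id`, `driver_matches`, and `probe` are illustrative assumptions.

```python
from pathlib import Path

# Illustrative (vendor, device) PCI ID pairs. This is an assumed table for
# the sketch, not a copy of the hca_table in libmlx4's mlx4.c.
MELLANOX_VENDOR_ID = 0x15B3
HCA_TABLE = {
    (MELLANOX_VENDOR_ID, 0x1003),  # ConnectX-3
    (MELLANOX_VENDOR_ID, 0x1007),  # ConnectX-3 Pro
}


def read_sysfs_id(uverbs_dir: Path, name: str) -> int:
    """Read device/<name> under a uverbs sysfs directory, e.g. '0x15b3'."""
    text = (uverbs_dir / "device" / name).read_text()
    return int(text.strip(), 0)  # base 0 accepts the 0x prefix


def driver_matches(vendor: int, device: int, table=HCA_TABLE) -> bool:
    """True corresponds to 'goto found', False to the early 'return NULL'."""
    return (vendor, device) in table


def probe(uverbs_dir: Path) -> bool:
    """Return True if this sketch's table would claim the device."""
    try:
        vendor = read_sysfs_id(uverbs_dir, "vendor")
        device = read_sysfs_id(uverbs_dir, "device")
    except (OSError, ValueError):
        return False  # unreadable or malformed sysfs entries: no match
    return driver_matches(vendor, device)


if __name__ == "__main__":
    root = Path("/sys/class/infiniband_verbs")
    if root.is_dir():
        for uverbs in sorted(root.glob("uverbs*")):
            status = "claimed" if probe(uverbs) else "not claimed"
            print(f"{uverbs.name}: {status}")
```

Under these assumptions, a NIC whose vendor/device pair is missing from the table is never claimed by the driver, which would surface as the 'no userspace device-specific driver found' warning.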

bobzhuyb commented 5 years ago

Did you install the Mellanox OFED driver outside the container and mount the user space driver path into the container, e.g. with -v /sys/class/:/sys/class/ ? You can find this in the command line in README.md. Do you have /sys/class/infiniband_verbs/uverbs0 ?

ling0329 commented 5 years ago

Yes, we have installed the Mellanox OFED driver both outside and inside the container. We are sure that the user space driver path is mounted into the container, because we started the application container with the command you provided, without any modification. We also have the following: [image]

ling0329 commented 5 years ago

We have found out why it fails to find devices: the abi_version of the Mellanox NICs we use is 1, which is outside the range 3 to 4, so they need to be matched with libmlx5, not libmlx4. However, we have to use these newer Mellanox NICs. Anyway, thank you for your attention.
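The check described above can be reproduced with a short script. This is a minimal sketch, assuming the per-device ABI version is exposed as an `abi_version` file under each `/sys/class/infiniband_verbs/uverbs*` directory and using the 3-to-4 range quoted in this comment; the names `LIBMLX4_ABI_RANGE`, `read_abi_version`, `abi_in_range`, and `report` are hypothetical.

```python
from pathlib import Path

# Accepted range for libmlx4 as reported in this thread (3 to 4); treat it
# as an assumption and verify it against the libmlx4 source you build.
LIBMLX4_ABI_RANGE = (3, 4)


def read_abi_version(uverbs_dir: Path) -> int:
    """Read the abi_version file of one uverbs device directory."""
    return int((uverbs_dir / "abi_version").read_text().strip())


def abi_in_range(abi: int, abi_range=LIBMLX4_ABI_RANGE) -> bool:
    """Return True if abi falls inside the inclusive range."""
    low, high = abi_range
    return low <= abi <= high


def report(root: Path = Path("/sys/class/infiniband_verbs")) -> dict:
    """Map each uverbs device name to (abi_version, inside_libmlx4_range)."""
    result = {}
    if not root.is_dir():
        return result
    for uverbs in sorted(root.glob("uverbs*")):
        try:
            abi = read_abi_version(uverbs)
        except (OSError, ValueError):
            continue  # skip devices without a readable abi_version
        result[uverbs.name] = (abi, abi_in_range(abi))
    return result


if __name__ == "__main__":
    for name, (abi, ok) in report().items():
        verdict = "inside" if ok else "outside"
        print(f"{name}: abi_version={abi} ({verdict} the libmlx4 range)")
```

With the value reported here (abi_version 1), `abi_in_range` returns False, consistent with libmlx4 refusing the device.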

ling0329 commented 5 years ago

Sorry to bother you again, but we would still like to know why our ib_read_bw and ib_write_bw tests succeeded in rdma_cm mode without FFO installed. According to the analysis in your paper, FFO is indispensable for one-sided operations, so how is that reflected in the open-source release? Here is our ib_read_bw test case. On the server side: [image] On the client side: [image] And this is the output we got: [image] The test ran between two containers on different hosts and succeeded. Maybe our test method was wrong, but the traffic really did go through FFR. We look forward to your reply. Thanks.

nilyibo commented 5 years ago

@ling0329 I think in this implementation, they hardcoded the one-sided mapping information in code. From README:

the released implementation hard-codes the host IPs and virtual IP to host IP mapping in https://github.com/Microsoft/Freeflow/blob/master/ffrouter/ffrouter.cpp#L215 and https://github.com/Microsoft/Freeflow/blob/master/ffrouter/ffrouter.h#L76.

Also, it looks like you are trying to run Freeflow with newer NICs. Did you get it to work? And can you share which versions of Ubuntu and OFED you are running (on both the container and the host OS)?

ling0329 commented 5 years ago

We are still trying to solve this problem but have not succeeded. We are not ready to modify libmlx5, because libmlx4 and libmlx5 differ in many ways. This is the Ubuntu version on our host OS: [image] The Ubuntu version in the container is the same: [image] Our OFED is MLNX_OFED_LINUX-4.0-2.0.0.1-ubuntu14.04-x86_64. We use ConnectX-4 40G NICs; details are here: [image] The MT27700 Family is not listed in libmlx4: [image] We would also like to try newer NICs, such as ConnectX-5 25G.

nilyibo commented 5 years ago

I see. Thanks for sharing your setup. Yeah, porting these changes to libmlx5 is probably gonna take a lot of effort. It seems that FreeFlow only works with ConnectX-3. I saw your workaround for the hca_table check and used that to get rdma_client/rdma_server working, although it still hangs 20% of the time.

bobzhuyb commented 5 years ago

The current architecture of Freeflow works only with libmlx4. It's possible to use the LD_PRELOAD trick to re-implement a cross-driver-version solution by intercepting the relevant calls. However, that requires quite a bit of effort, and all the authors of this project are now busy with something else...